Open: gewy opened this issue 4 years ago.
Hello @gewy. Would you have some time to open a PR on the subject along with a unit test?
Hello @gewy. I just pushed a commit fixing rule V-10. I had to interpret some details of the paper to make this work, because the way the algorithm is described is not completely sound. What do you think of the solution?
Hi, my implementation in Java: new Rule("V-10", "(?<=^|[^aeiouy])y|y(?=[^aeiouy]|$)", "I"); Testing for vowels is not necessary IMHO: having a consonant (or ^ or $) on one side is enough to prove that we don't have vowels on both sides.
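A minimal Java sketch of how this rule can be applied with java.util.regex, using the lookbehind pattern above with uppercase letters instead of a case-insensitive flag. The class and method names are illustrative, not part of any existing implementation:

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper class; only the regex comes from the thread.
class V10Lookbehind {
    // A Y preceded by a non-vowel (or the word start), or followed by a
    // non-vowel (or the word end), cannot sit between two vowels.
    private static final Pattern V10 =
            Pattern.compile("(?<=^|[^AEIOUY])Y|Y(?=[^AEIOUY]|$)");

    static String apply(String word) {
        // Uppercase first so the pattern only has to list uppercase letters.
        return V10.matcher(word.toUpperCase(Locale.ROOT)).replaceAll("I");
    }
}
```

With this pattern, TYOU gives TIOU and YOU gives IOU, while the Y in POUYEZ, which sits between U and E, is kept.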
BTW I will check, but I am not sure that C-27 and C-28 are correct either.
Unfortunately, JavaScript does not support lookbehind assertions in regex like the one in your V-10 rule (at least not in all engines, since lookbehinds were only recently added to the spec).
Regarding C-27 and C-28: fair enough. Tell me when you know and I'll make the required changes on my side.
Wouldn't new Rule("V-10", "(^|[^aeiouy])y|y([^aeiouy]|$)", "$1I$2"); work in JS?
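One way to check this capture-group variant is to run it side by side with the lookbehind version. The Java sketch below (class and method names are illustrative) suggests it handles simple words such as TYOU and RAYMOND. However, because it consumes the neighboring consonant, a Y that needs a consonant already eaten by the previous match can be missed: on the constructed input AYTYA it yields AITYA where the lookbehind version yields AITIA. The same consumption behavior should apply to a global replace in JavaScript, where an unmatched $1 or $2 also expands to an empty string.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class; both regexes come from the thread (uppercased).
class V10Capture {
    // Lookbehind-free variant: the neighboring consonant is captured and
    // written back through $1/$2 instead of being asserted.
    private static final Pattern CAPTURE =
            Pattern.compile("(^|[^AEIOUY])Y|Y([^AEIOUY]|$)");
    private static final Pattern LOOKBEHIND =
            Pattern.compile("(?<=^|[^AEIOUY])Y|Y(?=[^AEIOUY]|$)");

    static String withCapture(String word) {
        // In Java, a group that did not participate expands to "".
        return CAPTURE.matcher(word).replaceAll("$1I$2");
    }

    static String withLookbehind(String word) {
        return LOOKBEHIND.matcher(word).replaceAll("I");
    }
}
```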
C-27: the document says Z with vowels BEFORE it, but your regex is Z(?=${V}).
C-28: SS between vowels is excluded, but your regex checks the right side only (cf. V-10).
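A minimal Java sketch of C-28 with the check on both sides. It assumes, from the REMMONT -> REMON example reported in this thread, that the rule collapses a doubled consonant into a single letter, and that Y counts as a vowel. The class name is illustrative, and a character scan is used here in place of the thread's regex:

```java
// Hypothetical sketch of C-28; the "collapse doubled consonants" part is an
// assumption inferred from the thread, not quoted from the paper.
class C28 {
    private static boolean isVowel(char c) {
        return "AEIOUY".indexOf(c) >= 0;
    }

    static String apply(String word) {
        StringBuilder out = new StringBuilder();
        int n = word.length();
        for (int i = 0; i < n; i++) {
            char c = word.charAt(i);
            out.append(c);
            boolean doubled = !isVowel(c) && i + 1 < n && word.charAt(i + 1) == c;
            if (!doubled) {
                continue;
            }
            // SS is kept only when BOTH neighbors are vowels.
            boolean ssBetweenVowels = c == 'S'
                    && i > 0 && isVowel(word.charAt(i - 1))
                    && i + 2 < n && isVowel(word.charAt(i + 2));
            if (!ssBetweenVowels) {
                i++; // skip the second letter of the pair
            }
        }
        return out.toString();
    }
}
```

With this reading, POISSON keeps its SS, REMMONT becomes REMONT, and a word-initial SS followed by a vowel, as in the constructed SSA, is still reduced because its left side is not a vowel.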
I have simplified the V-10 rule as per your suggestion. Concerning C-27, I have an interpretation question: should OZOUADE then finally become OSWADE (I am fine with this)? But should POUYEZ become POUYES as per C-27 (I am less fine with this)? Sorry if this is obvious, but I have not read this paper in a very long time.
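To make the interpretation question concrete, here is a minimal Java sketch comparing the two readings of C-27: Z followed by a vowel (the current lookahead) versus Z preceded by a vowel (the paper's wording as quoted in the thread). It assumes C-27 rewrites Z as S, inferred from the OZOUADE -> OSWADE example, and treats Y as a vowel; the class name is illustrative:

```java
import java.util.regex.Pattern;

// Hypothetical comparison; the replacement letter S is an assumption.
class C27 {
    // Current reading: Z followed by a vowel.
    private static final Pattern Z_BEFORE_VOWEL = Pattern.compile("Z(?=[AEIOUY])");
    // Paper's reading as quoted in the thread: Z preceded by a vowel.
    private static final Pattern Z_AFTER_VOWEL = Pattern.compile("(?<=[AEIOUY])Z");

    static String lookahead(String word) {
        return Z_BEFORE_VOWEL.matcher(word).replaceAll("S");
    }

    static String lookbehind(String word) {
        return Z_AFTER_VOWEL.matcher(word).replaceAll("S");
    }
}
```

Both readings turn OZOUADE into OSOUADE at this step; only the lookbehind reading turns POUYEZ into POUYES.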
I have updated rule C-28.
Same feeling about the rules. Anyway, I was using uppercase and lowercase to easily identify the applied rules during testing. I then added the CASE_INSENSITIVE flag to the regex.
On Wed, Sep 2, 2020 at 17:22, Guillaume Plique notifications@github.com wrote:
Also, I rely on some weird ad hoc rule ordering because the paper's rules were not finely thought out, but you seem to rely on an uppercase/lowercase trick to do the same. Do you find that easier?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Yomguithereal/talisman/issues/175#issuecomment-685808548, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALPWWDSFWDFU7NUU6LM723SDZPL5ANCNFSM4QGWQY3A .
So what did you choose regarding C-27? Do you get POUYES?
Well, every phonetic algorithm for family names I have tested had counterexamples. If you change a rule to fix one case, you will probably trigger other weird cases.
Yes, I try to apply the rules strictly as they are in the document (or as I understand them...). Anyway, I am more disturbed by these cases:
MAINARD -> MINAR
MENNAR -> MENAR
MEINNART -> MEINAR
RAIMOND -> RINON
RAYMOND -> RAIMON
This may be linked to the rule order:
(V-18)rINond[rINon]RAIMOND -> RINON
(V-10)raImond[raImon]RAYMOND -> RAIMON
If I put V-10 before V-18:
(V-18)rINond[rINon]RAIMOND -> RINON
(V-10)raImondrINond[rINon]RAYMOND -> RINON
Anyway:
REIMON -> REIMON
(C-28)remont[remon]REMMONT -> REMON
REMON -> REMON
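To illustrate why the order matters in these traces, here is a hypothetical Java sketch that applies regex rules in list order. It lowercases the input and writes each replacement in uppercase, matching case-insensitively, which appears to be how the traces mark the letters each rule changed. The V-10 pattern is the one from this thread; the V-18 pattern is only a stand-in inferred from the RAIMOND trace (aim -> IN), not the paper's actual rule, and the final-consonant step (rINond -> rINon) is left out. Class and field names are illustrative:

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical rule pipeline; "V-18" below is a stand-in, not the real rule.
class RulePipeline {
    static final class Rule {
        final String name;
        final Pattern pattern;
        final String replacement;

        Rule(String name, String regex, String replacement) {
            this.name = name;
            // Case-insensitive, so a rule still matches letters that an
            // earlier rule rewrote in uppercase.
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.replacement = replacement;
        }
    }

    static final Rule V10 = new Rule("V-10", "(?<=^|[^aeiouy])y|y(?=[^aeiouy]|$)", "I");
    // Stand-in for V-18 (assumption): rewrite "aim" as "IN".
    static final Rule V18 = new Rule("V-18", "aim", "IN");

    // Apply the rules in list order and print a trace line for each change.
    static String apply(String word, List<Rule> rules) {
        String current = word.toLowerCase(Locale.ROOT);
        for (Rule rule : rules) {
            String next = rule.pattern.matcher(current).replaceAll(rule.replacement);
            if (!next.equals(current)) {
                System.out.println("(" + rule.name + ")" + next);
            }
            current = next;
        }
        return current;
    }
}
```

With V-18 first, RAYMOND only reaches raImond (V-18 finds no "aim" yet); with V-10 first, V-18 then sees aIm and produces rINond, the same result as RAIMOND.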
Anyway, I still haven't validated the choice to use this algorithm.
Yes, this algorithm is not very good outside of its original goal of matching names from Saguenay etc. I am working on a personal algorithm for French that is much better, but it is geared toward preserving vocalization.
Hi, rule V-10 seems to be incorrect. The paper says: "Replace Y by I except if Y is between two vowels". TYOU and YOU should give TIOU and IOU, not be left unchanged. Regards
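The rule as quoted here can also be written without regular expressions, which sidesteps the lookbehind support issue raised earlier for JavaScript engines. A minimal Java sketch (class name illustrative), assuming Y itself counts as a vowel in the neighbor test, as the [^aeiouy] classes in this thread do. It reads neighbors from the original word, so an earlier replacement does not affect a later test:

```java
import java.util.Locale;

// Hypothetical regex-free sketch of V-10:
// Y becomes I unless both of its neighbors are vowels.
class V10Scan {
    // Assumption: Y counts as a vowel for the neighbor test.
    private static boolean isVowel(char c) {
        return "AEIOUY".indexOf(c) >= 0;
    }

    static String apply(String word) {
        char[] in = word.toUpperCase(Locale.ROOT).toCharArray();
        char[] out = in.clone();
        for (int i = 0; i < in.length; i++) {
            if (in[i] != 'Y') {
                continue;
            }
            boolean vowelBefore = i > 0 && isVowel(in[i - 1]);
            boolean vowelAfter = i + 1 < in.length && isVowel(in[i + 1]);
            if (!(vowelBefore && vowelAfter)) {
                out[i] = 'I';
            }
        }
        return new String(out);
    }
}
```

This gives TIOU for TYOU and IOU for YOU, keeps the Y in POUYEZ, and turns the constructed AYTYA into AITIA, the same as the lookbehind regex.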