def-gthill / lexurgy

A high-powered sound change applier
GNU General Public License v3.0

Bug report #53

Closed Apocalypsam closed 1 year ago

Apocalypsam commented 1 year ago

Hi,

I think I've found a bug. Maybe it's a user problem, but I don't think so.

The word in the input is: çe

In the sound changes, these letters are declared as follows and are allowed in these positions:

Symbol e [-low -high +front -back +open -closed vowel]
Symbol ⁠ç [unvoiced palatal fricative con]
Syllables:
[con]? [vowel]

But the output says: The segment "c" in "(c)̧e" doesn't fit the syllable structure; no syllable pattern can start with "c"

Maybe this is entirely the wrong place to mention it, but maybe someone can help me. Thanks.

def-gthill commented 1 year ago

Looks like a bug in the way Lexurgy is doing Unicode normalization.

In the symbol declaration, you've entered a LATIN SMALL LETTER C WITH CEDILLA. But in the input word you have LATIN SMALL LETTER C followed by COMBINING CEDILLA. This isn't supposed to matter because Lexurgy does Unicode normalization (which makes both of these the same), but apparently it isn't happening here.
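The two spellings can be compared outside Lexurgy. Here is a minimal Python sketch using the standard `unicodedata` module. It is not Lexurgy's own code, and the variable names are only illustrative:

```python
import unicodedata

precomposed = "\u00e7"   # LATIN SMALL LETTER C WITH CEDILLA
decomposed = "c\u0327"   # LATIN SMALL LETTER C + COMBINING CEDILLA

# The raw strings differ even though they render identically.
raw_equal = precomposed == decomposed

# Once both are brought to the same normalization form, they compare equal.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(raw_equal, nfc_equal, nfd_equal)  # False True True
```

If the symbol declarations and the input words are not both normalized to the same form before matching, the comparison fails the way `raw_equal` does.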

As a workaround for now, make these consistent. Copy the character from the input word to the symbol declaration, or vice versa, whichever will be easier for you to type in the future.

Apocalypsam commented 1 year ago

Thank you for your prompt answer. The workaround sadly does not work. Will you be able to fix the bug in a future version?

def-gthill commented 1 year ago

So I think the workaround didn't work because there's an extra, invisible character (Unicode WORD JOINER) in your symbol declaration. Making this character visible as [WJ], this line actually says:

Symbol [WJ]ç [unvoiced palatal fricative con]

Since Unicode doesn't consider WORD JOINER to be whitespace, it counts as part of the symbol, so you end up declaring [WJ]ç as a symbol, and that doesn't match plain ç.
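The behaviour of WORD JOINER (U+2060) can be checked with a short Python sketch. This only illustrates the Unicode properties involved, not how Lexurgy parses declarations, and the variable names are made up for the example:

```python
import unicodedata

WORD_JOINER = "\u2060"
declared = WORD_JOINER + "\u00e7"   # what the symbol declaration actually contained
plain = "\u00e7"                    # the symbol that was intended

is_space = WORD_JOINER.isspace()                 # not classified as whitespace
category = unicodedata.category(WORD_JOINER)     # "Cf": a format character
survives_strip = declared.strip() == declared    # trimming whitespace leaves it in place
survives_nfc = unicodedata.normalize("NFC", declared) == declared
matches_plain = declared == plain

print(is_space, category, survives_strip, survives_nfc, matches_plain)
# False Cf True True False
```

Neither whitespace trimming nor normalization removes the character, so it stays attached to the symbol name.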

Remove this character: put the cursor after the ç, press the left arrow key, and then press Backspace. Then do the workaround as I said - either copy the input symbol into the symbol declaration, or use the pre-composed ç character in your input words. I tested both of these after deleting the WORD JOINER and they both work.

(I know it's frustrating to have invisible characters messing up your sound changes. I'm not sure how best to handle this without breaking other behaviour; I may look into making such characters visible in the editor so you can easily spot and delete them.)
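Until something like that exists in the editor, one possible way to spot such characters is to run the rules file through a small script. Below is a rough Python sketch. The helper name `reveal_invisible` and the bracketed output format are assumptions for illustration, not part of Lexurgy:

```python
import unicodedata

def reveal_invisible(text: str) -> str:
    """Replace Unicode format characters (category Cf) with a visible tag."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            # Show the code point and its Unicode name instead of the invisible character.
            out.append(f"[U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}]")
        else:
            out.append(ch)
    return "".join(out)

line = "Symbol \u2060\u00e7 [unvoiced palatal fricative con]"
print(reveal_invisible(line))
# Symbol [U+2060 WORD JOINER]ç [unvoiced palatal fricative con]
```

This only flags format characters. Other kinds of hard-to-see characters, such as unusual spaces or combining marks, would need extra checks.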

As for the normalization bug itself, I can reproduce it in my tests now so I should have a fix up in the next version.

def-gthill commented 1 year ago

I have a fix for this in commit 2076652f2787d82ced205e059a8c92fc59588890. It will be included in the 1.1 release.