def-gthill / lexurgy

A high-powered sound change applier
GNU General Public License v3.0

Duplicate romanizer suppresses error message #54

Closed AdamantConlanger closed 1 year ago

AdamantConlanger commented 1 year ago

Minimal example:

Feature type(consonant, vowel)

#romanizer literal:
  #[vowel] => [consonant]

romanizer literal:
  [vowel] => [consonant]

With the comment marks present, Lexurgy reports an error. But remove them, so that the romanizer appears twice, and it runs without complaint.

def-gthill commented 1 year ago

The "not working" is the correct behaviour here. You're using a literal romanizer, which means "ignore the declarations", i.e. your features like vowel don't exist in the romanizer. It's designed for corner cases where you use the same symbol for one purpose in phonetic notation, and a different and incompatible purpose in the romanization. Most of the time, you don't want your romanizers to be literal! Remove literal here, and you're fine, with or without the comment.

There's still a bug here: if you remove the comment marks, you lose the error message. The romanizer is invalid, and having two copies of it certainly doesn't make it any more valid. So I'll look into why the error message goes away when you have a duplicate romanizer.

AdamantConlanger commented 1 year ago

Oh. In that case, I believe I do have another bug where the literal romanizer sometimes (i.e. in some files) doesn't allow isolated combining diacritic marks, while at other times it does. Also, is there a different way to have such double usage of diacritic marks while still allowing for non-diacritic features? Also, the description of literal (de)romanization in the cheat sheet isn't very clear; I still don't really know what's meant by it. Finally, do classes still work in literal romanization? Because that would (in an annoying way) possibly solve part of the problem I'm having.

AdamantConlanger commented 1 year ago

> Oh. In that case, I believe I do have another bug where the literal romanizer sometimes (i.e. in some files) doesn't allow isolated combining diacritic marks, while at other times it does.

Never mind; in one file, I had tone diacritics declared, but not in the other. Such tone marks could be used in literal romanizers in the second file, but not in the first. In other words: diacritics work as expected.

def-gthill commented 1 year ago

Literal (de)romanizers ignore all declarations, including classes. But if you have Then: blocks in the rule, only the first of the blocks in a literal deromanizer ignores declarations, and only the last of the blocks in a literal romanizer ignores declarations. They're meant to be used like this:

deromanizer literal:
    <turn the characters that conflict with the declarations into something else>
    Then:
    <normal deromanizer rules; you can reference your declarations here!>

<normal sound changes>

romanizer literal:
    <normal romanizer rules; you can reference your declarations here!>
    Then:
    <produce any characters that conflict with the declarations>

The literal parts are only there to deal with the conflict between the romanization and the declarations. They should be small and simple, so the lack of declarations shouldn't be a big deal. The real work should happen in the non-literal part of the (de)romanizer!
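To make the rule about Then: blocks concrete, here is a minimal Python sketch of which block ignores declarations. It only models the behaviour described above and is not Lexurgy's code; the function name `literal_block_index` and its parameters are hypothetical.

```python
def literal_block_index(kind, block_count):
    """Return the 0-based index of the one block that ignores declarations.

    Hypothetical helper, not part of Lexurgy. `kind` is "deromanizer" or
    "romanizer"; `block_count` is the number of blocks separated by Then:.
    """
    if block_count < 1:
        raise ValueError("a rule has at least one block")
    if kind == "deromanizer":
        # Only the first block of a literal deromanizer is literal.
        return 0
    if kind == "romanizer":
        # Only the last block of a literal romanizer is literal.
        return block_count - 1
    raise ValueError("kind must be 'deromanizer' or 'romanizer'")
```

Under this model, a literal romanizer with no Then: at all has a single block, which is both first and last, so the whole rule ignores declarations. That matches the minimal example at the top of this issue, where [vowel] is not recognised.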

The reason this feature is so confusing and so poorly documented is that it's a mostly forgotten relic of the very early stages of Lexurgy. I've kept it around because it's necessary once in a while, but I'd certainly design it differently if I were just introducing it now.

Hope that clears things up a bit!

def-gthill commented 1 year ago

Fixed in commit 440a77e5549fa3ad71319e89f2896acfff4989fb. I'll collect up the fixes to all your recently reported bugs in release 1.1.2.

The reason for this bug was embarrassingly stupid. The logic basically said "If there's one romanizer, give it to me. Otherwise... well, I guess there aren't any romanizers!" For some reason I couldn't imagine the case of more than one romanizer. So if there were multiple romanizers, they'd just all be ignored completely with no error message or anything.
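For readers who want to see the shape of that mistake, here is a rough Python sketch. It is an illustration only, not the code from the commit: the function names are made up, and the guard in the second function is just one possible way to surface the problem, which may differ from the actual fix.

```python
def find_romanizer_buggy(romanizers):
    """Mirror the flawed logic: exactly one romanizer or 'none at all'."""
    if len(romanizers) == 1:
        return romanizers[0]
    # Zero romanizers AND two-or-more romanizers both land here, so
    # duplicates are silently dropped and never validated.
    return None


def find_romanizer_checked(romanizers):
    """One possible guard (an assumption, not the real fix)."""
    if len(romanizers) > 1:
        raise ValueError(
            "expected at most one romanizer, found %d" % len(romanizers)
        )
    return romanizers[0] if romanizers else None
```

With two copies of the same invalid romanizer, `find_romanizer_buggy` returns `None`, so nothing downstream ever gets a chance to report the invalid rule. The checked version refuses the input outright.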

AdamantConlanger commented 1 year ago

> Literal (de)romanizers ignore all declarations, including classes. But if you have Then: blocks in the rule, only the first of the blocks in a literal deromanizer ignores declarations, and only the last of the blocks in a literal romanizer ignores declarations.

Oh, that explains it! I guess I could've surmised it from the cheat sheet. For some reason I thought it did something akin to converting all (phonological/transcriptual) segments into (computer science) objects, forgetting all orthographic/transcriptual information and in the process allowing any symbol to be re-used as a literal. Maybe something to include in a later version?

In any case, the cheat sheet should be updated. I read "Ignore all declarations until the first Then:" as "All symbol and diacritic declarations made up until the first Then: in the file are ignored when using the characters in those declarations," as if every literal you type were "escaped" in some sense. I did find it weird that Then: would be the cutoff point, but I assumed it was just a quirk of programming or something. It would mean that characters get parsed as in a regular text replacement program, while non-character segments still work (silly of me to forget that elements and classes can't be differentiated from their literal identifiers; there's no special character like with square brackets in features). I believe it wasn't unreasonable of me to interpret it that way, which is a problem. Therefore, the documentation should be updated.