xrotwang closed this issue 6 years ago
With the lingpy tutorial in mind, it is important to guarantee backwards compatibility (or to change the tutorial accordingly, if we know when the spec changes). I'm mentioning it here in case you haven't already thought about it.
@LinguList I didn't think about this explicitly, so thanks for the heads-up. But the idea is to make the current status the fallback in case there's no metadata describing the profile. AFAIK the lingpy tutorial doesn't use a rules file, right? I ask because this may be the one place where I'd like to introduce a backwards-incompatible change.
Yes! no rules file. I don't like them, as they are too idiosyncratic. Our policy should be: instead of making lazy rules that are difficult to handle (we barely see the full power of a regex), spell the things out: ab -> aa should be written as ab -> a, a -> a, abb -> a, etc. (and we'll barely have == n times, but can say: abb -> a, abbb -> a, and finito).
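To make the "spell it out" policy concrete, here is a minimal sketch comparing a regex rule with an explicit list of literal mappings. It is plain Python and does not use the segments API; `EXPLICIT`, `apply_explicit` and `apply_regex` are hypothetical names made up for illustration.

```python
import re

# Hypothetical explicit mapping, spelled out instead of the regex "ab+" -> "a".
EXPLICIT = {"a": "a", "ab": "a", "abb": "a", "abbb": "a"}


def apply_explicit(text, mapping):
    """Greedy longest-match replacement using only literal keys."""
    keys = sorted(mapping, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No entry matches here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def apply_regex(text):
    """The 'lazy' regex equivalent, covering any number of b's."""
    return re.sub(r"ab+", "a", text)
```

For the cases that are listed, both give the same result (`"cabbd"` becomes `"cad"` either way). The explicit list only covers what it enumerates, though: with four b's, `apply_explicit("abbbb", EXPLICIT)` leaves a trailing `b`, which is the trade-off of being explicit about every case.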
The spec for orthography profiles will be changed to incorporate metadata via CSVW. The segments package should support this enhancement, and also use the metadata file to link and describe additional files, namely rules and replacements (a set of replacements - possibly specified as regular expressions - to be run before tokenization).
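A rough sketch of how this could look, under stated assumptions: the metadata layout below (one CSVW table group, with a `dc:type` label per table) and the file names are purely illustrative and not the agreed spec, and `linked_files` and `preprocess` are hypothetical helpers, not functions of the segments package.

```python
import json
import re

# Hypothetical CSVW metadata; the "dc:type" values and file names are
# illustrative labels only, not part of any agreed spec.
METADATA = json.loads("""
{
  "@context": "http://www.w3.org/ns/csvw",
  "tables": [
    {"url": "profile.tsv", "dc:type": "profile"},
    {"url": "replacements.tsv", "dc:type": "replacements"},
    {"url": "rules.tsv", "dc:type": "rules"}
  ]
}
""")


def linked_files(metadata):
    """Map each role to its file; empty if there is no metadata (fallback)."""
    if metadata is None:
        return {}
    return {t.get("dc:type"): t["url"] for t in metadata.get("tables", [])}


def preprocess(text, replacements):
    """Run (pattern, replacement) pairs in order, before tokenization."""
    for pattern, repl in replacements:
        text = re.sub(pattern, repl, text)
    return text
```

With no metadata, `linked_files(None)` returns an empty dict, so a caller could fall back to the current behaviour of reading the bare profile, which would keep existing usage such as the lingpy tutorial working.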