Open valdisvi opened 7 years ago
Using .replace to expand to multiple letters is working for me (i.e. the replace rules in en_rules). Are there specific cases that are not working?
Yes, because in compiledict.c the bytes are packed into an integer with the utf8_in function, and then only those 4 bytes are written with Write4Bytes. That produces a wrong result if there are too many "meaningful" bytes in the from or to part of a replacement.
So a universal .replace implementation needs to replace an arbitrary number of from bytes with an arbitrary number of to bytes.
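To illustrate the limitation, here is a minimal standalone sketch, not the actual compiledict.c code. It assumes each side of a rule is packed into one 32-bit value at 16 bits per character; the real packing may differ. `decode_utf8` and `pack_word` are hypothetical names (the former only stands in for a UTF-8 decoder such as utf8_in).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a UTF-8 decoder: stores one code point in *c
   and returns the number of bytes consumed. Assumes well-formed input. */
static int decode_utf8(uint32_t *c, const unsigned char *s)
{
    if (s[0] < 0x80) { *c = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        *c = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *c = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
           | (s[2] & 0x3F);
        return 3;
    }
    *c = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
       | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Assumed packing: 16 bits per character into a single 32-bit value.
   Returns how many characters of the word did not fit. */
static int pack_word(const char *word, uint32_t *packed)
{
    assert(word != 0 && packed != 0);
    const unsigned char *p = (const unsigned char *)word;
    int shift = 0;
    int lost = 0;

    *packed = 0;
    while (*p > 0x20) {            /* stop at a space or the terminating NUL */
        uint32_t c;
        p += decode_utf8(&c, p);
        if (shift < 32 && c <= 0xFFFF)
            *packed |= c << shift; /* fits into one of the two 16-bit slots */
        else
            lost++;                /* no room left: this character is dropped */
        shift += 16;
    }
    return lost;
}
```

Under this assumption, `pack_word("are", &v)` returns 1, because only "ar" fits and the "e" is dropped, while a single-character word such as "æ" packs without loss.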
To test it, just add rules such as:
.replace
æ are
are usi
ša ra
//etc. with even more bytes in from or to part of replacement
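One possible way to lift the size limit, sketched here only as an assumption and not as the project's actual dictionary format, is to store each rule as two NUL-terminated byte strings (from, then to) so that either side can have any length. `write_replace_rule` and `match_replace_rule` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout: from-bytes, NUL, to-bytes, NUL.
   Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t write_replace_rule(unsigned char *out, size_t cap,
                                 const char *from, const char *to)
{
    assert(out != NULL && from != NULL && to != NULL);
    size_t nf = strlen(from) + 1;   /* include the terminating NUL */
    size_t nt = strlen(to) + 1;

    if (nf + nt > cap)
        return 0;
    memcpy(out, from, nf);
    memcpy(out + nf, to, nt);
    return nf + nt;
}

/* If text starts with the rule's from-bytes, returns the to-bytes and
   stores the number of matched input bytes in *consumed; otherwise NULL. */
static const char *match_replace_rule(const unsigned char *rule,
                                      const char *text, size_t *consumed)
{
    assert(rule != NULL && text != NULL && consumed != NULL);
    const char *from = (const char *)rule;
    size_t nf = strlen(from);

    if (nf == 0 || strncmp(text, from, nf) != 0)
        return NULL;
    *consumed = nf;
    return from + nf + 1;           /* to-bytes follow the from-bytes' NUL */
}
```

With this layout the rule "are usi" occupies 8 bytes, and matching it against the text "area" yields the replacement "usi" with 3 input bytes consumed. Because matching works on raw bytes, multi-byte UTF-8 characters such as æ or š need no special handling.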
Language analysis and spelling decisions could be improved by introducing the following new features:
verb follows/noun follows marks extended to more/arbitrary flags, which can then be used to make different pronunciation rules for homonyms
J statement as a precondition, to allow choosing the pronunciation from the preceding word.
J statement should support letter groups, e.g. (JL01 as marking letters. This could help with solving names of numbers as different words #83
.replace rule with extended trace
.replace rule after looking in ..._list files
replace rule extended to replace not only characters, but groups of characters,
replace using matching rules
_list extended to mark arbitrary defined word types (e.g. $units #115) and by comparing only the root part of the word (i.e. partial match without pre/suffixes). See issue #263 for details.
..._list file. See for example the workaround for German ja