AMR-KELEG closed this issue 2 years ago.
For your project, it is important that you try to avoid dealing with issues like this: just skip anything that looks "strange". For the purposes of weighting automata, punctuation is not interesting, although I realise it is frustrating to have to make your code deal with these issues.
I would recommend doing a sanity check like this: the line has at most one each of ^, / and $, and at least one of < and >. This should mean you only get valid form:disambiguated-analysis pairs.
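A minimal Python sketch of that sanity check, assuming the analyser output has one token per line. The helper name is_valid_pair is hypothetical, and "at least one of < and >" is read here as at least one of each.

```python
def is_valid_pair(line: str) -> bool:
    """Hypothetical sanity check for one line of disambiguated stream output.

    Assumption: "at least one of < and >" means at least one '<' and one '>'.
    """
    line = line.strip()
    # At most one delimiter of each kind: ^ (start), / (analysis), $ (end).
    at_most_one_each = all(line.count(ch) <= 1 for ch in "^/$")
    # At least one tag, i.e. both angle brackets are present.
    has_tag = "<" in line and ">" in line
    return at_most_one_each and has_tag


lines = ['^"/"<dquotes>$', '^"/"$', '^cat/cat<n><sg>$', '°^C/*C$']
pairs = [line for line in lines if is_valid_pair(line)]
print(pairs)  # ['^"/"<dquotes>$', '^cat/cat<n><sg>$']
```

Lines with no tags (such as a bare quote analysis) or with stray characters and unknown words simply get dropped, which is enough when punctuation does not matter for weighting.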
This is actually just an issue with the eng transducer: see https://github.com/apertium/apertium-eng/pull/39
A double-quote token gets a bare " analysis, with no tags. Expected output:
^"/"<dquotes>$
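The expected output follows the usual stream shape ^surface/lemma<tag>...$. A minimal sketch of splitting one such token, assuming a single analysis and no backslash-escaped reserved characters; TOKEN_RE and parse_token are hypothetical names. A tagless analysis comes back with an empty tag list, which makes this kind of problem easy to spot.

```python
import re

# Hypothetical pattern for one token with a single analysis: ^form/lemma<tag1><tag2>...$
# Assumption: no backslash-escaped ^ / $ < > inside the token.
TOKEN_RE = re.compile(r"^\^([^/^$]*)/([^<>/^$]*)((?:<[^<>]+>)*)\$$")


def parse_token(token: str):
    """Return (surface form, lemma, list of tags), or None if the token does not match."""
    m = TOKEN_RE.match(token.strip())
    if m is None:
        return None
    form, lemma, tag_str = m.groups()
    tags = re.findall(r"<([^<>]+)>", tag_str)
    return form, lemma, tags


print(parse_token('^"/"<dquotes>$'))  # ('"', '"', ['dquotes'])
print(parse_token('^"/"$'))           # ('"', '"', []): no tags at all
```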
Would it be better if we added the double quotes to the .dix files?
Other missing characters:
°: °C could also be handled, instead of getting the analysis °^C/*C$.
– (en dash, Unicode decimal value 8211).
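Characters like these can be found in analyser output with a small script. This is a sketch under stated assumptions: it reads one line of stream output, ignores superblanks ([...]) entirely (they would be reported as stray text), and find_missing and STREAM_TOKEN_RE are hypothetical names. It reports text left outside any ^...$ token, like the ° above, and surface forms whose analysis is marked unknown with *.

```python
import re

# Hypothetical pattern for one ^...$ token; group 1 is the surface form,
# group 2 the analysis (or several analyses separated by '/').
STREAM_TOKEN_RE = re.compile(r"\^([^/^$]*)/([^^$]*)\$")


def find_missing(line: str):
    """Return (characters outside any token, surface forms analysed as unknown)."""
    # Whatever is left after removing all tokens was never analysed.
    outside = STREAM_TOKEN_RE.sub(" ", line)
    stray = [ch for ch in outside if not ch.isspace()]
    # An analysis starting with '*' marks an unknown word.
    unknown = [form for form, analysis in STREAM_TOKEN_RE.findall(line)
               if analysis.startswith("*")]
    return stray, unknown


print(find_missing("°^C/*C$"))  # (['°'], ['C'])
stray, _ = find_missing("^foo/foo<n>$ \u2013 ^bar/bar<n>$")
print([(ch, ord(ch)) for ch in stray])  # [('–', 8211)]
```

Printing ord() for each stray character gives the decimal code point directly, which is handy when reporting characters that are hard to tell apart visually.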