apertium / lttoolbox

Finite state compiler, processor and helper tools used by apertium
http://wiki.apertium.org/wiki/Lttoolbox
GNU General Public License v2.0

Double quotes gets a strange analysis #66

Closed: AMR-KELEG closed this 2 years ago

AMR-KELEG commented 5 years ago

A double-quote token just gets a plain " as its analysis.

$ echo '"' | lt-proc eng.automorf.bin 
"

Expected output:

^"/"<dquotes>$

Would it be better if we add the double quotes to the .dix files?

Other missing characters:

ftyers commented 5 years ago

For your project it is important to avoid dealing with issues like this: just skip anything that looks "strange". For the purposes of weighting automata, punctuation is not interesting, although I realise it is frustrating to have to make your code deal with these cases.

I would recommend doing a sanity check like:

The line has at most one each of ^, / and $, and at least one each of < and >. This should mean you only get valid form:disambiguated analysis pairs.
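The check described above could be sketched in Python roughly as follows. This is only an illustration, not code from lttoolbox or the project in question: the function name `is_unambiguous_analysis` is hypothetical, the second and third sample lines are made up, and the sketch assumes one candidate unit per line with no escaped ^, / or $ characters inside the form.

```python
def is_unambiguous_analysis(line: str) -> bool:
    """Return True if the line passes the sanity check described above.

    Hypothetical helper: counts raw characters only and does not
    handle backslash-escaped delimiters.
    """
    # At most one occurrence of each delimiter. More than one would
    # suggest several units on the line or several competing analyses.
    if any(line.count(ch) > 1 for ch in "^/$"):
        return False
    # At least one '<' and one '>', i.e. at least one tag is present.
    return "<" in line and ">" in line


# The first line is the expected output from this issue; the other
# two are made-up examples of lines that should be skipped.
lines = ['^"/"<dquotes>$', '"', "^a/b<n>/c<vblex>$"]
kept = [line for line in lines if is_unambiguous_analysis(line)]
```

With this filter, the bare " that lt-proc currently returns is simply skipped rather than treated as an analysis.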

mr-martian commented 2 years ago

This is actually just an issue with the eng transducer - see https://github.com/apertium/apertium-eng/pull/39