Open AMR-KELEG opened 5 years ago
I have found that some tokens are tagged as unknown (*) despite being analysed by the compiled dictionary.
These cases can be discovered easily, but I need help with inspecting them manually.
The tagging doesn't seem to be that straightforward. For example, the token bloody appears on lines 11 and 11145:
https://github.com/apertium/apertium-eng/blob/master/texts/eng.tagged#L11
https://github.com/apertium/apertium-eng/blob/master/texts/eng.tagged#L11145
^bloody/*bloody
^bloody/bloody<adj><sint>$
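One way such cases could be listed automatically is sketched below. It assumes each line of the tagged file holds a single token shaped like the two examples above (`^surface/analysis`, with an optional closing `$`), and that an analysis starting with `*` means "unknown". The function name `find_inconsistent` and the regular expression are illustrative only, not part of any Apertium tool.

```python
import re
from collections import defaultdict

# Assumed token shape: ^surface/analysis with an optional closing $.
TOKEN_RE = re.compile(r"^\^(?P<surface>[^/]+)/(?P<analysis>.*?)\$?$")


def find_inconsistent(lines):
    """Return surface forms tagged both as unknown and as analysed.

    The result maps each such surface form to a pair of line-number lists:
    (lines where it is unknown, lines where it has an analysis).
    """
    unknown = defaultdict(list)
    known = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        match = TOKEN_RE.match(line.strip())
        if match is None:
            continue  # not a token line under the assumed format
        bucket = unknown if match["analysis"].startswith("*") else known
        bucket[match["surface"]].append(lineno)
    return {s: (unknown[s], known[s]) for s in unknown if s in known}


if __name__ == "__main__":
    sample = [
        "^bloody/*bloody",
        "^bloody/bloody<adj><sint>$",
    ]
    for surface, (unk, ok) in find_inconsistent(sample).items():
        print(surface, "unknown on lines", unk, "analysed on lines", ok)
```

Running this over the whole file would only give a candidate list; each hit would still need manual inspection.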
What do you think is the best way to fix such cases?
For the weighted automata project, the best way is to just ignore these errors. Your code should just discard/skip invalidly encoded words.
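A minimal sketch of that skipping step, assuming one `^surface/analysis$` token per line and treating both malformed lines and `*`-prefixed (unknown) analyses as the entries to discard. The helper name `iter_analysed_tokens` is hypothetical, and this is not the project's actual code.

```python
def iter_analysed_tokens(lines):
    """Yield (surface, analysis) pairs, skipping unknown or malformed tokens."""
    for line in lines:
        token = line.strip()
        # Skip anything that does not look like ^surface/analysis.
        if not token.startswith("^") or "/" not in token:
            continue
        surface, _, analysis = token[1:].partition("/")
        if analysis.endswith("$"):
            analysis = analysis[:-1]  # drop the closing delimiter
        # Skip empty analyses and tokens marked as unknown with '*'.
        if not surface or not analysis or analysis.startswith("*"):
            continue
        yield surface, analysis


if __name__ == "__main__":
    sample = ["^bloody/*bloody", "^bloody/bloody<adj><sint>$"]
    for surface, analysis in iter_analysed_tokens(sample):
        print(surface, analysis)
```

With this approach the unknown-tagged occurrence of a token is simply ignored, while its analysed occurrence still contributes to the weights.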