Mte90 opened 4 years ago
I see that there's a roman_numbers.py script. What's the problem with it? Is it not accurate enough?
It isn't perfect; we had various false positives with it.
Which importers have the most sentences with Roman numerals? In the TED importer there is an issue related to Roman numerals: in the function _maybenormalize (ted_importer.py) the parameter _romannormalization is False, so the function _do_romannormalization is not executed (see utils.roman_numbers).
We removed that normalization from the TED importer because it produced a lot of false positives.
The problem is that the text corpus includes Roman numerals, which we need to convert to regular numbers, while also detecting false positives.
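A minimal sketch of the conversion step, assuming the numerals are written in canonical uppercase form (1 to 3999). roman_to_int is a hypothetical helper, not the API of the existing utils.roman_numbers module. It validates the token with a strict regular expression before converting, so malformed strings such as IIII are rejected instead of silently producing a number.

```python
import re
from typing import Optional

# Canonical Roman numerals from 1 to 3999 (uppercase only). Non-canonical
# strings such as "IIII" or "VX" do not match and are rejected.
CANONICAL_RE = re.compile(r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(token: str) -> Optional[int]:
    """Hypothetical helper: convert a canonical Roman numeral to an int, else None."""
    if not token or not CANONICAL_RE.fullmatch(token):
        return None
    total = 0
    for i, symbol in enumerate(token):
        value = VALUES[symbol]
        # Subtractive notation: a symbol followed by a larger one is subtracted (IV = 4).
        if i + 1 < len(token) and VALUES[token[i + 1]] > value:
            total -= value
        else:
            total += value
    return total
```

For example, roman_to_int("XIV") returns 14, while roman_to_int("IIII") and roman_to_int("xiv") return None. Validation alone does not solve the false positive problem, since ordinary uppercase words like MIX are also valid numerals.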
We need a way to detect Roman numerals without matching other words that happen to contain those letters.
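One possible heuristic, sketched under loose assumptions: accept only whole uppercase words that are canonical Roman numerals, and treat a short list of tokens that are also ordinary words (such as the pronoun I or the word MIX) as numerals only when they follow a context word like "chapter" or "war". The AMBIGUOUS and CONTEXT_WORDS sets and the find_roman_numerals function are hypothetical and would need tuning against real sentences from each importer. This should reduce false positives, but it will not eliminate them.

```python
import re
from typing import List, Optional, Tuple

# Canonical Roman numerals from 1 to 3999; uppercase only, so "mix" or "Civil" never match.
CANONICAL_RE = re.compile(r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")

# Hypothetical, corpus-specific lists: tokens that are valid numerals but also
# common words or abbreviations, and words that make a following numeral likely.
AMBIGUOUS = {"I", "MIX", "CD", "DC", "MD"}
CONTEXT_WORDS = {"chapter", "part", "volume", "century", "war", "act"}


def find_roman_numerals(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, token) spans of whole words that look like Roman numerals."""
    found = []
    previous: Optional[str] = None
    for match in re.finditer(r"\w+", text):
        token = match.group()
        if CANONICAL_RE.fullmatch(token):
            # Ambiguous tokens are accepted only right after an explicit context word.
            in_context = previous is not None and previous.lower() in CONTEXT_WORDS
            if token not in AMBIGUOUS or in_context:
                found.append((match.start(), match.end(), token))
        previous = token
    return found
```

With this sketch, "Louis XIV liked to MIX colors" yields only the XIV span, "I think so" yields nothing, and "Chapter I starts here" keeps the I because it follows "Chapter".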