kleag closed this issue 7 years ago.
My correction improves the detection of some entities, but for a generic fix, the work started in the branch should be continued.
@romaricb, could you have a look at that, please?
The problem appears only in English. The rules for number recognition contain:
@Number=($NOMBRE) @Number::(@Number|million|billion){0-n}:NUMBER:=>NormalizeNumber()
Since all numeric forms are associated with the POS $NOMBRE, this rule merges all consecutive numeric forms into one.
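To make the merging behaviour concrete, here is a small Python simulation of what this rule does. It is only a sketch under stated assumptions: it is not LIMA's Modex matching engine, it assumes every numeric token (digits or words) carries the POS tag "NOMBRE", and it treats the right context `(@Number|million|billion){0-n}` as a greedy match. The function name `merge_numbers` is hypothetical.

```python
# Simplified simulation (NOT the real Modex engine) of the rule:
#   @Number=($NOMBRE)
#   @Number::(@Number|million|billion){0-n}:NUMBER:=>NormalizeNumber()
# Assumption: every numeric form, textual or digit-based, has the POS "NOMBRE".

MULTIPLIERS = {"million", "billion"}


def is_number(token):
    """A token is an (form, pos) pair; @Number matches any $NOMBRE token."""
    _form, pos = token
    return pos == "NOMBRE"


def merge_numbers(tokens):
    """Start an entity on a $NOMBRE trigger and greedily absorb the right context."""
    entities = []
    i = 0
    while i < len(tokens):
        if is_number(tokens[i]):
            j = i + 1
            # The right context accepts any further @Number or a multiplier word,
            # so consecutive digit forms are swallowed into the same entity.
            while j < len(tokens) and (is_number(tokens[j]) or tokens[j][0] in MULTIPLIERS):
                j += 1
            entities.append(" ".join(form for form, _ in tokens[i:j]))
            i = j
        else:
            i += 1
    return entities


digits = [("1234", "NOMBRE"), ("3.2", "NOMBRE"), ("4,5", "NOMBRE")]
words = [("three", "NOMBRE"), ("hundred", "NOMBRE"), ("thousand", "NOMBRE")]
print(merge_numbers(digits))  # ['1234 3.2 4,5']
print(merge_numbers(words))   # ['three hundred thousand']
```

The same right context that correctly groups "three hundred thousand" is what glues "1234 3.2 4,5" into one entity, because both kinds of forms share the same POS.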
Changing this rule to @Number::(million|billion){0-n}:NUMBER:=>NormalizeNumber() would fix this issue, but the rule exists to handle textual forms of numbers (e.g. "three hundred thousand"). We may need a way to distinguish textual forms of numbers from numeric ones.
I corrected the rules to take this problem into account (I used a list of textual number forms in the rule files to make the rules more explicit).
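The idea of separating textual and numeric forms can be sketched in Python as follows. This is a hypothetical illustration, not the content of the actual rule files: `TEXT_NUMBERS` is a small made-up subset of a textual number list, and the choice that a digit form may only be extended by "million" or "billion" (as in "3 million") is an assumption.

```python
import re

# Hypothetical subset of a list of textual number forms; the real list in the
# rule files is longer and is not reproduced here.
TEXT_NUMBERS = set(
    "one two three four five six seven eight nine ten hundred thousand million billion".split()
)
MULTIPLIERS = {"million", "billion"}
# Digit-based forms such as "1234", "3.2" or "4,5".
NUMERIC_FORM = re.compile(r"^\d+([.,]\d+)*$")


def recognize_numbers(words):
    """Group textual number words together, but keep each numeric form separate."""
    entities = []
    i = 0
    while i < len(words):
        w = words[i].lower()
        if NUMERIC_FORM.match(w):
            # Assumption: a numeric form may only be extended by a multiplier.
            j = i + 1
            while j < len(words) and words[j].lower() in MULTIPLIERS:
                j += 1
        elif w in TEXT_NUMBERS:
            # Textual forms chain with other textual forms ("three hundred thousand").
            j = i + 1
            while j < len(words) and words[j].lower() in TEXT_NUMBERS:
                j += 1
        else:
            i += 1
            continue
        entities.append(" ".join(words[i:j]))
        i = j
    return entities


print(recognize_numbers("1234 3.2 4,5".split()))                 # ['1234', '3.2', '4,5']
print(recognize_numbers("three hundred thousand people".split()))  # ['three hundred thousand']
```

Keeping an explicit list of textual forms means the chaining behaviour no longer depends on the shared $NOMBRE POS, which is what caused the over-merging.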
Note that the correction was in commit de70ea86736c94f9ff9a6d2b9f5035f87862d769
After named entity recognition, for "1234 3.2 4,5" we get:
whereas we should get three different entities.
The Modex rules can be improved, but not completely, because numeric transitions are only possible on integers, not on real numbers.
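The limitation can be pictured with a hypothetical pair of transition classes in Python. Neither class is LIMA's automaton API; they only illustrate the difference between a transition that tests integer values and one that would test real values. The class names and the `min_value`/`max_value` bounds are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class IntegerTransition:
    """Hypothetical numeric transition that only understands integer values."""
    min_value: int
    max_value: int

    def matches(self, form: str) -> bool:
        try:
            value = int(form)
        except ValueError:
            # "3.2" cannot be parsed as an integer, so this transition never fires on it.
            return False
        return self.min_value <= value <= self.max_value


@dataclass
class RealTransition:
    """Hypothetical variant that parses the form as a floating-point value."""
    min_value: float
    max_value: float

    def matches(self, form: str) -> bool:
        try:
            value = float(form)
        except ValueError:
            return False
        return self.min_value <= value <= self.max_value


print(IntegerTransition(0, 10).matches("3"))      # True
print(IntegerTransition(0, 10).matches("3.2"))    # False
print(RealTransition(0.0, 10.0).matches("3.2"))   # True
```

With only the integer variant available, a rule cannot express a constraint such as "a real number between 0 and 10", which is why the Modex rules cannot fully separate real-number entities.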
I tried to change the code to allow transitions on real numbers, but it does not work. My attempt is on the AutomatonTransitionOnDouble branch. I probably forgot to change something somewhere, but I cannot figure out what.