aymara / lima

The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.
http://aymara.github.io/lima/

Normalisation of real numbers does not work #57

Open kleag opened 7 years ago

kleag commented 7 years ago

Note: this issue follows up on issue #50, which covered several problems including this one.

When analysing "123 45.6 . 12 345.6", we should get three number entities with the correct numeric values.

But we get (simplified):

  <type>Numex.NUMBER</type>
  <string>123 45.6</string>
  <numvalue>0</numvalue>

  <type>Numex.NUMBER</type>
  <string>12 345.6</string>
  <numvalue>0</numvalue>

The changes on branch https://github.com/aymara/lima/tree/AutomatonTransitionOnDouble attempt to address both problems: recognizing the entities correctly and normalizing them correctly. For an unknown reason, however, the changes do not work as expected.
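
To make the expected behaviour concrete, here is a minimal standalone sketch of the normalisation step. It is not LIMA code and does not use the Numex rules or the automaton; `normaliseNumber` is a hypothetical helper. It assumes that a single space is a thousands separator, that "." is the decimal mark, and that the three expected entities are "123", "45.6" and "12 345.6" (with value 12345.6).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of LIMA.
// Parses one number whose integer part may use single spaces as thousands
// separators (e.g. "12 345.6") and stores its value in `value`.
// Returns false if the text is not one well-formed number.
// Assumes the default "C" locale, where std::stod expects '.' as decimal mark.
bool normaliseNumber(const std::string& text, double& value)
{
  std::string digits;        // the input with separators removed
  std::size_t i = 0;
  std::size_t groupLen = 0;  // digits seen in the current group
  bool firstGroup = true;
  bool grouped = false;      // true once a separator has been seen

  // Integer part: digit groups separated by single spaces.
  for (; i < text.size() && text[i] != '.'; ++i) {
    const unsigned char c = static_cast<unsigned char>(text[i]);
    if (std::isdigit(c)) {
      digits += text[i];
      ++groupLen;
    } else if (c == ' ') {
      // The first group has 1 to 3 digits, every later group exactly 3.
      if (groupLen == 0 || groupLen > 3 || (!firstGroup && groupLen != 3))
        return false;
      firstGroup = false;
      grouped = true;
      groupLen = 0;
    } else {
      return false;
    }
  }
  // Check the last integer group.
  if (groupLen == 0 || (grouped && groupLen != 3))
    return false;

  // Optional fractional part: '.' followed by at least one digit.
  if (i < text.size()) {
    digits += '.';
    ++i;
    if (i == text.size())
      return false;
    for (; i < text.size(); ++i) {
      if (!std::isdigit(static_cast<unsigned char>(text[i])))
        return false;
      digits += text[i];
    }
  }

  value = std::stod(digits);
  return true;
}
```

Under these assumptions, "12 345.6" is accepted with value 12345.6, while "123 45.6" is rejected as a single number because "45" is not a three-digit group, which matches the expectation that "123" and "45.6" are separate entities.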

benlabbe commented 3 years ago

Recent updates (2020-2021) to the ENG and FRE Numex modex may have solved this issue. New tests are needed.