Open mxdev88 opened 4 years ago
That's on purpose. Converting digit strings to integers is already supported by Python's int function, so text2num specifically converts spelled-out numbers to integers.
As for the mix of styles, what use cases are you thinking about?
Hi, in news articles (at least in Portuguese), it is relatively common to find mixed styles such as:
"Sismo na China provocou 20 mil mortos e 26 mil feridos" https://www.rtp.pt/noticias/mundo/sismo-na-china-provocou-20-mil-mortos-e-26-mil-feridos_v185370
"Violentas explosões abalam Beirute. 100 mortos, 4 mil feridos e "muitos desaparecidos"" https://www.dn.pt/mundo/violenta-explosao-em-beirute-12495480.html
text2num renders these as "20 1000", "26 1000", "4 000".
Would it be possible to improve on this?
Hi luismavs, have you found a simple solution for detecting mixed forms ("20 mil mortos", for example)?
Hi,
I did not check it again.
IMO, the simplest solution would be to add a post-processing step to re-cast anomalous texts such as 4 1000 DD
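For what it's worth, here is a rough sketch of what such a post-processing step could look like. It does not call text2num at all; it only works on the already-converted string and merges a short digit group followed by a bare multiplier (1000 or 1000000), plus the "4 000" form reported above. The function name merge_split_numbers and both regexes are my own assumptions, not part of the library, and the heuristic can misfire on text where two separate numbers really do sit next to each other (e.g. "100 1000").

```python
import re

# Hypothetical helper, not part of text2num: a heuristic clean-up pass
# over text that has already been through the converter.

# A group of 1-3 digits followed by a bare multiplier, e.g. "20 1000".
_DIGITS_THEN_MULTIPLIER = re.compile(r"\b(\d{1,3}) (1000000|1000)\b")

# A group of 1-3 digits followed by a detached "000", e.g. "4 000".
_DIGITS_THEN_ZEROS = re.compile(r"\b(\d{1,3}) (000)\b")


def merge_split_numbers(text: str) -> str:
    """Re-cast anomalous outputs such as '20 1000' into '20000'."""

    def _multiply(match: re.Match) -> str:
        # "20 1000" -> 20 * 1000 -> "20000"
        return str(int(match.group(1)) * int(match.group(2)))

    text = _DIGITS_THEN_MULTIPLIER.sub(_multiply, text)
    # "4 000" -> "4000" (plain concatenation of the two groups)
    text = _DIGITS_THEN_ZEROS.sub(r"\1\2", text)
    return text


if __name__ == "__main__":
    print(merge_split_numbers("provocou 20 1000 mortos e 26 1000 feridos"))
    print(merge_split_numbers("100 mortos, 4 000 feridos"))
```

The \b before the digit group keeps longer numbers such as years ("2020 1000") from being merged, but this is still a workaround applied after the fact, not a fix in the parser itself.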
Currently, partially written numbers raise ValueError. It would be an interesting addition to handle such cases.
Expected result: 10000000
Similarly, numbers represented as text also raise ValueError instead of being converted to int.
Expected result: 10