Open mroughan opened 3 years ago
Although with your example t2d.convert("“One hundred "), the problem seems to be down only to the opening quote, since this works as expected:
t2d.convert("One hundred")
gives
'100'
I'm guessing a solution would need to split tokens slightly differently, so that a number word is not treated as part of the same token as an adjacent character (e.g. a '"', '.' or '-').
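To make that guess concrete, here is a minimal sketch of peeling punctuation off both ends of a token before the number words are looked at. This is not text2digits code: the name peel_punctuation and the EDGES pattern are made up for illustration, and it assumes that "punctuation" means any non-word character as Python's re module defines \W.

```python
import re

# Hypothetical helper, not part of text2digits.
# Groups: leading non-word characters, the core token, trailing non-word characters.
EDGES = re.compile(r'^(\W*)(.*?)(\W*)$', flags=re.DOTALL)

def peel_punctuation(token):
    """Split a token into [leading punctuation, core, trailing punctuation],
    dropping any part that is empty."""
    leading, core, trailing = EDGES.match(token).groups()
    return [part for part in (leading, core, trailing) if part]

print(peel_punctuation("“One"))         # the opening quote becomes its own piece
print(peel_punctuation("nine."))        # the final dot becomes its own piece
print(peel_punctuation("one-hundred"))  # interior hyphens are left alone
```

Punctuation inside the token is deliberately untouched, so hyphenated number words and decimals such as "3.5" stay in one piece.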
I also see this. If a sentence ends with a number word followed by a dot, the number is not converted, e.g.
"What is two plus seven ? The answer is nine." -> "What is 2 plus 7? the answer is nine."
@ShailChoksi
The bug is in the lexer.
The error is likely in split_glues: the dot in "nine." is treated as part of the word.
The function works by finding separators. A separator is either whitespace, or punctuation with something that is not a number on both sides.
The problem is that once the last separator has been found, the function returns the remainder of the sequence as is, even if it contains punctuation. This happens when the final word ends in punctuation: the pattern only detects punctuation with a non-number character on both sides, and at the end of the text nothing follows the punctuation.
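A small sketch of that behaviour, for anyone who wants to reproduce it outside the library. The SEPARATOR pattern below is my own stand-in written from the description above, not copied from split_glues, so treat the exact character set as an assumption.

```python
import re

# Assumed stand-in for the separator pattern: whitespace, or punctuation
# with a non-digit character immediately before and after it.
SEPARATOR = re.compile(r'(\s+|(?<=\D)[.,;:\-_](?=\D))')

def split_on_separators(text):
    """Split text into words and separators, dropping empty strings."""
    return [piece for piece in SEPARATOR.split(text) if piece]

# A dot in the middle of the text has a character on both sides and is split off.
print(split_on_separators("nine. Then"))

# A dot at the very end has nothing after it, so the lookahead fails
# and the dot stays glued to the last word.
print(split_on_separators("The answer is nine."))
```

The lookahead (?=\D) needs an actual character to match, which is why the same dot is split off mid-text but not at the end.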
I added code to separate the non-punctuation characters of the final word from its trailing punctuation. This seems to correct the problem.
I also added a check that there is still text left to work on; without it the code would crash.
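For illustration only, the fix has roughly the following shape. This is a sketch under my own assumptions, not the code in the pull request: SEPARATOR, TRAILING_PUNCT and split_with_tail_fix are hypothetical names, and the set of trailing punctuation characters is a guess.

```python
import re

# Assumed stand-in for the separator pattern (see the description above).
SEPARATOR = re.compile(r'(\s+|(?<=\D)[.,;:\-_](?=\D))')
# Assumed set of punctuation that may trail the final word.
TRAILING_PUNCT = re.compile(r'[.,;:!?]+$')

def split_with_tail_fix(text):
    """Split on separators, then peel trailing punctuation off the last piece."""
    pieces = [p for p in SEPARATOR.split(text) if p]
    if not pieces:
        # Guard: nothing left to work on (e.g. empty input).
        return []
    last = pieces.pop()
    match = TRAILING_PUNCT.search(last)
    if match is None:
        return pieces + [last]
    word, punct = last[:match.start()], match.group()
    # Either part can be empty, e.g. when the last piece is only punctuation.
    return pieces + [p for p in (word, punct) if p]

print(split_with_tail_fix("The answer is nine."))
```

With the dot split off, the last word is a bare "nine" and can be matched as a number word like any other.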
Passes all tests. Pull request at https://github.com/ShailChoksi/text2digits/pull/48
There are problems when a number word appears at the start or end of a sentence:
or t2d.convert("“One hundred ") gives '“One 100 '