explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Missing space at end of strings in NUM_WORDS #759

Closed Derek-Jones closed 7 years ago

Derek-Jones commented 7 years ago

The following code in spacy/orth.pyx

NUM_WORDS = set('zero one two three four five six seven eight nine ten'
                'eleven twelve thirteen fourteen fifteen sixteen seventeen'
                'eighteen nineteen twenty thirty forty fifty sixty seventy'
                'eighty ninety hundred thousand million billion trillion'
                'quadrillion gajillion bazillion'.split())

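is missing a space character after ten, seventeen, seventy and trillion.

Python joins adjacent string literals with no separator, so the last word of each literal is fused with the first word of the next. At the moment ten is not recognised as a number, but teneleven passes like_number.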
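The effect can be reproduced outside spaCy. The sketch below is only an illustration: the names NUM_WORDS_BUGGY and NUM_WORDS_FIXED are made up for this example, and the trailing-space fix is one possible repair, not necessarily the one that was committed.

```python
# Illustrative only: both names are hypothetical, not spaCy identifiers.

# Reported form: adjacent literals are concatenated with no separator,
# so 'ten' + 'eleven' becomes the single token 'teneleven'.
NUM_WORDS_BUGGY = set('zero one two three four five six seven eight nine ten'
                      'eleven twelve thirteen fourteen fifteen sixteen seventeen'
                      'eighteen nineteen twenty thirty forty fifty sixty seventy'
                      'eighty ninety hundred thousand million billion trillion'
                      'quadrillion gajillion bazillion'.split())

# One possible fix (an assumption): end each literal with a space.
NUM_WORDS_FIXED = set('zero one two three four five six seven eight nine ten '
                      'eleven twelve thirteen fourteen fifteen sixteen seventeen '
                      'eighteen nineteen twenty thirty forty fifty sixty seventy '
                      'eighty ninety hundred thousand million billion trillion '
                      'quadrillion gajillion bazillion'.split())

# Fused tokens that exist only in the buggy set.
fused = sorted(NUM_WORDS_BUGGY - NUM_WORDS_FIXED)
# Real number words that the buggy set fails to contain.
missing = sorted(NUM_WORDS_FIXED - NUM_WORDS_BUGGY)

print(fused)
print(missing)
```

Besides ten, the words on the other side of each missing space (eleven, eighteen, eighty, quadrillion) are lost as well, so eight number words are absent from the buggy set in total.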

ines commented 7 years ago

Thanks – will be pushing the fix and regression test in a second! Also, now that I see it, this data should probably be moved to the English language data at some point in the future.
