Closed JasonCrowe closed 7 years ago
Hi @JasonCrowe :hand:
Thank you for your report. This is actually the expected behavior, since it's using the default normalizations, applied in the order listed. Those are:
DEFAULT_NORMALIZATIONS = [
'remove_extra_whitespaces',
'replace_punctuation',
'replace_symbols',
'remove_stop_words'
]
But maybe this list should change. What do you think?
If remove_extra_whitespaces were the last operation in the defaults, wouldn't that fix this issue? Am I understanding that right? If this isn't applicable to the package, I am happy to change my local copy if that will fix it.
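To make the ordering question concrete, here is a rough sketch of the effect. The two functions below are hypothetical stand-ins written for illustration, not cucco's actual implementation, and they assume punctuation is replaced with an empty string.

```python
import re
import string

# Hypothetical stand-ins for the two normalizations (NOT cucco's own code).
def remove_extra_whitespaces(text):
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r'\s+', ' ', text).strip()

def replace_punctuation(text, replacement=''):
    """Replace every ASCII punctuation character with `replacement`."""
    return ''.join(replacement if ch in string.punctuation else ch
                   for ch in text)

def apply_in_order(text, steps):
    """Run each normalization on the output of the previous one."""
    for step in steps:
        text = step(text)
    return text

w = 'Car , 950'

# Whitespace cleanup first: removing the comma afterwards leaves two spaces.
whitespace_first = apply_in_order(
    w, [remove_extra_whitespaces, replace_punctuation])

# Whitespace cleanup last: the doubled space gets collapsed.
whitespace_last = apply_in_order(
    w, [replace_punctuation, remove_extra_whitespaces])

print(repr(whitespace_first))  # 'Car  950'
print(repr(whitespace_last))   # 'Car 950'
```

With the whitespace step first there is nothing to collapse yet, so the gap left by the removed comma survives; running it last cleans that up.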
Hi @JasonCrowe,
Yeah, the result would be the one you expect. But again, not really an issue.
Having said this, I kind of agree with you, so I will move remove_extra_whitespaces down in the next version. There is no date for it yet because it is going to be a major release (it will include a CLI), but hopefully it will be out in less than a week.
Thanks again for your comments ;)
w = 'Car , 950'
cucco.normalize(w)
The program seems to check for whitespace to remove before removing punctuation. This causes it to return 'Car__950' rather than 'Car_950'.
ETA: added underscore in place of spaces to show effect.