Closed havardl closed 3 years ago
There's not an explicit function, but running something along the lines of
import re
corpus_no_numbers = corpus.remove_terms([t for t in corpus.get_terms() if re.match(r'^\d+$', t)])
should do the trick
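To see which terms that pattern would catch before touching a corpus, here is a small sketch that runs the same digits-only test on a plain Python list. The `terms` list is made up for illustration and stands in for whatever `corpus.get_terms()` returns; `re.fullmatch` is used here as an equivalent of anchoring with `^` and `$`.

```python
import re

# Hypothetical term list standing in for corpus.get_terms()
terms = ["apple", "42", "2019", "3rd", "3.14", "one", "b2b"]

# A term counts as a number only if it consists entirely of digits
DIGITS_ONLY = re.compile(r"\d+")

# Terms that would be handed to remove_terms()
numeric_terms = [t for t in terms if DIGITS_ONLY.fullmatch(t)]

# Terms that would survive the filter
kept_terms = [t for t in terms if not DIGITS_ONLY.fullmatch(t)]

print(numeric_terms)
print(kept_terms)
```

Note that terms such as "3rd", "3.14" and "one" are kept under this definition, since only all-digit strings match.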
To close the loop on this: coming up with a good definition of a number is tricky (do ordinals count? written numbers? what about "one" as a determiner or used idiomatically?), so adding a "remove_numbers" method is probably more trouble than it's worth, especially given how concisely any particular approach can be expressed.
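As a rough illustration of why the definition matters, the sketch below compares a strict and a broader notion of "number" on a made-up term list. The pattern names, the `is_number` helper and the `kinds` parameter are all hypothetical and not part of the package; written-out numbers like "one" are deliberately left alone because they are the ambiguous case.

```python
import re

# Hypothetical patterns, one per possible definition of "number"
NUMBER_PATTERNS = {
    "digits": re.compile(r"\d+"),                 # 42
    "decimal": re.compile(r"\d[\d,]*(\.\d+)?"),   # 1,000 or 3.14
    "ordinal": re.compile(r"\d+(st|nd|rd|th)", re.IGNORECASE),  # 3rd
}

def is_number(term, kinds=("digits",)):
    """Return True if the whole term matches any of the chosen patterns."""
    return any(NUMBER_PATTERNS[k].fullmatch(term) for k in kinds)

# Made-up terms for illustration
terms = ["42", "1,000", "3.14", "3rd", "one", "b2b"]

strict = [t for t in terms if is_number(t)]
broad = [t for t in terms if is_number(t, kinds=("digits", "decimal", "ordinal"))]

print(strict)
print(broad)
```

Either list could then be passed to `remove_terms()`, depending on which definition suits the analysis.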
First off, thanks for a great package!
I was wondering if there is something similar to the remove_terms() function that would filter out all numbers? I could of course generate a list of all the numbers from 0 to n and feed that to the function, but I wanted to check whether there is already specific support for this.