MLBazaar / MLPrimitives

Primitives for machine learning and data science.
https://mlbazaar.github.io/MLPrimitives
MIT License

mlprimitives.custom.text.TextCleaner fails if text is empty #228

Closed: csala closed this issue 4 years ago

csala commented 4 years ago

When the collection of texts to clean contains an empty string (""), mlprimitives.custom.text.TextCleaner._remove_stopwords crashes with a LangDetectException raised by langdetect:

In [1]: from mlprimitives.custom.text import TextCleaner

In [2]: cleaner = TextCleaner()

In [3]: cleaner.produce(['not empty', ''])
---------------------------------------------------------------------------
LangDetectException                       Traceback (most recent call last)
<ipython-input-3-342ec016e729> in <module>
----> 1 cleaner.produce(['not empty', ''])
...
~/.virtualenvs/MLPrimitives/lib/python3.6/site-packages/langdetect/detector.py in _detect_block(self)
    148         ngrams = self._extract_ngrams()
    149         if not ngrams:
--> 150             raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
    151 
    152         self.langprob = [0.0] * len(self.langlist)

LangDetectException: No features in text.
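
A possible workaround, sketched under assumptions rather than taken from the actual fix: skip language detection when a text is empty or whitespace-only, and fall back to a default. The names `detect_or_default` and `fake_detect` are hypothetical and not part of MLPrimitives. `fake_detect` is only a stand-in for `langdetect.detect` so the snippet runs without langdetect installed; it mimics the failure shown in the traceback above with a plain `ValueError`.

```python
def detect_or_default(text, detect, default=None, errors=()):
    """Return detect(text), or `default` when the text cannot be detected.

    `detect` is any callable that maps a string to a language code.
    `errors` is a tuple of exception types to treat as "undetectable".
    """
    # Empty or whitespace-only strings have no features to detect.
    if not text.strip():
        return default
    try:
        return detect(text)
    except errors:
        # Only the exception types the caller listed are swallowed.
        return default


def fake_detect(text):
    # Hypothetical stand-in for langdetect.detect: fails on empty input.
    if not text:
        raise ValueError('No features in text.')
    return 'en'


texts = ['not empty', '']
languages = [detect_or_default(text, fake_detect) for text in texts]
print(languages)  # ['en', None]
```

With the real library, one would presumably pass `langdetect.detect` as `detect` and `errors=(LangDetectException,)`, so that other texts langdetect cannot handle also fall back to the default. What `_remove_stopwords` should do with a `None` language (for example, leave the text untouched) is a separate design choice.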