I ran into a problem when I tried to run the command from your instructions.
python prep-text.py data/sample/month1 data/sample/month2 data/sample/month3 -o data --tfidf --norm
/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
warnings.warn(msg, category=FutureWarning)
Loaded 347 stopwords
Processing 'month1' from data/sample/month1 ...
Found 438 documents to parse
Pre-processing documents (347 stopwords, tfidf=True, normalize=True, min_df=10, max_ngram=1) ...
Traceback (most recent call last):
File "prep-text.py", line 91, in <module>
main()
File "prep-text.py", line 81, in main
apply_norm = options.apply_norm, ngram_range = (1,options.max_ngram), lemmatizer=lemmatizer )
File "/home/jjc/桌面/dynamic-nmf-master/text/util.py", line 40, in preprocess
X = tfidf.fit_transform(docs)
File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1859, in fit_transform
X = super().fit_transform(raw_documents)
File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1220, in fit_transform
self.fixed_vocabulary_)
File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 1131, in _count_vocab
for feature in analyze(doc):
File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 108, in _analyze
doc = ngrams(doc, stop_words)
File "/home/jjc/anaconda3/lib/python3.7/site-packages/sklearn/feature_extraction/text.py", line 227, in _word_ngrams
tokens = [w for w in tokens if w not in stop_words]
TypeError: 'NoneType' object is not iterable
I don't know what is causing this. Could you please help me solve the problem?