Da-Capo / Entity-Relation-SVM

SVM Entity Relation classification for ace2005 chinese data

Error when running token.py #4

Open AaronWhite95 opened 5 years ago

AaronWhite95 commented 5 years ago

When running the bag-of-words creation step in token.py, I get the error below. Why is that? None of the fixes on Stack Overflow worked:

Traceback (most recent call last):
  File "feature_extract.py", line 51, in <module>
    tokens = token.get_tokens()
  File "/home/xfbai/Entity-Relation-SVM-master/new_token.py", line 65, in get_tokens
    X_train_counts = vectorizer.fit_transform(cut_docs)
  File "/home/xfbai/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 1031, in fit_transform
    self.fixed_vocabulary_)
  File "/home/xfbai/anaconda3/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 962, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words

Thanks!

Da-Capo commented 5 years ago

This code is ancient history at this point [facepalm], so a version mismatch can't be ruled out. I looked into it: this error means an empty bag-of-words vocabulary was built. Try printing the contents of cut_docs to check whether the word segmentation went wrong, or check whether the CountVectorizer() arguments are the problem; if neither helps, trace further up the call chain.
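A minimal sketch of that check. One common cause of this error on Chinese text is CountVectorizer's default token_pattern, which drops single-character tokens; the cut_docs contents below are made-up stand-ins (space-joined, jieba-style segmented documents), not data from this repo:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins for cut_docs: each document is the space-joined
# output of a jieba-style word segmenter.
cut_docs = ["我 爱 北京", "北京 是 首都"]

# The default token_pattern, r"(?u)\b\w\w+\b", only keeps tokens of two or
# more characters, so single-character Chinese words are silently dropped.
# If every token is dropped, fit_transform raises the "empty vocabulary"
# ValueError seen above.
try:
    CountVectorizer().fit_transform(["我 是 你"])  # all single-char tokens
except ValueError as e:
    print(e)  # empty vocabulary; perhaps the documents only contain stop words

# Keeping one-character tokens avoids the problem:
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(cut_docs)
print(sorted(vectorizer.vocabulary_))  # ['北京', '我', '是', '爱', '首都']
```

If the vocabulary is still empty after this, print cut_docs itself: empty strings or un-segmented documents point to the tokenization step instead.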