chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

Keyterms Error: Input terms must be strings or spacy Tokens, not <type 'unicode'>. #2

Closed by ljha-CS 8 years ago

ljha-CS commented 8 years ago

I am trying to get key terms from a semantic network, following this guide: https://media.readthedocs.org/pdf/textacy/latest/textacy.pdf#page=28

But I get the error "Input terms must be strings or spacy Tokens, not <type 'unicode'>." from _textacy.keyterms.key_terms_from_semantic_network_ with each of these combinations of ranking algorithm and edge weighting:

- bestcoverage, cooc_freq
- bestcoverage, binary
- pagerank, cooc_freq
- pagerank, binary
- divrank, cooc_freq
- divrank, binary

_doc.key_terms_ fails with the same error for the 'textrank' and 'singlerank' algorithms, but 'sgrank' works.

In keyterms.py, spacy_utils.normalized_str(word) returns unicode objects.

Error trace:

    Traceback (most recent call last):
      File "/home/ljha/PycharmProjects/nlp/Analyze.py", line 105, in
        print 'Key Terms ' + algo + ' ' + ew, textacy.keyterms.key_terms_from_semantic_network(doc, window_width=3, edge_weighting=ew, ranking_algo=algo, join_key_words=False, n_keyterms=10)
      File "/usr/local/lib/python2.7/dist-packages/textacy/keyterms.py", line 248, in key_terms_from_semantic_network
        good_word_list, window_width=window_width, edge_weighting=edge_weighting)
      File "/usr/local/lib/python2.7/dist-packages/textacy/transform.py", line 64, in terms_to_semantic_network
        raise TypeError(msg)
    TypeError: Input terms must be strings or spacy Tokens, not <type 'unicode'>.

ljha-CS commented 8 years ago

I changed line 84 of spacy_utils.py to `return (token.text if preserve_case(token) else token.lemma_).encode('ascii', 'ignore')` and it's working.
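One caveat with this workaround, for anyone copying it: `.encode('ascii', 'ignore')` silently drops every non-ASCII character, so accented terms come out altered. A small illustration (the sample string is made up for the example and is not from the issue):

```python
# -*- coding: utf-8 -*-
# Illustration only: the sample term below is hypothetical.
term = u"na\u00efve caf\u00e9"  # "naïve café" as a unicode string

# The workaround's encoding step: non-ASCII characters are discarded.
ascii_bytes = term.encode("ascii", "ignore")

# Encoding as UTF-8 instead keeps every character and round-trips cleanly.
utf8_roundtrip = term.encode("utf-8").decode("utf-8")

print(repr(ascii_bytes))
print(utf8_roundtrip == term)
```

On Python 2 the encoded value is a `str`, which is why the type check stops complaining; on Python 3 it would be `bytes`, so this is only a stopgap for Python 2.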

bdewilde commented 8 years ago

@pklalitjha Well, shoot. I've been developing textacy entirely in Python 3.5, but trying to maintain Python 2.7 compatibility. It seems I've failed. str is unicode in Python 3, but not 2. :weary: Your hack is a fine band-aid for now, but I'll prioritize ensuring Py2/3 compatibility everywhere, not to mention adding tests for the keyterms module and running all tests in both Python 2 and 3.

Sorry about the trouble! You're my first GitHub issue on textacy — finally feels like an open-source project.
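For readers hitting the same error, here is a minimal sketch of the kind of Py2/3-compatible string check that avoids it. This is not textacy's actual code: `string_types`, `is_text`, and `check_terms` are hypothetical names chosen for illustration, and the real fix pushed to master may look different.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # On Python 2, text may be either a byte str or a unicode object.
    string_types = (str, unicode)  # noqa: F821 (unicode exists only on Py2)
else:
    # On Python 3, str is already unicode.
    string_types = (str,)


def is_text(term):
    """Return True if term is a text string on the running Python version."""
    return isinstance(term, string_types)


def check_terms(terms):
    """Raise TypeError on the first term that is not a text string."""
    for term in terms:
        if not is_text(term):
            raise TypeError(
                "Input terms must be strings, not {}".format(type(term)))
    return True


print(check_terms([u"semantic", u"network"]))
```

Checking against a tuple of types rather than `str` alone is what lets a Python 2 `unicode` term through; the `six` package offers an equivalent `six.string_types` if a dependency is acceptable.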

bdewilde commented 8 years ago

@pklalitjha I've pushed some changes to master that should fix your Py2 str issue. There may be other Py2/3 compatibility issues I've not found -- please holler if you run into anything!