Closed ljha-CS closed 8 years ago
I modified `spacy_utils.py` line 84 with `return (token.text if preserve_case(token) else token.lemma_).encode('ascii', 'ignore')` and it's working.
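For reference, a Py2/3-portable version of that hack can be sketched with a small compatibility shim. The `Token` class and `preserve_case` predicate below are hypothetical stand-ins for spaCy's `Token` and textacy's actual helper, just to make the sketch self-contained:

```python
import sys

# In Python 2, text is `unicode`; in Python 3, it is `str`.
unicode_ = str if sys.version_info[0] >= 3 else unicode  # noqa: F821 (Py2-only name)


class Token(object):
    """Hypothetical stand-in for a spaCy Token."""

    def __init__(self, text, lemma_, is_proper=False):
        self.text = text
        self.lemma_ = lemma_
        self.is_proper = is_proper


def preserve_case(token):
    # Hypothetical predicate: keep original casing for proper nouns.
    return token.is_proper


def normalized_str(token):
    # Return a unicode text string on both Python 2 and 3, instead of
    # forcing an ASCII-encoded bytestring as the band-aid above does.
    text = token.text if preserve_case(token) else token.lemma_
    return unicode_(text)
```

Returning `unicode_(text)` rather than `.encode('ascii', 'ignore')` keeps the return type consistent across interpreters and avoids silently dropping non-ASCII characters.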
@pklalitjha Well, shoot. I've been developing textacy entirely in Python 3.5, but trying to maintain Python 2.7 compatibility. It seems I've failed: `str` is unicode in Python 3, but not in 2. :weary: Your hack is a fine band-aid for now, but I'll prioritize ensuring Py2/3 compatibility everywhere, not to mention adding tests for the `keyterms` module and running all tests in both Python 2 and 3.
Sorry about the trouble! You're my first GitHub issue on textacy. Finally feels like an open-source project.
@pklalitjha I've pushed some changes to master that should fix your Py2 str issue. There may be other Py2/3 compatibility issues I've not found -- please holler if you run into anything!
I am trying to get key terms from a semantic network, following this guide: https://media.readthedocs.org/pdf/textacy/latest/textacy.pdf#page=28
But I get the error "Input terms must be strings or spacy Tokens, not <type 'unicode'>." from `textacy.keyterms.key_terms_from_semantic_network` with the following combinations:

| ranking_algo | edge_weighting |
| --- | --- |
| bestcoverage | cooc_freq |
| bestcoverage | binary |
| pagerank | cooc_freq |
| pagerank | binary |
| divrank | cooc_freq |
| divrank | binary |
`doc.key_terms` failed with the same error for the 'textrank' and 'singlerank' algorithms, but 'sgrank' worked.
In `keyterms.py`, `spacy_utils.normalized_str(word)` returns unicode strings.
Error trace:

```
Traceback (most recent call last):
  File "/home/ljha/PycharmProjects/nlp/Analyze.py", line 105, in <module>
    print 'Key Terms ' + algo + ' ' + ew, textacy.keyterms.key_terms_from_semantic_network(doc, window_width=3, edge_weighting=ew, ranking_algo=algo, join_key_words=False, n_keyterms=10)
  File "/usr/local/lib/python2.7/dist-packages/textacy/keyterms.py", line 248, in key_terms_from_semantic_network
    good_word_list, window_width=window_width, edge_weighting=edge_weighting)
  File "/usr/local/lib/python2.7/dist-packages/textacy/transform.py", line 64, in terms_to_semantic_network
    raise TypeError(msg)
TypeError: Input terms must be strings or spacy Tokens, not <type 'unicode'>.
```
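The `TypeError` above comes from an input-type check that rejects Python 2 `unicode` values. A hedged sketch of a Py2/3-tolerant version of such a check (the `validate_terms` helper and its name are hypothetical, not textacy's actual code; spaCy `Token` handling is omitted):

```python
import sys

# Accept both text types on Python 2; on Python 3, `str` is already unicode.
text_types = (str,) if sys.version_info[0] >= 3 else (str, unicode)  # noqa: F821


def validate_terms(terms):
    """Raise TypeError unless every term is a text string.

    Hypothetical helper mirroring the kind of check performed in
    terms_to_semantic_network before building the graph.
    """
    for term in terms:
        if not isinstance(term, text_types):
            msg = "Input terms must be strings or spacy Tokens, not {}".format(type(term))
            raise TypeError(msg)
    return terms
```

With a check like this, unicode output from `normalized_str` passes on Python 2 instead of triggering the traceback above.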