chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

ValueError: `terms` = [] is invalid; it must contain at least 1 term in the form of a string or spacy token #211

Closed: mehdipiraee closed this issue 5 years ago

mehdipiraee commented 6 years ago

textacy.keyterms.key_terms_from_semantic_network throws a ValueError when the document contains no usable terms (I think!)

Steps to Reproduce (for bugs)

import textacy.keyterms

doc = textacy.Doc('just as advertised', lang='en')
textacy.keyterms.key_terms_from_semantic_network(doc)

Traceback

ValueError                                Traceback (most recent call last)
<ipython-input-500-00548ea99d94> in <module>()
      1 doc = textacy.Doc('just as advertised', lang='en')
----> 2 textacy.keyterms.key_terms_from_semantic_network(doc)

~/anaconda3/lib/python3.6/site-packages/textacy/keyterms.py in key_terms_from_semantic_network(doc, normalize, window_width, edge_weighting, ranking_algo, join_key_words, n_keyterms, **kwargs)
    298     good_word_list = [word for word in good_word_list if word]
    299     graph = network.terms_to_semantic_network(
--> 300         good_word_list, window_width=window_width, edge_weighting=edge_weighting)
    301 
    302     # rank nodes by algorithm, and sort in descending order

~/anaconda3/lib/python3.6/site-packages/textacy/network.py in terms_to_semantic_network(terms, normalize, window_width, edge_weighting)
     70         raise ValueError(
     71             '`terms` = {} is invalid; it must contain at least 1 term '
---> 72             'in the form of a string or spacy token'.format(terms))
     73 
     74     # if len(terms) < window_width, cytoolz throws a StopIteration error

ValueError: `terms` = [] is invalid; it must contain at least 1 term in the form of a string or spacy token


bdewilde commented 6 years ago

Hi @mehdipiraee, I think I'm confused: isn't this expected behavior?

mehdipiraee commented 6 years ago

@bdewilde, I'm a newbie, so I wasn't sure whether it's expected behavior or not. I wasn't expecting an exception when no terms are found. I handled it by catching the exception. Wouldn't it be better to return None?
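A minimal sketch of the workaround described above: catch the ValueError around the call and fall back to an empty result. The wrapper name `key_terms_or_empty` is hypothetical, and the extractor is passed in as a callable so the sketch does not depend on a particular textacy version; with textacy installed you would pass `textacy.keyterms.key_terms_from_semantic_network` as `extract`.

```python
from typing import Any, Callable, List, Tuple


# Hypothetical helper (not part of textacy): call a keyterm extractor and
# return [] instead of propagating the ValueError raised for "no terms".
def key_terms_or_empty(
    extract: Callable[..., List[Tuple[str, float]]],
    doc: Any,
    **kwargs: Any,
) -> List[Tuple[str, float]]:
    try:
        return extract(doc, **kwargs)
    except ValueError:
        # Assumption: a ValueError here means the document had no usable
        # terms; a stricter version could inspect the message first.
        return []
```

For example, `key_terms_or_empty(textacy.keyterms.key_terms_from_semantic_network, doc)`. Note that catching every ValueError this way can also mask unrelated input errors.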

bdewilde commented 5 years ago

Hey @mehdipiraee, you've probably long since moved on from this, but I finally got back to it and... I think you're right; raising an exception here is bad. So I'm making a change: network.terms_to_semantic_network() now returns an empty graph if no terms are passed to it, and in turn keyterms.key_terms_from_semantic_network() returns an empty list.
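A minimal sketch of the behavior this change describes, written in plain Python rather than against textacy's real `network` module (which builds a networkx graph). The names `terms_to_cooccurrence_graph` and `key_terms_from_graph` are hypothetical, and ranking by weighted degree is a simple stand-in for the library's actual `ranking_algo` options. The point is only the early return: no terms gives an empty graph, and an empty graph gives an empty keyterm list.

```python
from typing import Dict, List, Sequence, Tuple

# Adjacency map: term -> {neighbor term: co-occurrence count}
Graph = Dict[str, Dict[str, int]]


def terms_to_cooccurrence_graph(terms: Sequence[str], window_width: int = 2) -> Graph:
    """Hypothetical sketch: link terms that co-occur within a sliding window."""
    graph: Graph = {}
    if not terms:
        return graph  # empty input -> empty graph, instead of raising
    for term in terms:
        graph.setdefault(term, {})
    # Slicing lets the last windows be shorter than window_width,
    # so inputs with fewer than window_width terms still work.
    for i in range(len(terms)):
        window = terms[i : i + window_width]
        first = window[0]
        for other in window[1:]:
            if other == first:
                continue  # skip self-loops
            graph[first][other] = graph[first].get(other, 0) + 1
            graph[other][first] = graph[other].get(first, 0) + 1
    return graph


def key_terms_from_graph(graph: Graph, n_keyterms: int = 10) -> List[Tuple[str, int]]:
    """Rank nodes by weighted degree (stand-in ranking); empty graph -> []."""
    scores = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_keyterms]
```

Slicing the window, rather than requiring exactly `window_width` items, also sidesteps the short-input case flagged in the `network.py` comment in the traceback above.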