chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

'spacy.tokens.doc.Doc' object has no attribute 'to_bag_of_terms' #310

Closed Wapiti08 closed 3 years ago

Wapiti08 commented 3 years ago

This is my code:

docx_textacy.to_bag_of_terms(ngrams=(1, 2, 3), named_entities=True, weighting='count', as_strings=True)

And I got this error:

'spacy.tokens.doc.Doc' object has no attribute 'to_bag_of_terms'

I have checked the documentation, but I have no idea what is causing this error.

Please give me some tips on that. Thanks!

bdewilde commented 3 years ago

Hi @Wapiti08, I can't reproduce your error because the code example you gave is incomplete. Could you provide a full example?

As for that error message: Doc extensions like to_bag_of_terms are added when you import textacy (see here). Have you done that?
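
For reference, a minimal sketch of that usage (my assumption, reusing the arguments from this thread and assuming a textacy version in which the extensions are exposed through spaCy's underscore namespace once textacy is imported):

import textacy  # importing textacy registers the Doc extensions as a side effect

spacy_lang = textacy.load_spacy_lang("en_core_web_sm")
doc = spacy_lang("Some example text.")
# extension methods live under doc._, not directly on the Doc object
bot = doc._.to_bag_of_terms(ngrams=(1, 2, 3), named_entities=True, weighting="count", as_strings=True)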

Wapiti08 commented 3 years ago

sentence = "The bag of words (BoW) approach works well for multiple text classification problems. This approach assumes that presence or absence of word(s) matter more than the sequence of the words. However, there are problems such as entity recognition, part of speech identification where word sequences matter as much, if not more. Conditional Random Fields (CRF) comes to the rescue here as it uses word sequences as opposed to just words."

import textacy

spacy_lang = textacy.load_spacy_lang("en_core_web_sm")
docx_textacy = spacy_lang(sentence)
# this line raises the AttributeError: to_bag_of_terms is not a plain Doc attribute
bot = docx_textacy.to_bag_of_terms(ngrams=(1, 2, 3), named_entities=True, weighting='count', as_strings=True)

Then when I executed it, I got this:

'spacy.tokens.doc.Doc' object has no attribute 'to_bag_of_terms'

Wapiti08 commented 3 years ago

Sorry, I found the reason. It should be:

import textacy
import textacy.spacier.doc_extensions  # call the extension function via its module path

spacy_lang = textacy.load_spacy_lang("en_core_web_sm")
docx_textacy = spacy_lang(sentence)
bot = textacy.spacier.doc_extensions.to_bag_of_terms(docx_textacy, ngrams=(1, 2, 3), named_entities=True, weighting='count', as_strings=True)
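
Equivalently, since importing textacy registers these functions as spaCy Doc extensions (as the maintainer noted above), the same call should also work through the underscore namespace; a hedged one-liner, reusing the same arguments:

bot = docx_textacy._.to_bag_of_terms(ngrams=(1, 2, 3), named_entities=True, weighting='count', as_strings=True)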