tf-idf score across corpus of original tweets but vectorizer does not import

aldmarj commented 6 years ago

I am new to spaCy and textacy, so I have been following an example. Although this example is slightly dated, I was sure I could find the correct imports in the docs, but no luck.

To calculate the tf-idf score of all the tokens in the tweets, I believe I can use fit_transform(), but I can't reach textacy's Vectorizer class.
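For reference, here is a minimal, library-free sketch of the kind of weighting a tf-idf fit_transform() produces over tokenized tweets. It is not textacy's implementation: raw term counts for tf, the scikit-learn-style smoothed idf ln((1 + N) / (1 + df)) + 1, and per-tweet L2 normalization are all assumptions, and the function name tfidf and the sample tweets are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(tokenized_docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumed weighting: tf = raw count, idf(t) = ln((1 + N) / (1 + df(t))) + 1,
    then each row is L2-normalized. textacy's exact formula may differ.
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many docs each term appears at least once.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    weighted = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        row = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so longer tweets don't get larger weights just for length.
        norm = math.sqrt(sum(v * v for v in row.values()))
        weighted.append({t: v / norm for t, v in row.items()} if norm else row)
    return weighted


# Hypothetical, already-tokenized tweets.
tweets = [
    ["fire", "downtown", "evacuate"],
    ["fire", "fire", "smoke"],
    ["sunny", "day", "downtown"],
]
for row in tfidf(tweets):
    # Show the two highest-weighted terms per tweet.
    print(sorted(row.items(), key=lambda kv: -kv[1])[:2])
```

Terms that appear in fewer tweets get a larger idf, so a rare word like "evacuate" outranks "fire" in the first tweet, while a term repeated within one tweet can still win on tf.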

I have tried multiple import statements found in the docs and the ones found in the example I am following.

Link to the example if interested: https://github.com/GabrielTseng/LearningDataScience/blob/master/Natural_Language_Processing/TwitterDisasters/spaCy/2%20-%20Tweet%20Summarization.ipynb

Current Behavior

Errors received:

from textacy.vsm import Vectorizer
ImportError: cannot import name 'Vectorizer'

import textacy
vectorizer = textacy.Vectorizer(weighting='tfidf')
AttributeError: module 'textacy' has no attribute 'Vectorizer'

import textacy
vectorizer = Vectorizer(weighting='tfidf')
NameError: name 'Vectorizer' is not defined
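One way to avoid guessing the import path is to probe a few candidate modules at runtime and report what is actually available. This is a hedged sketch: find_vectorizer and CANDIDATE_PATHS are hypothetical names, and the path list is our assumption about where Vectorizer has lived across textacy releases (the top-level package and textacy.vsm in the versions discussed here, and, we believe, textacy.representations.vectorizers in later ones). Check the docs for your installed version.

```python
import importlib
from typing import Iterable, Optional

# Assumed locations of Vectorizer across textacy releases; verify for your version.
CANDIDATE_PATHS = ("textacy", "textacy.vsm", "textacy.representations.vectorizers")


def find_vectorizer(paths: Iterable[str] = CANDIDATE_PATHS) -> Optional[type]:
    """Return the first `Vectorizer` attribute found in `paths`, or None."""
    for path in paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            # Package not installed, or this submodule doesn't exist in this release.
            continue
        cls = getattr(module, "Vectorizer", None)
        if cls is not None:
            return cls
    return None


Vectorizer = find_vectorizer()
if Vectorizer is None:
    print("No Vectorizer found; the installed textacy may be too old.")
else:
    print("Found", Vectorizer.__module__ + "." + Vectorizer.__name__)
```

On a release like 0.3.4, where textacy.vsm exists but has no Vectorizer, this should print the "not found" message rather than raising, which matches the ImportError above.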

Your Environment

operating system: windows 10 64bit
python version: Python 3.6.4 :: Anaconda, Inc.
spacy version: 1.9.0-np111py36_vc14_1
installed spacy models: en_core_web_sm
textacy version: 0.3.4-py36_0

bdewilde commented 6 years ago

Hi @aldmarj, it looks like you're using old versions of spacy and textacy, so I would encourage you to upgrade. :) But if that's not possible...

After you import textacy, can you run dir(textacy) and post the output? Here's an example from my setup (using newer versions of the packages):

In [1]: import textacy

In [2]: dir(textacy)
Out[2]:
['Corpus',
 'Doc',
 'TextStats',
 'TopicModel',
 'Vectorizer',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'about',
 'absolute_import',
 'cache',
 'compat',
 'constants',
 'corpus',
 'data_dir',
 'doc',
 'extract',
 'io',
 'load_spacy',
 'logger',
 'logging',
 'network',
 'os',
 'preprocess',
 'preprocess_text',
 'spacy_utils',
 'text_stats',
 'text_utils',
 'tm',
 'utils',
 'viz',
 'vsm']

In [3]: textacy.vsm.Vectorizer
Out[3]: textacy.vsm.vectorizers.Vectorizer

In [4]: textacy.Vectorizer
Out[4]: textacy.vsm.vectorizers.Vectorizer
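The same dir() check can be scripted so the module version and the presence of specific names are reported together, which is handy when filing an issue. A small standard-library sketch: check_exports is a hypothetical helper, and the json module stands in for textacy so the example runs without it installed.

```python
import importlib
from typing import Dict, Iterable, Tuple


def check_exports(module_name: str, wanted: Iterable[str]) -> Tuple[str, Dict[str, bool]]:
    """Import `module_name` and report its __version__ plus which `wanted` names dir() lists."""
    module = importlib.import_module(module_name)
    names = set(dir(module))
    version = getattr(module, "__version__", "unknown")
    return version, {name: name in names for name in wanted}


# On the reporter's machine this would be e.g. check_exports("textacy", ["Vectorizer", "vsm"]).
print(check_exports("json", ["loads", "Vectorizer"]))
```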

aldmarj commented 6 years ago

Thanks for the quick response @bdewilde. I'm using conda-forge to install both spacy and textacy. When I install textacy, I have to downgrade spacy to v1.9.0.

Here's my output from dir(textacy): ['Corpus', 'Doc', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__resources_dir__', '__spec__', '__version__', 'absolute_import', 'compat', 'constants', 'corpora', 'corpus', 'data', 'doc', 'export', 'extract', 'fileio', 'keyterms', 'lexicon_methods', 'load_spacy', 'logger', 'logging', 'math_utils', 'network', 'os', 'pkgutil', 'preprocess', 'preprocess_text', 'similarity', 'spacy_pipelines', 'spacy_utils', 'text_stats', 'text_utils', 'tm', 'viz', 'vsm']

aldmarj commented 6 years ago

I have installed textacy from PyPI instead of conda. This gave me version 0.6.1 rather than 0.3.4, so I can use the vectorizer now.

Thanks for pointing me in the right direction, really appreciate it.

chartbeat-labs / textacy, issue #192