JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

import issue: pandas 'SettingWithCopyWarning' #134

Closed emmatalis closed 4 months ago

emmatalis commented 4 months ago

Steps to Reproduce

After pip-installing scattertext (I have to use version 0.1.4 because I'm behind a company firewall), I get an error when importing the package.

import scattertext as st

Error

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[3], line 1
----> 1 import scattertext as st

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/__init__.py:20
     18 from scattertext.termscoring.BetaPosterior import BetaPosterior
     19 from scattertext.semioticsquare.SemioticSquareFromAxes import SemioticSquareFromAxes
---> 20 from scattertext.categoryprojector.OptimalProjection import get_optimal_category_projection, \
     21     get_optimal_category_projection_by_rank
     22 from scattertext.categoryprojector.CategoryProjector import CategoryProjector, Doc2VecCategoryProjector, \
     23     LengthNormalizer
     24 from scattertext.termscoring.CorpusBasedTermScorer import CorpusBasedTermScorer

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/categoryprojector/OptimalProjection.py:6
      4 from scattertext.Scalers import stretch_0_to_1
      5 from scattertext.termscoring.RankDifference import RankDifference
----> 6 from scattertext.categoryprojector.CategoryProjector import CategoryProjector
      7 from scattertext.termcompaction.AssociationCompactor \
      8     import AssociationCompactor, AssociationCompactorByRank, TermCategoryRanker
     11 def morista_index(points):
     12     # Morisita Index of Dispersion

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/categoryprojector/CategoryProjector.py:8
      5 from sklearn.base import BaseEstimator, TransformerMixin
      6 from sklearn.preprocessing import RobustScaler, StandardScaler
----> 8 from scattertext.representations.Doc2VecBuilder import Doc2VecBuilder
      9 from scattertext.termscoring.RankDifference import RankDifference
     10 from scattertext.categoryprojector.CategoryProjection import CategoryProjection, CategoryProjectionWithDoc2Vec

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/representations/__init__.py:1
----> 1 from .Word2VecFromParsedCorpus import Word2VecFromParsedCorpus, Word2VecFromParsedCorpusBigrams, CorpusAdapterForGensim
      2 from .CategoryEmbeddings import CategoryEmbeddingsResolver, EmbeddingAligner

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/representations/Word2VecFromParsedCorpus.py:5
      2 import warnings
      3 from collections import Counter
----> 5 from scattertext.ParsedCorpus import ParsedCorpus
      8 class FeatsFromGensim(object):
      9     def __init__(self, phrases, gram_size):

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/ParsedCorpus.py:5
      1 import sys
      3 import pandas as pd
----> 5 from scattertext.DataFrameCorpus import DataFrameCorpus
      6 from scattertext.indexstore.IndexStore import IndexStore
      9 class ParsedDataFrameCorpus(DataFrameCorpus):

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/DataFrameCorpus.py:3
      1 from scattertext.indexstore import IndexStore
----> 3 from scattertext.Corpus import Corpus
      6 class DataFrameCorpus(Corpus):
      7     def __init__(self,
      8                  X,
      9                  mX,
   (...)
     15                  df,
     16                  unigram_frequency_path=None):

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/Corpus.py:5
      2 import pandas as pd
      3 from numpy import nonzero
----> 5 from scattertext.TermDocMatrix import TermDocMatrix
      8 class Corpus(TermDocMatrix):
      9       def __init__(self,
     10                    X,
     11                    mX,
   (...)
     16                    raw_texts,
     17                    unigram_frequency_path=None):

File ~/anaconda3/envs/venv_scattertext/lib/python3.11/site-packages/scattertext/TermDocMatrix.py:7
      5 import pandas as pd
      6 import scipy
----> 7 from pandas.core.common import SettingWithCopyWarning
      8 from scipy.sparse import csr_matrix
      9 from scipy.stats import hmean, fisher_exact, rankdata, norm

ImportError: cannot import name 'SettingWithCopyWarning' from 'pandas.core.common'


JasonKessler commented 4 months ago

Sorry this is happening. If you can clone the newest version from GitHub, manually upload it to SageMaker, and install or import it there, you may be able to fix this issue.

Otherwise, I'd just remove line 7 from TermDocMatrix.py, along with any other references to SettingWithCopyWarning.
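
For anyone who has to stay on the old release, a gentler alternative to deleting the import is to make it tolerant of where pandas keeps the class. The following is a minimal sketch, not code from scattertext: `resolve_setting_with_copy_warning` is a hypothetical helper, and the local stand-in class is an assumption for pandas builds that expose the name in neither location. To my knowledge, recent pandas releases expose `SettingWithCopyWarning` under `pandas.errors`, while the traceback above shows the old import from `pandas.core.common`.

```python
import importlib
import warnings


def resolve_setting_with_copy_warning():
    """Return SettingWithCopyWarning from whichever pandas module provides it.

    Hypothetical helper for illustration; it is not part of scattertext or pandas.
    """
    # Candidate locations, tried in order. pandas.core.common is the location
    # that fails in the traceback above.
    for module_name in ("pandas.errors", "pandas.core.common"):
        try:
            module = importlib.import_module(module_name)
            return getattr(module, "SettingWithCopyWarning")
        except (ImportError, AttributeError):
            # Module missing or name not defined there: try the next candidate.
            continue

    # Assumption: if neither location has the class, a local stand-in keeps
    # code that only references the name (e.g. in warning filters) importable.
    class SettingWithCopyWarning(Warning):
        pass

    return SettingWithCopyWarning


SettingWithCopyWarning = resolve_setting_with_copy_warning()

# Example use: silence the warning the same way a hard-coded import would allow.
warnings.simplefilter("ignore", category=SettingWithCopyWarning)
```

In a locally patched copy of TermDocMatrix.py, logic like this would stand in for the failing import on line 7, so any later references to the name keep working without further edits.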

JasonKessler commented 4 months ago

Closing the issue due to inactivity and because it does not occur in the most recent version.