JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

DataFrame.ix usage needs to be replaced with .iloc #64

Closed quillan86 closed 4 years ago

quillan86 commented 4 years ago

Your Environment

I ran `print(list(corpus.get_scaled_f_scores_vs_background().index[:10]))` after creating a corpus and got the error below. The cause is that recent pandas versions removed `.ix` (I have pandas 1.0.5).


```
AttributeError                            Traceback (most recent call last)
<ipython-input-293-2fb682de04c3> in <module>
----> 1 print(list(corpus.get_scaled_f_scores_vs_background().index[:10]))

~/opt/anaconda3/lib/python3.7/site-packages/scattertext/TermDocMatrix.py in get_scaled_f_scores_vs_background(self, scaler_algo, beta)
    922                 pd.DataFrame of scaled_f_score scores compared to background corpus
    923         '''
--> 924                 df = self.get_term_and_background_counts()
    925         df['Scaled f-score'] = ScaledFScore.get_scores_for_category(
    926                         df['corpus'], df['background'], scaler_algo, beta

~/opt/anaconda3/lib/python3.7/site-packages/scattertext/TermDocMatrix.py in get_term_and_background_counts(self)
    881                 term_freq_df = self.get_term_freq_df()
    882                 corpus_freq_df = pd.DataFrame({'corpus': term_freq_df.sum(axis=1)})
--> 883                 corpus_unigram_freq = self._get_corpus_unigram_freq(corpus_freq_df)
    884                 df = corpus_unigram_freq.join(background_df, how='outer').fillna(0)
    885                 del df.index.name

~/opt/anaconda3/lib/python3.7/site-packages/scattertext/TermDocMatrix.py in _get_corpus_unigram_freq(self, corpus_freq_df)
    888         def _get_corpus_unigram_freq(self, corpus_freq_df):
    889                 unigram_validator = re.compile('^[A-Za-z]+$')
--> 890         corpus_unigram_freq = corpus_freq_df.ix[[term for term
    891                                                          in corpus_freq_df.index
    892                                                  if unigram_validator.match(term) is not None]]

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5275 
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'ix'
```

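
For reference, here is a minimal sketch of how the failing selection could be written without `.ix`. It is not taken from the Scattertext source: the toy `corpus_freq_df` and its values are made up for illustration. Because the selection in the traceback is by index label (the terms themselves), `.loc` looks like the closer replacement here than the `.iloc` named in the title, which selects by integer position.

```python
import re

import pandas as pd

# Toy stand-in for the term-frequency frame in the traceback (values are made up).
corpus_freq_df = pd.DataFrame(
    {"corpus": [3, 1, 2]},
    index=["hello", "hello world", "data"],
)

unigram_validator = re.compile(r"^[A-Za-z]+$")

# Keep only index labels made up solely of ASCII letters (i.e. unigrams).
unigram_terms = [
    term for term in corpus_freq_df.index
    if unigram_validator.match(term) is not None
]

# Removed in pandas 1.0: corpus_freq_df.ix[unigram_terms]
# Label-based replacement:
corpus_unigram_freq = corpus_freq_df.loc[unigram_terms]

print(list(corpus_unigram_freq.index))
```

A boolean mask built from the index would probably work just as well; the list of labels is kept here only to stay close to the line shown in the traceback.
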
quillan86 commented 4 years ago

Okay, I was using the conda installation, which is out of date:

https://anaconda.org/conda-forge/scattertext

lesliewalcott commented 4 years ago

@quillan86, how were you able to resolve this? I am experiencing the same problem. I installed Scattertext using pip.

JasonKessler commented 4 years ago

There's a known issue with the conda package being out of date. @lesliewalcott, could you please let me know which version of Scattertext you're running? This was a problem with earlier versions, but the current version (0.0.2.67) shouldn't suffer from it.

If you're using an old version, please run

$ pip install -U scattertext

to update your installation.
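
One way to confirm which versions are actually installed, before and after upgrading, is to ask the package metadata directly. This is only a sketch: `installed_version` is a hypothetical helper name, and `importlib.metadata` assumes Python 3.8 or newer (the traceback above shows 3.7, where the `importlib_metadata` backport would be needed instead).

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages discussed in this thread.
for name in ("scattertext", "pandas"):
    print(name, installed_version(name))
```

Running this with the same interpreter that produces the error matters, since `pip` may be installing into a different environment than the one the notebook uses.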

lesliewalcott commented 4 years ago

Hi Jason, I appreciate the response. We are running Scattertext 0.0.2.67 and pandas 1.1.0. We tried downgrading pandas but got the same error. Something odd is going on: the program runs properly on my machine but not on my colleague's, who cloned my repo and is using my frozen env. Something else must be causing the problem, so I'll comment back here when we find a solution.

JasonKessler commented 4 years ago

That makes sense... your colleague may want to try using a fresh virtual environment.
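
Before rebuilding the environment, it may help to check which interpreter and which copy of the package are really being picked up on the machine that fails. The sketch below uses only the standard library; `describe_environment` is a hypothetical helper name, and the `sys.prefix` comparison only detects `venv`/`virtualenv` environments, not conda ones.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Collect where Python runs from and where a top-level module would load from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        # True inside a venv/virtualenv; conda environments are not detected this way.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # None means the module cannot be found by this interpreter.
        "module_path": spec.origin if spec is not None else None,
    }


for key, value in describe_environment("scattertext").items():
    print(key, value)
```

If `module_path` points somewhere other than the expected environment (for example, an old conda `site-packages` directory), that would suggest a stale copy of the package is shadowing the upgraded one.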

JasonKessler commented 4 years ago

Closing this issue since it appears to be resolved.