bmabey / pyLDAvis

Python library for interactive topic model visualization. Port of the R LDAvis package.
BSD 3-Clause "New" or "Revised" License

pandas FutureWarning in pyLDAvis.sklearn.prepare #132

Closed Demetrio92 closed 6 years ago

Demetrio92 commented 6 years ago

I was replicating this article on my dataset, and found a pandas FutureWarning in pyLDAvis.sklearn.prepare

panel = pyLDAvis.sklearn.prepare(lda_model, data_vectorized, vectorizer, mds='tsne')

gives me:

project_root/venv/lib/python3.5/site-packages/pyLDAvis/_prepare.py:257: 
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
  return pd.concat([default_term_info] + list(topic_dfs))

So, I thought, there should be an issue in this repository :)


>>> pyLDAvis.__version__
 '2.1.2'

>>> pd.__version__
 '0.23.1'

Code to reproduce:

from nltk.corpus import brown
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
import pyLDAvis.sklearn

# params
NUM_TOPICS = 10

# get data
data = []
for fileid in brown.fileids():
    document = ' '.join(brown.words(fileid))
    data.append(document)

# Transform the collection of texts to a numerical form
# Raw string so that the regex escape `\-` is not treated as an invalid string escape
vectorizer = CountVectorizer(min_df=5, max_df=0.9,
                             stop_words='english', lowercase=True,
                             token_pattern=r'[a-zA-Z\-][a-zA-Z\-]{2,}')
data_vectorized = vectorizer.fit_transform(data)

# Build a Latent Dirichlet Allocation Model
lda_model = LatentDirichletAllocation(n_components=NUM_TOPICS, max_iter=10, learning_method='online')
lda_Z = lda_model.fit_transform(data_vectorized)

# Visualize
panel = pyLDAvis.sklearn.prepare(lda_model, data_vectorized, vectorizer, mds='tsne')

hyteh commented 6 years ago

I have the same warning with similar versions of pyLDAvis and pandas, and I was replicating this article as well.

>>> pyLDAvis.__version__
'2.1.2'
>>> pd.__version__
'0.23.3'

I can't seem to get it to work. Is there a fix for this?

Edit: works fine in Jupyter Notebooks though.

bmabey commented 6 years ago

Thanks for the report. Anyone feel like submitting an easy PR? :)

Demetrio92 commented 6 years ago

@bmabey Done. I was wondering whether the issue should be addressed more generally, but that would require me to understand what exactly is wrong with the inputs from pandas' perspective, and why the pandas developers don't think sorting by default is a good idea. But hey, if it ain't broke, don't fix it.
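For readers wondering what the warning is about, here is a minimal sketch of the behavior it describes. The two frames and their column names are hypothetical stand-ins, not the actual inputs built in `_prepare.py`, and the snippet does not claim to show the change that was merged. It only illustrates how the `sort` argument of `pd.concat` affects the column order when the frames' columns (the non-concatenation axis) are not aligned.

```python
import pandas as pd

# Hypothetical frames: same data model, but the columns differ in order
# and the second frame has an extra column.
default_info = pd.DataFrame({"Term": ["a", "b"], "Freq": [3, 1]})
topic_info = pd.DataFrame({"Freq": [2], "Term": ["a"], "Category": ["Topic1"]})

# sort=True: the column labels of the result are sorted.
sorted_cols = pd.concat([default_info, topic_info], sort=True)

# sort=False: the columns keep their order of first appearance.
unsorted_cols = pd.concat([default_info, topic_info], sort=False)

print(list(sorted_cols.columns))    # ['Category', 'Freq', 'Term']
print(list(unsorted_cols.columns))  # ['Term', 'Freq', 'Category']
```

Either way the rows are the same and only the column order differs. Passing `sort` explicitly, as the warning text suggests, states which order the caller wants instead of relying on a default that changes between pandas versions.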

elenawij commented 6 years ago

Hi! Has anyone solved it? I still get the warning... Thank you!

Demetrio92 commented 6 years ago

@elenazadnepr It's probably not in the latest PyPI release yet. You can see that my PR is merged.

For now, you can install the latest version from the master branch if the warning really bothers you that much.
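If installing from master is not an option, one possible stopgap is to silence the warning locally with the standard-library `warnings` module. The sketch below is an assumption-laden illustration, not something proposed in this thread: `call_without_future_warnings` and `noisy_prepare` are hypothetical names, and `noisy_prepare` merely stands in for a library call that emits a `FutureWarning`.

```python
import warnings


def call_without_future_warnings(func, *args, **kwargs):
    """Call func while ignoring FutureWarning, then restore the old filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)


def noisy_prepare():
    """Hypothetical stand-in for a call that emits a FutureWarning."""
    warnings.warn("Sorting because non-concatenation axis is not aligned.",
                  FutureWarning)
    return "panel"


result = call_without_future_warnings(noisy_prepare)
```

With the names from the original report, the call would look like `call_without_future_warnings(pyLDAvis.sklearn.prepare, lda_model, data_vectorized, vectorizer, mds='tsne')`. Note that this hides every `FutureWarning` raised during that call, not just the one from `pd.concat`.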

mrciolino commented 5 years ago

Same error. I'm not sure whether the Sep 4th, 2018 fix made it into conda-forge.

anaconda3/envs/DS/lib/python3.6/site-packages/pyLDAvis/_prepare.py:257: 
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
  return pd.concat([default_term_info] + list(topic_dfs))

pandas      0.24.2    py36h0a44026_0    anaconda
pyldavis    2.1.2     py_0              conda-forge

Atom    : 1.39.1
Electron: 3.1.10
Chrome  : 66.0.3359.181
Node    : 10.2.0

maxn0d3x commented 4 years ago

Anaconda/anaconda3/envs/data/lib/python3.7/site-packages/pyLDAvis/_prepare.py:257: 
FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
  return pd.concat([default_term_info] + list(topic_dfs))

JiaxiangBU commented 4 years ago

I get the same warning.

pyLDAvis.__version__
'2.1.2'
pd.__version__
'0.24.2'

Here is a reproducible example: github.com/JiaxiangBU/wei_lda_debate/blob/master/lda-analysis.ipynb