ContinuumIO / topik

A Topic Modeling toolbox
BSD 3-Clause "New" or "Revised" License

Problem running tutorial code #41

Closed AHMcKenzie closed 9 years ago

AHMcKenzie commented 9 years ago

Hello, I was trying out the topik package and ran into some problems with the basic examples in the tutorial (http://topik.readthedocs.org/en/latest/example.html). Specifically, I was trying to get an LDAvis visualization using a variation of your basic code:

from topik.run import run_model
run_model("reviews", content_field="text", r_ldavis=True, dir_path="./topic_model")

The parameters don't seem to match what's in the documentation, so I'm going by trial and error. With the present code, I get the error below. Could you kindly let me know how to properly invoke LDAvis? Thanks and best regards

Alex

----> 1 run_model("reviews", content_field="text", r_ldavis=True, dir_path="./topic_model")

/Users/alexmckenzie/anaconda/lib/python2.7/site-packages/topik-0.1.0-py2.7.egg/topik/run.pyc in run_model(data_source, source_type, year_field, start_year, stop_year, content_field, clear_es_index, tokenizer, n_topics, dir_path, model, termite_plot, output_file, r_ldavis, json_prefix, seed, **kwargs)
    116
    117     if r_ldavis:
--> 118         to_r_ldavis(processed_data, dir_name=os.path.join(dir_path, 'ldavis'), lda=lda)
    119         os.environ["LDAVIS_DIR"] = os.path.join(dir_path, 'ldavis')
    120         try:

/Users/alexmckenzie/anaconda/lib/python2.7/site-packages/topik-0.1.0-py2.7.egg/topik/utils.pyc in to_r_ldavis(corpus_bow, lda, dir_name)
     40     np.savetxt(os.path.join(dir_name, 'topicTermDist'), tt_dist, delimiter=',', newline='\n',)
     41
---> 42     corpus_file = corpus_bow.filename
     43     corpus = gensim.corpora.MmCorpus(corpus_file)
     44     docTopicProbMat = lda.model[corpus]

AttributeError: 'DigestedDocumentCollection' object has no attribute 'filename'

msarahan commented 9 years ago

Hi, your traceback seems to show an old version of topik (0.1.0). The current version is 0.2.0, and we did a lot of work on this particular area. Where did you download Topik from? It is possible that you got a stale version somehow.
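
One way to confirm which topik version an environment actually resolves is to ask Python's packaging metadata. The following is only a sketch: it assumes Python 3.8 or later (for importlib.metadata), which is newer than the Python 2.7 shown in the traceback, and it assumes the distribution is registered under the name "topik". The helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if nothing is installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "topik" is assumed to be the distribution name used by the package.
    found = installed_version("topik")
    if found is None:
        print("topik is not installed in this environment")
    else:
        print("topik version:", found)
```

If this reports 0.1.0 after an upgrade, a stale copy is probably still on the import path.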

AHMcKenzie commented 9 years ago

Thanks for following up! I used conda as suggested on the install page (http://topik.readthedocs.org/en/latest/installation.html), but I also discovered that I have an older version installed. I'll clean this up and try again. Thanks again.

AHMcKenzie commented 9 years ago

Msarahan,

With the proper Topik version, the sample demo worked fine. However, I get the error below when trying to obtain an LDAvis plot. I noticed that this was flagged a couple of days ago, so I'll wait to see the outcome of that fix. Thanks and regards

ValidationError                           Traceback (most recent call last)
in ()
----> 1 plot_lda_vis(model.to_py_lda_vis())

/Users/alexmckenzie/anaconda/lib/python2.7/site-packages/topik/viz.pyc in plot_lda_vis(model_data)
     65     """Designed to work with to_py_lda_vis() in the model classes."""
     66     from pyLDAvis import prepare, show
---> 67     model_vis_data = prepare(**model_data)
     68     show(model_vis_data)

/Users/alexmckenzie/anaconda/lib/python2.7/site-packages/pyLDAvis/_prepare.pyc in prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency, R, lambda_step, mds, n_jobs, plot_opts)
    277     doc_lengths = _series_with_name(doc_lengths, 'doc_length')
    278     vocab = _series_with_name(vocab, 'vocab')
--> 279     _input_validate(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency)
    280     R = min(R, len(vocab))
    281

/Users/alexmckenzie/anaconda/lib/python2.7/site-packages/pyLDAvis/_prepare.pyc in _input_validate(*args)
     57     res = _input_check(*args)
     58     if res:
---> 59         raise ValidationError('\n' + '\n'.join([' * ' + s for s in res]))
     60
     61

ValidationError:
 * Not all rows (distributions) in doc_topic_dists sum to 1.

msarahan commented 9 years ago

Thanks. I am going to repost this error message under that other issue, for tracking purposes.
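
For anyone who hits the same ValidationError before the fix lands, the failing condition can be checked by hand. The sketch below is not topik's or pyLDAvis's own code. It assumes doc_topic_dists is available as a plain list of rows (one row of topic probabilities per document), and the helper names and the tolerance value are made up for illustration. Renormalizing only hides the symptom if the underlying model output is wrong, so treat it as a diagnostic aid.

```python
def rows_not_summing_to_one(doc_topic_dists, tol=1e-9):
    """Return the indices of rows whose entries do not sum to 1 within tol."""
    return [
        index
        for index, row in enumerate(doc_topic_dists)
        if abs(sum(row) - 1.0) > tol
    ]


def normalize_rows(doc_topic_dists):
    """Return a copy in which every row is rescaled to sum to 1."""
    normalized = []
    for index, row in enumerate(doc_topic_dists):
        total = sum(row)
        if total <= 0:
            # A row with no probability mass cannot be rescaled meaningfully.
            raise ValueError("row %d has no probability mass" % index)
        normalized.append([value / total for value in row])
    return normalized


if __name__ == "__main__":
    # Hypothetical data: the second document's row sums to 0.4, not 1.
    example = [[0.5, 0.5], [0.2, 0.2], [0.7, 0.3]]
    print("bad rows:", rows_not_summing_to_one(example))
    print("bad rows after rescaling:", rows_not_summing_to_one(normalize_rows(example)))
```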

AHMcKenzie commented 9 years ago

Hi Mike,

I understand that you’ll release version 0.3 in mid-November. I’m using conda to manage my environment. How can I install the current GitHub version of topik using conda and test the recent changes? I’m not that familiar with these tools, so I greatly appreciate any advice you can offer. Thanks!

Alex

On Oct 12, 2015, at 8:22 AM, Mike Sarahan notifications@github.com wrote:

> Thanks. I am going to repost this error message under that other issue, for tracking purposes.

ahmadia commented 9 years ago

@AHMcKenzie - conda itself doesn't install directly from GitHub, but if you have all of the other dependencies installed into an environment, you could try:

pip install https://github.com/Continuumio/topik/archive/master.zip

msarahan commented 9 years ago

I think the best approach is to check out the topik source with git. From there, create the conda environment using conda-env as detailed in the docs; this should take care of the dependencies. You'll have to activate the created environment before proceeding. Finally, run this in the topik source root:

conda develop .

This is roughly equivalent to

python setup.py develop

or

pip install -e .

I have used both of those also, and they work OK. The general idea is that instead of copying topik's files into your Python installation, you create a link in the Python installation.

Note that if you've installed topik from conda, you should remove it prior to doing this. The conda develop command needs work, and currently any conda-installed packages will take precedence over ones installed with conda develop.
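
Because a leftover conda-installed copy can shadow the development link, it may help to check which files Python would actually import. The sketch below uses only the standard library and assumes Python 3.4 or later. The helper names are made up for illustration, and the checkout path in the usage lines is a placeholder, not a path from this thread.

```python
import importlib.util
import os


def module_location(module_name):
    """Return the resolved file path Python would import, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return os.path.realpath(spec.origin)


def is_loaded_from(module_name, source_root):
    """Return True if the module resolves to a file under source_root."""
    location = module_location(module_name)
    if location is None:
        return False
    root = os.path.realpath(source_root)
    try:
        return os.path.commonpath([location, root]) == root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


if __name__ == "__main__":
    # Placeholder path: replace with the root of your own topik checkout.
    checkout = os.path.expanduser("~/src/topik")
    print("topik resolves to:", module_location("topik"))
    print("imported from checkout:", is_loaded_from("topik", checkout))
```

If the reported location is under site-packages rather than the checkout, the conda-installed copy is still taking precedence.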

On Wed, Oct 21, 2015 at 10:08 PM Aron Ahmadia notifications@github.com wrote:

> @AHMcKenzie - conda itself doesn't install directly from GitHub, but if you have all of the other dependencies installed into an environment, you could try:
>
> pip install https://github.com/Continuumio/topik/archive/master.zip

ahmadia commented 9 years ago

+1 on @msarahan's more detailed response.