Closed by youngblood 9 years ago
I think this issue is related (reported by @AHMcKenzie):
With the proper Topik version, the sample demo worked fine. However, I get the error below when trying to produce a pyLDAvis plot. I noticed that this was flagged a couple of days ago, so I'll wait to see the outcome of that fix. Thanks and regards
    ValidationError                           Traceback (most recent call last)
     in ()
    ----> 1 plot_lda_vis(model.to_py_lda_vis())

    /Users/alexmckenzie/anaconda/lib/python2.7/site-packages/topik/viz.pyc in plot_lda_vis(model_data)
         65     """Designed to work with to_py_lda_vis() in the model classes."""
         66     from pyLDAvis import prepare, show
    ---> 67     model_vis_data = prepare(**model_data)
         68     show(model_vis_data)

    /Users/alexmckenzie/anaconda/lib/python2.7/site-packages/pyLDAvis/_prepare.pyc in prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency, R, lambda_step, mds, n_jobs, plot_opts)
        277     doc_lengths = _series_with_name(doc_lengths, 'doc_length')
        278     vocab = _series_with_name(vocab, 'vocab')
    --> 279     _input_validate(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency)
        280     R = min(R, len(vocab))
        281

    /Users/alexmckenzie/anaconda/lib/python2.7/site-packages/pyLDAvis/_prepare.pyc in _input_validate(*args)
         57     res = _input_check(*args)
         58     if res:
    ---> 59         raise ValidationError('\n' + '\n'.join([' * ' + s for s in res]))
         60
         61

    ValidationError:
I think we are currently using the number of documents in which a term appears (its document frequency). I think we should instead be using the total number of occurrences of the term in the entire corpus. Ideally, this value will be calculated once and then stored for each term in the intermediate data store.
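To make the difference between the two counts concrete, here is a minimal sketch in plain Python. It does not use Topik's actual data store or API: the function names `document_frequency` and `corpus_term_frequency` and the toy corpus are hypothetical, and it assumes the documents are already tokenized into lists of strings.

```python
from collections import Counter


def document_frequency(tokenized_docs):
    """Number of documents in which each term appears (at most 1 per doc)."""
    counts = Counter()
    for doc in tokenized_docs:
        counts.update(set(doc))  # set() collapses repeats within a document
    return counts


def corpus_term_frequency(tokenized_docs):
    """Total number of occurrences of each term across the whole corpus."""
    counts = Counter()
    for doc in tokenized_docs:
        counts.update(doc)  # every occurrence is counted
    return counts


# Hypothetical toy corpus, for illustration only.
docs = [
    ["topic", "model", "topic"],
    ["model", "plot"],
    ["topic", "topic", "plot"],
]

vocab = sorted({term for doc in docs for term in doc})
df = document_frequency(docs)
tf = corpus_term_frequency(docs)

# One count per vocabulary entry, in vocabulary order.
term_frequency = [tf[term] for term in vocab]

for term in vocab:
    print(term, "document frequency:", df[term], "corpus frequency:", tf[term])
```

In this toy corpus, "topic" appears in 2 documents but occurs 4 times in total, so the two definitions give different values. Computing the corpus-wide counts once and storing them per term, as suggested above, would avoid recounting on every visualization call.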