Open ebaggott opened 8 years ago
@ebaggott Sorry it has taken me a while to reply. Please stand by while I bring the docs up to date -- expect an update in this thread within the next few days.
This will be TETHNE-130
@ebaggott Just an update: DTM support was removed in v0.7, but I'm bringing it back for v0.8.1. I took it out because there was a nasty memory leak, but it looks like someone patched that so we're back in business. I'll let you know when it's ready to test.
In the meantime, can I get your help developing our new Q/A group, here? It would be great if you could create questions about things you've already figured out (and post your answer!), so that others who are getting started can benefit. I really appreciate it!
@ebaggott Ok, DTM is back! For now it's in v0.8.1.dev5, which means that you'll have to upgrade Tethne with the --pre flag:
pip install -U tethne --pre
The API is a little different (see the example in the docstring, below). It would be great to get your feedback on this, and any ideas for how to make it easier to use. I'll write up some better documentation soon.
Here's the docstring for reference:
Provides a wrapper for the Dynamic Topic Model by David Blei et al. [1][2].
In order to use this class you must have already compiled the ``dtm``
package by Blei and Gerrish, located
[here](https://github.com/blei-lab/dtm). If you run into memory issues you
may want to try [this fork](https://github.com/fedorn/dtm).
You must provide the path to the binary executable (usually called ``main``)
either by setting the DTM_PATH environment variable, or by passing
``dtm_path='/path/to/dtm/main'`` to the constructor.
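
As a rough illustration of the two options above, the sketch below resolves the binary path the way a wrapper might: an explicit argument wins, otherwise the ``DTM_PATH`` environment variable is consulted. The helper name ``resolve_dtm_path`` is hypothetical and not part of Tethne, and the lookup order is an assumption rather than a description of what ``DTMModel`` actually does.

```python
import os


def resolve_dtm_path(dtm_path=None, environ=None):
    """Return the configured path to the dtm binary.

    Hypothetical helper for illustration only; not part of Tethne.
    An explicit ``dtm_path`` takes precedence over the DTM_PATH variable
    (this precedence is an assumption).
    """
    # Allow a mapping to be injected so the lookup can be exercised
    # without touching the real process environment.
    environ = os.environ if environ is None else environ
    path = dtm_path or environ.get('DTM_PATH')
    if not path:
        raise ValueError('Set DTM_PATH or pass dtm_path explicitly.')
    return path
```

Calling ``resolve_dtm_path()`` with neither option set raises ``ValueError``, which surfaces a missing configuration before any model fitting starts.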
[1] D. Blei and J. Lafferty. Dynamic topic models. In Proceedings of the
23rd International Conference on Machine Learning, 2006.
[2] S. Gerrish and D. Blei. A Language-based Approach to Measuring
Scholarly Impact. In Proceedings of the 27th International Conference on
Machine Learning, 2010.
Examples
--------
.. code-block:: python

   >>> from tethne.readers.wos import read
   >>> from nltk.tokenize import word_tokenize
   >>> corpus = read('/path/to/my/data')
   >>> corpus.index_feature('abstract', word_tokenize)
   >>> from tethne import DTMModel
   >>> model = DTMModel(corpus,
   ...                  featureset_name='abstract',
   ...                  dtm_path='/path/to/dtm/main')
   >>> model.fit(Z=5)

In practice you will want to do some filtering prior to modeling.
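
One possible form of that filtering is a tokenizer that drops stopwords, very short tokens, and non-alphabetic tokens. This is only a minimal sketch: the stopword list and the ``filtered_tokenize`` name are made up for illustration, and it assumes ``index_feature`` accepts any callable that maps a string to a list of tokens, as ``word_tokenize`` does in the example above.

```python
# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = frozenset({'the', 'a', 'an', 'and', 'of', 'in', 'to', 'is'})


def filtered_tokenize(text, min_length=3, stopwords=STOPWORDS):
    """Split on whitespace, normalise, and keep only informative tokens.

    Hypothetical helper, not part of Tethne or NLTK.
    """
    # Strip surrounding punctuation and lowercase each whitespace-separated token.
    tokens = (token.strip('.,;:!?()[]"\'').lower() for token in text.split())
    # Keep alphabetic tokens that are long enough and are not stopwords.
    # Note that hyphenated words and numbers are dropped by isalpha().
    return [
        token for token in tokens
        if len(token) >= min_length and token.isalpha() and token not in stopwords
    ]
```

If the assumption about ``index_feature`` holds, this could be passed in place of ``word_tokenize``, for example ``corpus.index_feature('abstract', filtered_tokenize)``.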
Sorry, I know it's been at least a year since this was last looked at, but I was wondering whether v0.8.1.dev5 is still the right version of tethne to use for DTM. I do see dtm.py and dtm.pyc in this version's model/corpus directory, but nothing in __init__.py, so when I try to use dtm.from_gerrish I get:
AttributeError: 'module' object has no attribute 'dtm'
Could you please help? Thanks!
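
One general Python behaviour may be relevant here: a submodule only becomes an attribute of its package once something imports it, so a file that exists on disk but is not imported in __init__.py is not reachable through plain attribute access. The sketch below imports a module by its dotted path and reports the failure instead of raising. The helper name ``load_submodule`` is hypothetical, and whether the DTM module in this release imports cleanly is not something this snippet can confirm.

```python
import importlib


def load_submodule(dotted_name):
    """Import a module by dotted path.

    Returns (module, None) on success or (None, error message) on failure.
    Hypothetical helper for diagnosis only.
    """
    try:
        return importlib.import_module(dotted_name), None
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package or submodule is reported here as well.
        return None, str(exc)
```

Assuming the file mentioned above lives at that location, ``load_submodule('tethne.model.corpus.dtm')`` would show either the loaded module or the underlying import error.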
Hello,
I'd like to use tethne.model.corpus.DTMmodel.from_gerrish. However, I can't load the requisite modules. Can you help? This is what I have tried:
from tethne.model.corpus import dtmmodel
from tethne.model.managers import DTMModelManager
from tethne.model import DTMModelManager
from tethne.model.corpus import dtmmodel
I always get the following import error:
ImportError: cannot import name DTMModelManager
Thanks!
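
When an import name cannot be found, it can help to list which submodules the installed package actually ships before guessing at names. The following is a minimal sketch using only the standard library; ``list_submodules`` is a hypothetical helper, and the package names to pass for Tethne are an assumption based on the paths quoted in this thread.

```python
import importlib
import pkgutil


def list_submodules(package_name):
    """Return the sorted names of the direct submodules of a package.

    Hypothetical helper for diagnosis only. Plain modules (which have no
    __path__) yield an empty list.
    """
    package = importlib.import_module(package_name)
    search_path = getattr(package, '__path__', None)
    if search_path is None:
        return []
    # iter_modules inspects the package directories without importing
    # the submodules themselves.
    return sorted(info.name for info in pkgutil.iter_modules(search_path))
```

For instance, ``list_submodules('tethne.model')`` and ``list_submodules('tethne.model.corpus')`` would show whether a ``managers`` or ``dtm`` module is present in the installed version.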