diging / tethne

Python module for bibliographic network analysis.
http://diging.github.io/tethne/
GNU General Public License v3.0

Using Mallet file as corpus for the LDA model #149

Closed pucek80 closed 8 years ago

pucek80 commented 8 years ago

@erickpeirson ,

Is it possible to use the file created by MALLET's import-dir command as the corpus object for LDAModel? I've been working through the topic model visualization tutorial, and that is where I am stuck. My textual data is spread across multiple files. With the recent changes to Tethne, the older tutorials no longer seem to work. I would greatly appreciate any help!

erickpeirson commented 8 years ago

@pucek80 Apologies for the delayed response. Two quick replies:

Reply 1. The docs are horribly out of date, and I am just this moment sprinting to update them. Give me a few days, and I'll update this thread when the updated docs are available (including the LDA tutorial).

Reply 2. Hmmm... that's a bit of a different use-case than we have previously had in mind. In most use-cases that we envisioned, the user would start with heterogeneous data and metadata, use Tethne to bring those together into a corpus with structured feature data (e.g. wordcounts), and then let Tethne interface with MALLET -- in other words, Tethne would intervene before your import-dir step.

So, I guess there are two routes we can take here. These are not mutually exclusive, but it would be helpful to hear your reaction.

  1. We can make it easier for you to encapsulate your textual data in a Corpus, so that Tethne can handle things as I've described above, or
  2. We can add a method to LDAModel that allows you to start mid-stream, i.e. provide the path of the MALLET-serialized Corpus, and go from there.
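To make "structured feature data (e.g. wordcounts)" concrete for a directory of plain-text files, here is a minimal sketch using only the Python standard library. It is not Tethne's API: the function name `wordcounts_from_dir`, the `*.txt` file pattern, the simple letters-only tokenizer, and the `min_length` cutoff are all assumptions made for illustration. It only shows the kind of per-document word-count mapping that route 1 would need to wrap in a Corpus.

```python
import re
from collections import Counter
from pathlib import Path


def wordcounts_from_dir(text_dir, min_length=3):
    """Build per-document word counts from the *.txt files in text_dir.

    Hypothetical helper (not part of Tethne or MALLET). Returns a dict
    mapping each file's stem to a Counter of lowercased token counts.
    """
    features = {}
    for path in sorted(Path(text_dir).glob("*.txt")):
        # Read leniently so a stray bad byte does not abort the whole run.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        # Assumed tokenizer: runs of ASCII letters only.
        tokens = re.findall(r"[a-z]+", text)
        # Drop very short tokens, which are mostly function words.
        features[path.stem] = Counter(t for t in tokens if len(t) >= min_length)
    return features
```

Calling `wordcounts_from_dir("my_texts")` on a hypothetical folder `my_texts` would give one `Counter` per file, keyed by file name without the extension. How such a mapping is attached to a Corpus depends on the Tethne version, so check the current docs for that step.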

Thoughts?

erickpeirson commented 8 years ago

Well, that was more than a few days, but here's the updated LDA tutorial: http://diging.github.io/tethne/tutorial.mallet.html

@pucek80 Any thoughts on which strategy would be most helpful?

erickpeirson commented 8 years ago

For cross-reference, this is related to #122

erickpeirson commented 8 years ago

@pucek80 Take a look at this thread. Does that help at all?

erickpeirson commented 8 years ago

Closing for now. Please re-open if this is not resolved.

pucek80 commented 7 years ago

Erick, thank you for all the updates. I was sidetracked with other projects and never really got a chance to get back to this one. I will definitely take a look and let you know.