Closed ghost closed 8 years ago
@epipremnum Some of those notebooks are hopelessly out of date -- I'll take a look soon and see what's going on. We are in bad need of an update to the documentation and examples. :-P
Hi Erick,
thanks for your reply. I figured it out.
If I only have word counts for the abstract field, do I need a StructuredFeatureSet like in the example? It works for me if I set structured=False.
Thank you.
@epipremnum If you're just topic modeling, `structured=False` is the way to go.
But it is odd that `transform()` is not working on the `StructuredFeatureSet`. I have created TETHNE-126 for this. When you have a moment, can you update this thread with the version of Tethne that you are using? (i.e. `pip show tethne`)
Ok, I think that I see what happened here. In `StructuredFeatureSet` the `transform()` method was checking explicitly for `None` to exclude, and letting `False` slip by. In `FeatureSet`, `transform()` just checks for Falsiness. Fix forthcoming.
Ok, this is fixed in 1b70d10, and will make it into v0.8.1-beta.
@epipremnum If you're still using Tethne, it would be great to get your help building our new Q/A group (here). I'm hoping that this can help make up for my slow pace on documentation. Thanks!
Hi,
I am referring to your notebook 6. Words and topic modeling.ipynb
I tried to follow your code and use it for my own WoS corpus. You create a FeatureSet for the abstract field. Then you apply a filter with transform() on the FeatureSet. You want to remove stopwords from stoplist and words with a document frequency between 2 and 400.
But both FeatureSets have the same length. It looks like no tokens were removed. Is there an error in the filter, or am I missing something?
I removed the stopwords beforehand, but the document frequency filtering doesn't seem to work.
![screen](https://cloud.githubusercontent.com/assets/18701898/15431971/8c353e0c-1eac-11e6-85ce-74ea6f8d1fd0.jpeg)
Also, could you explain the abstract_to_features() method that is mentioned? I can't seem to find it.
Thank you.
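For reference, the kind of filtering described in this post can be sketched without Tethne at all. This is a plain-Python illustration, not the notebook's code: `document_frequency` and `filter_tokens` are hypothetical helpers, and it assumes the intent is to drop stopwords and keep only tokens whose document frequency lies between 2 and 400 inclusive.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each token, the number of documents it appears in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so each document counts once per token
    return df


def filter_tokens(docs, stoplist, min_df=2, max_df=400):
    """Drop stopwords and tokens whose document frequency is outside [min_df, max_df]."""
    df = document_frequency(docs)
    return [
        [t for t in tokens if t not in stoplist and min_df <= df[t] <= max_df]
        for tokens in docs
    ]


docs = [
    ["the", "topic", "model", "works"],
    ["the", "topic", "corpus"],
    ["the", "model", "topic"],
]
stoplist = {"the"}

filtered = filter_tokens(docs, stoplist)

vocab_before = {t for d in docs for t in d}
vocab_after = {t for d in filtered for t in d}
print(len(vocab_before), len(vocab_after))
```

Comparing the vocabulary size before and after is a quick way to check whether a filter removed anything.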