diging / tethne

Python module for bibliographic network analysis.
http://diging.github.io/tethne/
GNU General Public License v3.0

Added new classes to support articulated content #86

Closed erickpeirson closed 9 years ago

erickpeirson commented 9 years ago

Added two new classes, classes.features.StructuredFeature and classes.features.StructuredFeatureSet.

A StructuredFeature represents articulated, tokenized content associated with a single document. This might be the full-text content of a paper, where each word is a token. The StructuredFeature supports the definition of contexts, which divide the tokens into chunks. For example, a document might be divided into pages, paragraphs, and/or sentences.
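
To make the idea of contexts concrete, here is a minimal sketch. It is not the tethne implementation: the class name `ChunkedTokens`, its methods, and the choice to store each context as a list of chunk start indices are all assumptions made for illustration.

```python
class ChunkedTokens:
    """Hypothetical stand-in for a StructuredFeature: tokens plus named contexts."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        # Assumed layout: context name -> sorted list of chunk start indices.
        self.contexts = {}

    def add_context(self, name, starts):
        starts = sorted(set(starts))
        if not starts or starts[0] != 0:
            # Make sure the first chunk begins at the first token.
            starts = [0] + starts
        self.contexts[name] = starts

    def chunks(self, name):
        # Each chunk runs from its start index up to the next start index,
        # and the last chunk runs to the end of the token list.
        starts = self.contexts[name]
        ends = starts[1:] + [len(self.tokens)]
        return [self.tokens[s:e] for s, e in zip(starts, ends)]


doc = ChunkedTokens("the cat sat . the dog ran .".split())
doc.add_context("sentence", [0, 4])
sentences = doc.chunks("sentence")
```

Several contexts (for example pages and sentences) can be registered on the same token list, since each one is only a separate list of boundaries over shared tokens.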

A StructuredFeatureSet is similar to the existing FeatureSet class, except that it is designed specifically to support StructuredFeature instances. Note especially the context_chunks method, which generates a sparse representation of the entire featureset divided according to the selected context.
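
As a rough illustration of what a sparse, per-chunk representation can look like, the sketch below counts tokens in every chunk of every document. The function name `chunk_counts`, the input layout, and the returned mapping are assumptions for this example only; the actual signature and return value of context_chunks are not described here and may differ.

```python
from collections import Counter


def chunk_counts(documents, context):
    """Return {(doc_id, chunk_index): Counter} for each chunk in the chosen context.

    `documents` is an assumed layout: doc_id -> (tokens, {context: start indices}).
    Only tokens that actually occur in a chunk are stored, hence "sparse".
    """
    sparse = {}
    for doc_id, (tokens, contexts) in documents.items():
        starts = contexts[context]
        ends = starts[1:] + [len(tokens)]
        for i, (s, e) in enumerate(zip(starts, ends)):
            sparse[(doc_id, i)] = Counter(tokens[s:e])
    return sparse


documents = {
    "doc1": ("a b a c".split(), {"sentence": [0, 2]}),
    "doc2": ("b b c".split(), {"sentence": [0]}),
}
counts = chunk_counts(documents, "sentence")
```

Keying each entry by document and chunk index keeps the chunk traceable to its source document, which matters when the chunks are later fed to a model that treats each chunk as a unit.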