Add sampling of n segments from each document

cligs / pyzeta

Python implementation of the Zeta score for contrastive text analysis

GNU General Public License v3.0


Add sampling of n segments from each document #9

Status: Closed (christofs closed this issue 7 years ago)

christofs commented 7 years ago

This idea comes from AZ. To keep longer documents from skewing the results, sample a fixed number of segments from each document. That way, each document carries equal weight in the overall Zeta score of a group of texts. The trade-off is some loss of information, because any segments beyond the sampled ones are discarded.
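
A rough sketch of what the sampling could look like, not the actual pyzeta code: the function names `split_into_segments` and `sample_segments`, the parameters `seglength`, `n` and `seed`, and the toy documents are all made up for illustration. It also assumes that a document with fewer than `n` segments simply keeps all of them, which is only one possible way to handle short texts.

```python
import random


def split_into_segments(tokens, seglength):
    """Cut a token list into consecutive segments of exactly `seglength` tokens.

    A trailing remainder shorter than `seglength` is dropped.
    """
    return [tokens[i:i + seglength]
            for i in range(0, len(tokens) - seglength + 1, seglength)]


def sample_segments(documents, seglength, n, seed=None):
    """Return at most `n` randomly chosen segments per document.

    `documents` maps a document name to its list of tokens.
    Assumption: documents with fewer than `n` segments keep all their segments.
    """
    rng = random.Random(seed)  # local generator, so results are repeatable with a seed
    sampled = {}
    for name, tokens in documents.items():
        segments = split_into_segments(tokens, seglength)
        if len(segments) <= n:
            sampled[name] = segments
        else:
            # Sampling without replacement: no segment is picked twice.
            sampled[name] = rng.sample(segments, n)
    return sampled


# Toy data (hypothetical): one short and one long "document".
docs = {
    "short": ["tok%d" % i for i in range(10)],
    "long": ["tok%d" % i for i in range(100)],
}
result = sample_segments(docs, seglength=5, n=2, seed=42)
for name, segments in result.items():
    print(name, len(segments))
```

With these toy inputs, both documents end up contributing two segments of five tokens each, even though the long one has ten times as much text. Everything in the long document outside the two sampled segments is ignored, which is the loss of information mentioned above.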

christofs commented 7 years ago

Done with commit b1f416d