This is an idea of AZ. To make sure that longer documents don't skew the results, sample a fixed number of segments from each document. This way, each document has an equal weight in the overall Zeta score of a group of texts. The trade-off is some loss of information, because the segments that are not sampled from longer documents are discarded.
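The sampling step could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name `sample_segments`, the `segments_by_doc` mapping, the `n_segments` and `seed` parameters, and the choice to skip documents that have too few segments are all assumptions made for the example.

```python
import random


def sample_segments(segments_by_doc, n_segments, seed=0):
    """Pick the same number of segments from every document.

    segments_by_doc: dict mapping a document id to its list of segments
                     (hypothetical structure, assumed for this sketch).
    n_segments:      how many segments to keep per document.
    seed:            seed for the random generator, for reproducibility.
    """
    rng = random.Random(seed)
    sampled = {}
    for doc_id, segments in segments_by_doc.items():
        # Assumption: documents that are too short are skipped rather than
        # padded or sampled with replacement.
        if len(segments) < n_segments:
            continue
        # Sample without replacement, so no segment is counted twice.
        sampled[doc_id] = rng.sample(segments, n_segments)
    return sampled


# Toy corpus: segment contents are placeholders, not real text.
corpus = {
    "long_novel": [f"long-{i}" for i in range(50)],
    "short_novel": [f"short-{i}" for i in range(12)],
    "fragment": [f"frag-{i}" for i in range(3)],
}
sampled = sample_segments(corpus, n_segments=10, seed=42)
```

In this toy corpus the 50-segment and the 12-segment document each contribute exactly 10 segments, so both carry the same weight, while 40 segments of the longer one are left out. That is the loss of information mentioned above.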