This PR upgrades `LabeledSpanLengthCollector` to `SpanLengthCollector`:
- make labels optional and inferrable
- add tokenization capability
IMPORTANT NOTE (similar to #351): Inferring labels produces wrong results for certain `aggregation_functions` such as `min`, `mean`, and `std`, because documents with zero entries for a given label are no longer counted for that label. We remove these functions from `aggregation_functions` if `labels == "INFERRED"`, but we cannot guard against user-defined functions that rely on correct zero values.
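
To make the caveat concrete, here is a minimal, self-contained sketch of the effect. It is a toy model, not the collector's actual code: the helper `per_document_counts`, the sample data, and the use of `labels=None` to stand in for `labels == "INFERRED"` are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy data: the labels of the spans found in each document.
docs = [["PER", "PER", "ORG"], ["PER"], ["ORG", "ORG"]]


def per_document_counts(docs, labels=None):
    """Return {label: [one value per contributing document]}.

    With explicit labels, every document contributes a value (possibly 0).
    With labels=None (standing in for inferred labels), a document only
    contributes to the labels it actually contains, so zeros are lost.
    """
    result = {label: [] for label in labels} if labels is not None else {}
    for doc in docs:
        seen = {}
        for label in doc:
            seen[label] = seen.get(label, 0) + 1
        keys = labels if labels is not None else seen.keys()
        for label in keys:
            result.setdefault(label, []).append(seen.get(label, 0))
    return result


fixed = per_document_counts(docs, labels=["PER", "ORG"])
inferred = per_document_counts(docs)

print(fixed["PER"], inferred["PER"])              # [2, 1, 0] [2, 1]
print(min(fixed["PER"]), min(inferred["PER"]))    # 0 1
print(mean(fixed["PER"]), mean(inferred["PER"]))  # 1 1.5
print(sum(fixed["PER"]), max(fixed["PER"]))       # 3 2, same as for inferred
```

In this sketch `sum` and `max` are unaffected by the missing zeros, while `min` and `mean` shift. That is the kind of discrepancy the note describes, and it is why a user-defined function that depends on the zeros cannot be detected automatically.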