Closed (elshize closed 5 years ago)
Unexpectedly (?), many scores are negative:
$ ~/irkit/build/bin/irk-postings obama --score *bm25 | head
570 -0.163575
662 -0.592886
663 -0.652508
664 -0.858155
665 -0.865461
666 -0.860576
667 -0.812432
668 -0.865461
669 -0.841578
It seems to be a problem with building the index instead: the average document size is negative!
{
"avg_document_size": -21.23560605242698,
"documents": 37512555,
"max_document_size": 219400,
"occurrences": 29268169232,
"skip_block_size": 64
}
Problem is most likely with:
int64_t sum_doc_size = std::reduce(
std::execution::par_unseq, sizes.begin(), sizes.end(), 0);
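The initial value `0` is an `int`, so the accumulator type is deduced as `int` and the sum is computed in 32 bits before being assigned to `sum_doc_size`. The initial value must be 64-bit to avoid overflow.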
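To illustrate the diagnosis, here is a minimal sketch of a 64-bit sum plus a check that 32-bit wraparound is consistent with the numbers in the properties file above. The helper names (`sum_document_sizes`, `wrap_to_int32`, `average_document_size`) are hypothetical and not part of irkit, and a 32-bit element type for `sizes` is an assumption. `std::accumulate` is used so the sketch does not depend on parallel execution support; `std::reduce` takes its result type from the initial value in the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not irkit code): sums document sizes in 64 bits.
// The accumulator type comes from the init argument, so passing
// std::int64_t{0} instead of the int literal 0 keeps the sum 64-bit.
std::int64_t sum_document_sizes(const std::vector<std::int32_t>& sizes)
{
    return std::accumulate(sizes.begin(), sizes.end(), std::int64_t{0});
}

// Hypothetical helper: the value a 64-bit total would take if it were
// reduced to a 32-bit two's-complement integer. Computed entirely with
// well-defined 64-bit arithmetic (no signed overflow involved).
std::int64_t wrap_to_int32(std::int64_t total)
{
    const std::int64_t modulus = std::int64_t{1} << 32;
    const std::int64_t wrapped = ((total % modulus) + modulus) % modulus;
    return wrapped >= (modulus >> 1) ? wrapped - modulus : wrapped;
}

// Hypothetical helper: average document size from a total and a count.
double average_document_size(std::int64_t total, std::int64_t documents)
{
    assert(documents > 0);
    return static_cast<double>(total) / static_cast<double>(documents);
}
```

With the values reported above, `wrap_to_int32(29268169232)` is -796601840, and dividing that by 37512555 documents gives roughly -21.2356, which matches the stored `avg_document_size`. The unwrapped total gives an average of roughly 780.2. Signed overflow is formally undefined behavior in C++, so the wraparound is what was apparently observed here rather than something the language guarantees.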
This is a good opportunity to take care of #14
Suspicion confirmed.
Index
moa:/data/index/irkit/cw09b-nospam
Command
~/irkit/build/bin/irk-score
Log