Open asfimport opened 3 years ago
Dawid Weiss (@dweiss) (migrated from JIRA)
> It could be hotspot noise maybe?
Could be. Or it could be something else running in the background? It'd be good to somehow monitor background CPU activity while these benchmarks are running. I'm not much of a sysop to help out here though.
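One way to capture background activity alongside a benchmark run could look like the sketch below. It is only an illustration, not part of luceneutil: the helper name `run_with_load_samples` and its parameters are hypothetical, and it relies on `os.getloadavg()`, which is available on Unix-like systems only and reports a smoothed 1-minute average, so short bursts of background work may not show up clearly.

```python
import os
import threading
import time


def run_with_load_samples(benchmark, interval_sec=1.0):
    """Run benchmark() while recording the 1-minute load average.

    Hypothetical helper: returns (benchmark result, list of
    (unix_timestamp, load_average) samples).
    """
    samples = []
    stop = threading.Event()

    def sampler():
        # Always take at least one sample, then keep sampling until
        # the main thread signals that the benchmark has finished.
        while True:
            samples.append((time.time(), os.getloadavg()[0]))
            if stop.wait(interval_sec):
                break

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = benchmark()
    finally:
        stop.set()
        thread.join()
    return result, samples
```

Storing the samples next to each nightly data point would make it possible to check later whether a slow run coincided with unusually high load.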
Michael McCandless (@mikemccand) (migrated from JIRA)
Yeah that is one possible theory, but this machine (dedicated physical box) is very idle and only runs Lucene's nightly benchmarks. Also, the other benchmarks run at those same timestamps (e.g. the other analyzers) did not seem to show a performance drop. So I think it is not likely a time-specific environmental issue ...
Dawid Weiss (@dweiss) (migrated from JIRA)
It's one of those things that are exciting to debug, take days to complete and sometimes never reach any reasonable explanation. :)
Michael McCandless (@mikemccand) (migrated from JIRA)
> It's one of those things that are exciting to debug, take days to complete and sometimes never reach any reasonable explanation. :)
LOL I fear you have already handled too many such cases!
Michael Sokolov (@msokolov) (migrated from JIRA)
Maybe something like https://github.com/mikemccand/luceneutil/issues/77 would help
With the recent accidental regression of Japanese (Kuromoji) tokenization throughput due to exciting FST optimizations, we added new nightly Lucene benchmarks to measure tokenization throughput for JapaneseTokenizer: https://home.apache.org/~mikemccand/lucenebench/analyzers.html
It has already been running for ~5-6 weeks now! But for some reason, it looks bi-modal? "Normally" it is ~.45 M tokens/sec, but for two data points it dropped down to ~.33 M tokens/sec, which is odd. It could be hotspot noise maybe? But would be good to get to the root cause and fix it if possible.
Hotspot noise that randomly steals ~27% of your tokenization throughput is no good!!
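For reference, the ~27% figure follows from the two approximate throughput levels quoted above. A minimal check, using the rounded values from the description rather than exact benchmark output (the function name `relative_drop` is just illustrative):

```python
def relative_drop(normal, slow):
    """Fraction of throughput lost when going from `normal` to `slow`."""
    return (normal - slow) / normal


# Approximate values from the description, in M tokens/sec.
normal_rate = 0.45
slow_rate = 0.33

drop = relative_drop(normal_rate, slow_rate)
print(f"{drop:.1%}")  # about 27% of the normal throughput
```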
Or does anyone have any other ideas of what could be bi-modal in Kuromoji? I don't think this performance test has any randomness in it...
Migrated from LUCENE-9457 by Michael McCandless (@mikemccand), updated Aug 15 2020