Closed deniszh closed 2 weeks ago
OK, I'll probably close this for now. Reason: the concurrent filewalk creates too much contention on the trie index, which was not designed for parallel inserts. I tried guarding the whole index with a single mutex, but the contention still looks too high. So I'll put this aside and maybe return to it soon, once the indices are optimized.
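To illustrate the contention described above, here is a minimal sketch, not the actual go-carbon code. `lockedIndex`, `insert`, and `insertConcurrently` are hypothetical names, and a plain map stands in for the trie. The point is only that with one mutex around the whole index, every walker goroutine queues on the same lock, so the inserts run serially no matter how many workers walk the tree.

```go
package main

import "sync"

// lockedIndex is a hypothetical stand-in for the trie index: a plain map
// guarded by a single mutex covering the whole structure.
type lockedIndex struct {
	mu    sync.Mutex
	paths map[string]struct{}
}

func newLockedIndex() *lockedIndex {
	return &lockedIndex{paths: make(map[string]struct{})}
}

// insert takes the global lock, so concurrent callers are serialized here.
func (ix *lockedIndex) insert(path string) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.paths[path] = struct{}{}
}

// size returns the number of distinct paths stored in the index.
func (ix *lockedIndex) size() int {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	return len(ix.paths)
}

// insertConcurrently simulates parallel walkers: it fans the paths out to
// `workers` goroutines, all of which insert into the same locked index.
func insertConcurrently(ix *lockedIndex, paths []string, workers int) {
	if workers < 1 {
		workers = 1 // always keep at least one consumer for the channel
	}
	ch := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range ch {
				ix.insert(p)
			}
		}()
	}
	for _, p := range paths {
		ch <- p
	}
	close(ch)
	wg.Wait()
}
```

The result is correct, but the walk is only as fast as the single lock allows, which is why the index itself would need to change before a parallel walk pays off.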
This is similar to my old PR https://github.com/go-graphite/go-carbon/pull/329/, but I'm using github.com/charlievieth/fastwalk, which is still being updated, instead of cwalk, which is 4 years old.
Why is it needed? On really big and powerful servers with many metrics, filewalk is slow. I tried WalkDir, which is faster nowadays, but fastwalk is what really gives the performance gain.
For example, for a little over 55M metrics, file_scan_runtime was 28302 seconds; after this change it is 2069 seconds (roughly 13.7x faster).
filewalk is sane about the number of workers: a minimum of 4, otherwise equal to the number of CPUs, but never more than 32.
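The worker rule above can be sketched as a simple clamp. This is an illustration of the rule as stated, not code taken from the PR or from fastwalk; `clampWorkers` and `defaultWorkers` are hypothetical names, and using `runtime.NumCPU()` as the CPU count is an assumption.

```go
package main

import "runtime"

// clampWorkers applies the rule from the description: at least 4 workers,
// otherwise one per CPU, and never more than 32.
func clampWorkers(numCPU int) int {
	const minWorkers, maxWorkers = 4, 32
	if numCPU < minWorkers {
		return minWorkers
	}
	if numCPU > maxWorkers {
		return maxWorkers
	}
	return numCPU
}

// defaultWorkers derives the worker count from the CPU count of the host.
// Assumption: runtime.NumCPU() is the source of "numcpu" here.
func defaultWorkers() int {
	return clampWorkers(runtime.NumCPU())
}
```

So a 2-core box still gets 4 workers, a 16-core box gets 16, and a 128-core box is capped at 32.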