KonradHoeffner / hdt

Library for the Header Dictionary Triples (HDT) compression file format for RDF data.
https://crates.io/crates/hdt
MIT License

regression in loading times? #23

KonradHoeffner closed this issue 1 year ago

KonradHoeffner commented 1 year ago

Loading time measured with a modified hdt::tests that loads lscomplete20143.hdt and then returns.

0.0.7: 4.541s
0.0.8: 47.16s

KonradHoeffner commented 1 year ago

The regression is caused by this line in TriplesBitmap::read() in triples.rs:

indices.sort_unstable_by(|a, b| get_p(*a).cmp(&get_p(*b)));

KonradHoeffner commented 1 year ago

Replacing it with indices.sort_unstable_by_key(|a| get_p(*a)); makes no difference on 0.0.8: 47.23s. Maximum resident set size (kbytes): 3260396

Key caching gets it down to 21.38s:

indices.sort_by_cached_key(|a| get_p(*a));

Maximum resident set size (kbytes): 3292092

So there is no large memory overhead (31696 kbytes more, roughly 1%). Use this for now and investigate later whether it can be optimized further.
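
To illustrate why key caching helps here, the following is a minimal sketch that counts how often a key function is evaluated by each of the three sorting variants discussed above. `expensive_key` and `count_key_calls` are hypothetical stand-ins, not part of the hdt crate: the real `get_p` reads from the HDT data structures, while this version only indexes a plain slice and increments a counter.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for an expensive key lookup such as `get_p`.
/// It counts every invocation so the variants can be compared.
fn expensive_key(i: usize, data: &[u32], calls: &Cell<usize>) -> u32 {
    calls.set(calls.get() + 1);
    data[i]
}

/// Sorts a fresh index vector with each variant and returns the number of
/// key evaluations as (sort_unstable_by, sort_unstable_by_key, sort_by_cached_key).
fn count_key_calls(data: &[u32]) -> (usize, usize, usize) {
    let calls = Cell::new(0);
    let fresh = || (0..data.len()).collect::<Vec<usize>>();

    // Comparator variant: two key evaluations per comparison.
    let mut a = fresh();
    a.sort_unstable_by(|x, y| {
        expensive_key(*x, data, &calls).cmp(&expensive_key(*y, data, &calls))
    });
    let by = calls.replace(0);

    // Key variant: shorter to write, but the key is still recomputed
    // for every comparison.
    let mut b = fresh();
    b.sort_unstable_by_key(|x| expensive_key(*x, data, &calls));
    let by_key = calls.replace(0);

    // Cached variant: keys are computed up front into temporary storage,
    // at most once per element.
    let mut c = fresh();
    c.sort_by_cached_key(|x| expensive_key(*x, data, &calls));
    let cached = calls.replace(0);

    (by, by_key, cached)
}

fn main() {
    // Deterministic pseudo-random values so the input is not already sorted.
    let data: Vec<u32> = (0..100_000u32)
        .map(|i| i.wrapping_mul(2_654_435_761) % 1000)
        .collect();
    let (by, by_key, cached) = count_key_calls(&data);
    println!("sort_unstable_by:     {by} key calls");
    println!("sort_unstable_by_key: {by_key} key calls");
    println!("sort_by_cached_key:   {cached} key calls");
}
```

The cached variant trades one temporary allocation proportional to the slice length for far fewer key evaluations, which is consistent with the small increase in resident set size reported above. Note that `sort_by_cached_key` is a stable sort, so the order of elements with equal keys may differ from the unstable variants.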