KonradHoeffner / hdt

Library for the Header Dictionary Triples (HDT) compression file format for RDF data.
https://crates.io/crates/hdt
MIT License

Use new WaveletMatrix 0.0.6 construction method to reduce memory usage #16

Closed by KonradHoeffner 1 year ago

KonradHoeffner commented 1 year ago

See https://github.com/kampersanda/sucds/issues/44. Measured with a modified hdt::tests that loads lscomplete20143.hdt and then returns.

Before

Command being timed: "cargo test --release hdt::tests"
User time (seconds): 89.53
System time (seconds): 7.13
Percent of CPU this job got: 225%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:42.95
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3370252
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 770
Minor (reclaiming a frame) page faults: 3897847
Voluntary context switches: 11918
Involuntary context switches: 12515
Swaps: 0
File system inputs: 2541256
File system outputs: 435912
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

After

Command being timed: "cargo test --release hdt::tests"
User time (seconds): 18.03
System time (seconds): 1.48
Percent of CPU this job got: 115%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:16.88
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3380940
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 773459
Voluntary context switches: 2180
Involuntary context switches: 654
Swaps: 0
File system inputs: 648120
File system outputs: 18624
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
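
To put the two runs side by side, the figures above can be reduced to a few ratios. The following is a minimal sketch, not part of the hdt crate; the helper names `ratio` and `percent_change` are made up for illustration, and the numbers are copied from the two reports above. It comes out at roughly 5x less user time, roughly 2.5x less wall clock time, roughly 5x fewer minor page faults, and a change in maximum resident set size of well under 1%.

```rust
/// How many times larger `before` is than `after` (hypothetical helper).
fn ratio(before: f64, after: f64) -> f64 {
    before / after
}

/// Relative change from `before` to `after` in percent (hypothetical helper).
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Values copied from the "Before" and "After" reports.
    println!("user time:         {:.2}x lower", ratio(89.53, 18.03));
    println!("wall clock time:   {:.2}x lower", ratio(42.95, 16.88));
    println!("minor page faults: {:.2}x fewer", ratio(3_897_847.0, 773_459.0));
    println!(
        "max resident set:  {:+.2}% change",
        percent_change(3_370_252.0, 3_380_940.0)
    );
}
```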

The maximum resident set size barely changed (3370252 kbytes before, 3380940 kbytes after). However, it is also much lower than expected. Is /usr/bin/time not accurate? Try with heaptrack instead.
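
As a further cross-check next to heaptrack, the test process could report its own peak resident set size. The following is a minimal sketch that assumes Linux, where `/proc/self/status` contains a `VmHWM` line (peak resident set size in kB). The function names `parse_vm_hwm_kb` and `peak_rss_kb` are hypothetical and not part of the hdt crate.

```rust
use std::fs;

/// Extracts the `VmHWM` value (peak resident set size in kB) from the
/// text of a Linux `/proc/<pid>/status` file. Hypothetical helper.
fn parse_vm_hwm_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmHWM:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|num| num.parse().ok())
}

/// Peak resident set size of the current process in kB,
/// or `None` if `/proc/self/status` is unavailable (assumption: Linux only).
fn peak_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_hwm_kb(&status)
}

fn main() {
    match peak_rss_kb() {
        Some(kb) => println!("peak RSS: {kb} kB"),
        None => println!("peak RSS unavailable (not Linux?)"),
    }
}
```

Calling `peak_rss_kb()` at the end of the modified test, right after the file is loaded, would give a figure for the test binary alone, which can then be compared with the value that /usr/bin/time reports for the whole `cargo test` invocation.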