attic-labs / noms

The versioned, forkable, syncable database
Apache License 2.0

Change List.IterAll to read-ahead 8MB chunks over 6 threads #3729

Closed by kalman 6 years ago

kalman commented 6 years ago

This copies the approach that Blob.Copy uses, with some extra logic to adjust the number of values to read as more values are read.
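For readers unfamiliar with the pattern, here is a minimal sketch of ordered read-ahead with bounded concurrency. It is not the code in this PR or in Blob.Copy: `readRange` and `iterAllReadAhead` are hypothetical names, the "list" is just a range of ints, and the batch size is fixed here rather than adjusted.

```go
package main

import "fmt"

// readRange is a hypothetical stand-in for fetching the values in
// [start, end) of a list from storage.
func readRange(start, end int) []int {
	out := make([]int, 0, end-start)
	for i := start; i < end; i++ {
		out = append(out, i)
	}
	return out
}

// iterAllReadAhead calls cb on every value in [0, length) in order while
// later batches are fetched in the background.
func iterAllReadAhead(length, batchSize, concurrency int, cb func(v int)) {
	// Each batch gets its own result channel. The capacity of `pending`
	// limits how many batches can be queued ahead of the consumer
	// (roughly `concurrency`, plus the one currently being consumed).
	pending := make(chan chan []int, concurrency)

	go func() {
		defer close(pending)
		for start := 0; start < length; start += batchSize {
			end := start + batchSize
			if end > length {
				end = length
			}
			result := make(chan []int, 1)
			pending <- result // blocks once the read-ahead window is full
			go func(s, e int) { result <- readRange(s, e) }(start, end)
		}
	}()

	// Batches are consumed in the order they were queued, so values reach
	// cb in list order even if later fetches finish first.
	for result := range pending {
		for _, v := range <-result {
			cb(v)
		}
	}
}

func main() {
	sum := 0
	iterAllReadAhead(100, 8, 6, func(v int) { sum += v })
	fmt.Println("sum of 0..99:", sum)
}
```

Queuing a per-batch result channel before starting the fetch is what keeps the output ordered; the worker count of 6 in `main` mirrors the PR title and is otherwise arbitrary.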

kalman commented 6 years ago

This isn't ready to submit yet, because it hardly makes a difference in csv-export. @rafael-atticlabs, could you check that it's sensible?

Here is what csv-export used to do (400MB of csv data):

> go install ./samples/go/csv/... && (time bin/csv-export db::csv) >/dev/null

real    0m14.539s
user    0m22.550s
sys     0m4.139s

and with my changes:

> go install ./samples/go/csv/... && (time bin/csv-export db::csv) >/dev/null
read 1000 chunks (345 kB)
adjusted chunk size from 1000 to 24301 (scaled by 24.301)
read 1000 chunks (355 kB)
adjusted chunk size from 1000 to 23600 (scaled by 23.600)
read 1000 chunks (371 kB)
adjusted chunk size from 1000 to 22620 (scaled by 22.621)
read 1000 chunks (351 kB)
adjusted chunk size from 1000 to 23899 (scaled by 23.899)
read 1000 chunks (364 kB)
adjusted chunk size from 1000 to 23054 (scaled by 23.054)
read 1000 chunks (318 kB)
adjusted chunk size from 1000 to 26354 (scaled by 26.354)
read 1000 chunks (338 kB)
adjusted chunk size from 1000 to 24821 (scaled by 24.821)
read 1000 chunks (346 kB)
adjusted chunk size from 1000 to 24280 (scaled by 24.280)
read 26354 chunks (8.4 MB)
read 24280 chunks (7.7 MB)
adjusted chunk size from 24280 to 26305 (scaled by 1.083)
read 24280 chunks (7.7 MB)
adjusted chunk size from 24280 to 26392 (scaled by 1.087)
read 24280 chunks (7.7 MB)
adjusted chunk size from 24280 to 26331 (scaled by 1.084)
read 24280 chunks (7.7 MB)
adjusted chunk size from 24280 to 26286 (scaled by 1.083)
read 24280 chunks (7.8 MB)
adjusted chunk size from 24280 to 26246 (scaled by 1.081)
read 24280 chunks (7.9 MB)
adjusted chunk size from 24280 to 25825 (scaled by 1.064)
read 24280 chunks (7.8 MB)
adjusted chunk size from 24280 to 26189 (scaled by 1.079)
read 25825 chunks (8.3 MB)
adjusted chunk size from 25825 to 26240 (scaled by 1.016)
read 26189 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.3 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.3 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.3 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.3 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.4 MB)
read 26240 chunks (8.3 MB)
read 23945 chunks (7.6 MB)

real    0m14.637s
user    0m22.263s
sys     0m4.393s
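The scale factors in the log look consistent with scaling each batch toward a target of about 8 MiB (the figure in the PR title). The sketch below shows that arithmetic under that assumption. `targetBytes` and `adjustBatchSize` are hypothetical names, not the PR's code. The log also skips the adjustment when a read is already close to the target; that threshold is not shown, so it is left out here.

```go
package main

import "fmt"

// targetBytes is assumed from the PR title (8MB read-ahead).
const targetBytes = 8 << 20

// adjustBatchSize returns a new batch size, scaled so that a batch with the
// same average value size would occupy roughly targetBytes.
func adjustBatchSize(batchSize int, bytesRead uint64) int {
	if bytesRead == 0 {
		return batchSize // nothing measured yet, keep the current size
	}
	scale := float64(targetBytes) / float64(bytesRead)
	next := int(float64(batchSize) * scale)
	if next < 1 {
		next = 1
	}
	return next
}

func main() {
	// First log line: 1000 values read, reported as 345 kB.
	fmt.Println(adjustBatchSize(1000, 345000))
}
```

Because the log rounds byte counts to three significant figures, the result for the first entry lands near, but not exactly on, the logged 24301.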

kalman commented 6 years ago

final observations before I dig into this again:

kalman commented 6 years ago

latest patch is simpler/better but marshal test still fails, and csv export is unaffected (in fact it might be slower).

kalman commented 6 years ago

There we go.

kalman commented 6 years ago

OK, @rafael-atticlabs (and @arv) PTAL.