Closed: jbowens closed this issue 1 year ago.
I downloaded the artifacts from an arbitrary date before the 16th and an arbitrary date after the 16th. Diffing the heap profiles:
go tool pprof \
  -diff_base ../artifacts\ \(8\)/5.profiles_F/heap.prof \
  -diff_base ../artifacts\ \(8\)/4.profiles_F/heap.prof \
  -diff_base ../artifacts\ \(8\)/3.profiles_F/heap.prof \
  -diff_base ../artifacts\ \(8\)/2.profiles_F/heap.prof \
  -diff_base ../artifacts\ \(8\)/1.profiles_F/heap.prof \
  --normalize \
  ../artifacts\ \(7\)/4.profiles_F/heap.prof \
  ../artifacts\ \(7\)/3.profiles_F/heap.prof \
  ../artifacts\ \(7\)/2.profiles_F/heap.prof \
  ../artifacts\ \(7\)/1.profiles_F/heap.prof
File: pebble
Build ID: 671855946c18cea2728f4e13a9784294e95304ed
Type: inuse_space
Time: May 11, 2023 at 10:22am (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) alloc_objects
(pprof) top20
Showing nodes accounting for 18045094, 25.47% of 70839746 total
Showing top 20 nodes out of 577
flat flat% sum% cum cum%
44212530 62.41% 62.41% 44212530 62.41% github.com/cockroachdb/pebble/internal/manifest.(*L0Sublevels).addFileToSublevels
-3660821 5.17% 57.24% -3660821 5.17% github.com/cockroachdb/pebble/sstable.(*Properties).saveUvarint (inline)
-2458646 3.47% 53.77% -2458646 3.47% github.com/cockroachdb/pebble/sstable.(*rawBlockIter).readEntry
-1873472 2.64% 51.13% -1873472 2.64% github.com/HdrHistogram/hdrhistogram-go.New
-1593315 2.25% 48.88% -1593315 2.25% os.newFile
-1548084 2.19% 46.69% -4180215 5.90% github.com/cockroachdb/pebble/sstable.(*Properties).load
-1501729 2.12% 44.57% -1501729 2.12% strings.(*Builder).grow (inline)
-1478203 2.09% 42.49% -1478203 2.09% github.com/cockroachdb/pebble/sstable.(*Properties).saveString (inline)
-1324987 1.87% 40.62% -1324987 1.87% runtime/pprof.allFrames
-1242971 1.75% 38.86% -1242971 1.75% syscall.ByteSliceFromString
-1170309 1.65% 37.21% -1193454 1.68% fmt.Sprintf
-1152856 1.63% 35.58% -4961799 7.00% github.com/cockroachdb/pebble/internal/base.MakeFilepath
-1112943 1.57% 34.01% -2307214 3.26% github.com/cockroachdb/pebble/internal/base.MakeFilename
-1007324 1.42% 32.59% -7265232 10.26% github.com/cockroachdb/pebble/sstable.(*Properties).save
-982504 1.39% 31.20% -1282321 1.81% runtime/pprof.WithLabels
-918369 1.30% 29.91% -918369 1.30% github.com/cockroachdb/pebble/vfs.wrapOSFileImpl
-869147 1.23% 28.68% -1441908 2.04% github.com/cockroachdb/pebble/sstable.(*rawBlockWriter).add
-828034 1.17% 27.51% -9661022 13.64% github.com/cockroachdb/pebble.(*tableCacheShard).findNode
-812471 1.15% 26.36% -30127363 42.53% github.com/cockroachdb/pebble.(*DB).runCompaction
-631251 0.89% 25.47% -631251 0.89% github.com/cockroachdb/pebble/internal/manifest.overlaps
Not sure what to make of that.
At the end of an old run:
__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
WAL 1 51 M - 148 G - - - - 149 G - - - 1.0
0 1684 1.8 G 1.97 149 G 0 B 0 0 B 0 146 G 112 K 0 B 33 1.0
1 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0
2 277 1.0 G 2.17 85 G 0 B 0 0 B 0 158 G 40 K 158 G 1 1.8
3 695 3.3 G 2.17 143 G 0 B 0 3.6 G 1.1 K 293 G 68 K 293 G 1 2.1
4 1177 10 G 2.17 138 G 0 B 0 5.6 G 2.0 K 346 G 54 K 346 G 1 2.5
5 1974 33 G 2.17 130 G 0 B 0 3.3 G 1.1 K 349 G 29 K 349 G 1 2.7
6 2395 106 G - 102 G 0 B 0 0 B 0 274 G 9.4 K 275 G 1 2.7
total 8202 156 G - 149 G 0 B 0 12 G 4.2 K 1.7 T 313 K 1.4 T 38 11.5
flush 2415 0 B 0 0 (ingest = tables-ingested, move = ingested-as-flushable)
compact 71636 151 G 236 M 3 (size == estimated-debt, score = in-progress-bytes, in = num-in-progress)
ctype 67388 0 0 4248 0 0 0 (default, delete, elision, move, read, rewrite, multi-level)
memtbl 1 64 M
zmemtbl 0 0 B
ztbl 0 0 B
bcache 3.0 K 89 M 0.6% (score == hit-rate)
tcache 6 4.3 K 50.2% (score == hit-rate)
snaps 0 - 0 (score == earliest seq num)
titers 4
filter - - 0.0% (score == utility)
At the end of a recent run:
__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
WAL 3 149 M - 183 G - - - - 185 G - - - 1.0
0 42763 46 G 4.58 185 G 0 B 0 0 B 0 182 G 164 K 894 M 682 1.0
1 0 0 B 0.00 0 B 0 B 0 0 B 0 0 B 0 0 B 0 0.0
2 1801 9.2 G 4.58 78 G 0 B 0 36 K 1 522 G 109 K 522 G 1 6.7
3 2603 15 G 4.58 123 G 0 B 0 5.3 G 1.5 K 239 G 52 K 239 G 1 1.9
4 2366 23 G 4.58 106 G 0 B 0 8.1 G 2.2 K 239 G 38 K 239 G 1 2.3
5 2156 37 G 4.37 88 G 0 B 0 3.8 G 1.1 K 197 G 18 K 198 G 1 2.2
6 2042 60 G - 57 G 0 B 0 2.0 K 1 135 G 5.2 K 136 G 1 2.4
total 53731 190 G - 185 G 0 B 0 17 G 4.9 K 1.7 T 385 K 1.3 T 687 9.2
flush 2892 0 B 0 0 (ingest = tables-ingested, move = ingested-as-flushable)
compact 48354 494 G 496 M 3 (size == estimated-debt, score = in-progress-bytes, in = num-in-progress)
ctype 43480 0 0 4874 0 0 0 (default, delete, elision, move, read, rewrite, multi-level)
memtbl 3 192 M
zmemtbl 0 0 B
ztbl 0 0 B
bcache 17 K 497 M 0.7% (score == hit-rate)
tcache 284 209 K 50.2% (score == hit-rate)
snaps 0 - 0 (score == earliest seq num)
titers 19
filter - - 0.0% (score == utility)
ingest 0
Looks like we're "cheating" by allowing much more compaction debt to accumulate. That explains https://github.com/cockroachdb/pebble/issues/2578#issuecomment-1572439183. I still don't understand why, though.
This is the result of some sort of infrastructure change (slower disks, maybe?). I re-ran the benchmark for a SHA from the beginning of May (5271a3c047):
1.ycsb_F.log:Benchmarkycsb/F/values=1024 186478597 34533.1 ops/sec 1413026152369 read 1814606526123 write 0.00 r-amp 8.91 w-amp
2.ycsb_F.log:Benchmarkycsb/F/values=1024 184113062 34095.0 ops/sec 1427230330097 read 1826109239561 write 0.00 r-amp 8.97 w-amp
3.ycsb_F.log:Benchmarkycsb/F/values=1024 180986452 33516.0 ops/sec 1433865066344 read 1823183619471 write 0.00 r-amp 9.24 w-amp
4.ycsb_F.log:Benchmarkycsb/F/values=1024 187536888 34729.0 ops/sec 1416099895729 read 1821872470357 write 0.00 r-amp 8.80 w-amp
5.ycsb_F.log:Benchmarkycsb/F/values=1024 180320385 33392.7 ops/sec 1440856033309 read 1829040634218 write 0.00 r-amp 9.29 w-amp
I'll add a benchmark annotation and close this out.
Running the older SHA with older versions of roachprod/roachtest produced the old results, so it does appear that something's changed in CockroachDB tooling. I'll create an issue in the Cockroach repository.
We don't currently understand where this improvement came from. We should investigate it, ensure that we're not doing anything silly, and see if there were any regressions fixed that should then be backported.