cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.16k stars 3.82k forks source link

sql: reduce allocs in execStmtInOpenState [~5% allocs] #135908

Open tbg opened 9 hours ago

tbg commented 9 hours ago

Under sysbench oltp_read_only this shows up very prominently, as one of the top hitters in peek:

Type: alloc_objects
Time: 2024-11-21 13:25:41 CET
Duration: 30.17s, Total samples = 158593784 
Showing nodes accounting for 158593784, 100% of 158593784 total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context          
----------------------------------------------------------+-------------
                                           2031648   100% |   github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt.func1 pkg/sql/conn_executor_exec.go:141
   2031648  1.28%  9.75%    2031648  1.28%                | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState pkg/sql/conn_executor_exec.go:275
----------------------------------------------------------+-------------
image

I'm not actually sure which one is right (does this function allocate 2031648 times or 5072675 times which is 2.5x of the former?) but it's clear that it allocates all over the place. Summing up the "Total:" lines in the listing[^sum] gives 5807416 which is yet another number, but it's close to the latter. So, we think this is around 2.5*1.28% or around 5%! Much of that seems avoidable at first glance and someone should spend some quality time with this code.

[^sum]: $ grep 'Total' src.txt | awk '{ print $2 }' | grep -E '[0-9]+' | perl -lne '$x += $_; END { print $x; }' 5807416

Even its three return parameters leak to the heap (because they are named and get passed to all kinds of functions inside of execStmtInOpenState). You can see this here alongside all the other allocations.

I was so confused by this that I used scripts/goescape which also told me that the ctx arg gets moved to the heap:

./conn_executor_exec.go:276:2: moved to heap: ctx
./conn_executor_exec.go:282:4: moved to heap: retEv
./conn_executor_exec.go:282:21: moved to heap: retPayload
./conn_executor_exec.go:282:50: moved to heap: retErr

presumably because ctx is leaked into the portal here.

Epic: CRDB-42584

Jira issue: CRDB-44787