dolthub / dolt

Dolt – Git for Data
Apache License 2.0

[rowexec] custom rowexec #8072

Open max-hoffman opened 1 week ago

max-hoffman commented 1 week ago

This PR adds custom Dolt execution operators for lookup joins. When building an execution plan, we try to replace joinIter with a Dolt equivalent that inlines the key building and map get. This is a lot faster because repeatedly building the secondary iterator and materializing sql.Rows between lookups is expensive. In the cases where we can use map.Get instead of map.PrefixGet for strict key lookups, we will also perform fewer chunkstore reads.
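To make the shape of the change concrete, here is a minimal toy sketch of the two strategies. It is not Dolt's code: `Row`, `Index`, `rowIter`, `genericLookupJoin`, and `inlinedLookupJoin` are hypothetical names, the index is a plain Go map standing in for a strict (unique) key lookup, and prefix lookups are not modeled. The optional filter in the inlined version runs only after the joined row has been built.

```go
package main

import "fmt"

// Row is a toy stand-in for a materialized row (hypothetical, not sql.Row).
type Row []int

// Index is a toy unique index: join key -> the single matching right-side row.
type Index map[int]Row

// rowIter mimics a generic secondary iterator that is rebuilt for every lookup.
type rowIter struct {
	rows []Row
	pos  int
}

// Next returns the next row, or false when the iterator is exhausted.
func (it *rowIter) Next() (Row, bool) {
	if it.pos >= len(it.rows) {
		return nil, false
	}
	r := it.rows[it.pos]
	it.pos++
	return r, true
}

// concat builds the joined row from a left row and a right row.
func concat(l, r Row) Row {
	out := make(Row, 0, len(l)+len(r))
	out = append(out, l...)
	return append(out, r...)
}

// genericLookupJoin allocates a fresh iterator for every left row.
func genericLookupJoin(left []Row, idx Index, keyCol int) []Row {
	var out []Row
	for _, l := range left {
		var rows []Row
		if r, ok := idx[l[keyCol]]; ok {
			rows = []Row{r}
		}
		it := &rowIter{rows: rows} // per-lookup iterator construction
		for r, ok := it.Next(); ok; r, ok = it.Next() {
			out = append(out, concat(l, r))
		}
	}
	return out
}

// inlinedLookupJoin does the key extraction and map get directly in the loop,
// then applies the optional filter to the already-joined row.
func inlinedLookupJoin(left []Row, idx Index, keyCol int, filter func(Row) bool) []Row {
	var out []Row
	for _, l := range left {
		r, ok := idx[l[keyCol]] // single point lookup, no intermediate iterator
		if !ok {
			continue
		}
		joined := concat(l, r)
		if filter == nil || filter(joined) {
			out = append(out, joined)
		}
	}
	return out
}

func main() {
	left := []Row{{1, 10}, {2, 20}, {3, 30}}
	idx := Index{1: {1, 100}, 3: {3, 300}}

	fmt.Println(genericLookupJoin(left, idx, 0))
	fmt.Println(inlinedLookupJoin(left, idx, 0, func(r Row) bool { return r[3] > 100 }))
}
```

Both functions return the same rows when no filter is given; the inlined version just skips the per-row iterator allocation, which is the kind of overhead the description above attributes the speedup to.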

This PR moves filters in join children to after materializing lookup join rows.

Barring any correctness issues I might be overlooking, this prototype brings index_join from 5.18 ms/query to 2.64 ms/query, which will be 2.0x MySQL's latency.

Gaps before merging this:

max-hoffman commented 1 week ago

benchmark

github-actions[bot] commented 1 week ago

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9682864996

coffeegoddd commented 1 week ago
@max-hoffman DOLT

| test_name | from_latency_p95 | to_latency_p95 | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 73.13 | 89.16 | 0 |

| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
|---|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | f7abadf73e214ebf8c1ffdcde0d48ea1c505d383 | 32.99 | tpcc-scale-factor-1 | dolt | c531bcef84b80da341a5720ae526abd667bd7dd5 | 13.52 | 1 |

coffeegoddd commented 1 week ago

@max-hoffman DOLT

| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 2.76 | 2.81 | 0 |
| groupby_scan | 17.32 | 17.32 | 0 |
| index_join | 5.28 | 2.61 | 1 |
| index_join_scan | 2.57 | 2.57 | 0 |
| index_scan | 53.85 | 54.83 | 0 |
| oltp_point_select | 0.46 | 0.46 | 0 |
| oltp_read_only | 7.56 | 7.7 | 0 |
| select_random_points | 0.77 | 0.77 | 0 |
| select_random_ranges | 0.92 | 0.92 | 0 |
| table_scan | 54.83 | 55.82 | 0 |
| types_table_scan | 142.39 | 142.39 | 0 |

| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 6.09 | 6.09 | 0 |
| oltp_insert | 3.02 | 3.02 | 0 |
| oltp_read_write | 14.21 | 14.21 | 0 |
| oltp_update_index | 3.13 | 3.07 | 0 |
| oltp_update_non_index | 3.02 | 3.02 | 0 |
| oltp_write_only | 6.43 | 6.43 | 0 |
| types_delete_insert | 6.67 | 6.67 | 0 |

github-actions[bot] commented 1 week ago

This PR is being tested for SQL correctness. Please allow ~25 mins for this to complete. If this PR does not result in a SQL correctness regression, the correctness_approved label will be automatically added to this PR and the Check for correctness_approved workflow will succeed.

github-actions[bot] commented 1 week ago

Additional work is required for integration with DoltgreSQL.

max-hoffman commented 1 week ago

benchmark

github-actions[bot] commented 1 week ago

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9687941574

coffeegoddd commented 1 week ago
@max-hoffman DOLT

| test_name | from_latency_p95 | to_latency_p95 | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 74.46 | 78.6 | 0 |

| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
|---|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | 0c4b3f956c2b1f19392c6a688496add36ec521ff | 32.94 | tpcc-scale-factor-1 | dolt | ef8cc8b82f06f9287c74bd5739935bc44df1befc | 32.89 | 0 |

coffeegoddd commented 1 week ago

@max-hoffman DOLT

| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 2.86 | 2.86 | 0 |
| groupby_scan | 17.01 | 17.32 | 0 |
| index_join | 5.28 | 2.66 | 1 |
| index_join_scan | 2.52 | 2.57 | 0 |
| index_scan | 53.85 | 53.85 | 0 |
| oltp_point_select | 0.44 | 0.46 | 0 |
| oltp_read_only | 7.43 | 7.56 | 0 |
| select_random_points | 0.73 | 0.74 | 0 |
| select_random_ranges | 0.87 | 0.89 | 0 |
| table_scan | 54.83 | 54.83 | 0 |
| types_table_scan | 139.85 | 142.39 | 0 |

| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 6.09 | 6.09 | 0 |
| oltp_insert | 2.97 | 3.02 | 0 |
| oltp_read_write | 13.7 | 13.95 | 0 |
| oltp_update_index | 3.07 | 3.07 | 0 |
| oltp_update_non_index | 3.02 | 3.02 | 0 |
| oltp_write_only | 6.32 | 6.43 | 0 |
| types_delete_insert | 6.55 | 6.67 | 0 |