dolthub / dolt

Dolt – Git for Data
Apache License 2.0

Truncate MCVs #8041

Closed: max-hoffman closed this 3 weeks ago

max-hoffman commented 3 weeks ago

Sort and truncate the MCVs (most common values). Only keep values whose frequency is more than twice the uniform frequency. This avoids manually summing the frequencies of non-outlier values, which is expensive.
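The truncation rule can be illustrated with a small Go sketch (Go being Dolt's implementation language). The `MCV` type and `truncateMCVs` helper are hypothetical, and the sketch assumes the "uniform frequency" is the row count divided by the number of distinct values; Dolt's actual histogram code may define and store these differently.

```go
package main

import (
	"fmt"
	"sort"
)

// MCV is a hypothetical (value, row count) pair used only for illustration;
// it is not Dolt's real histogram type.
type MCV struct {
	Value string
	Count uint64
}

// truncateMCVs sorts MCVs by descending count and keeps only those whose
// count exceeds twice the uniform frequency. ASSUMPTION: the uniform
// frequency is rowCount / distinctCount.
func truncateMCVs(mcvs []MCV, rowCount, distinctCount uint64) []MCV {
	if distinctCount == 0 {
		return nil
	}
	uniform := float64(rowCount) / float64(distinctCount)
	threshold := 2 * uniform

	// Copy so the caller's slice is not reordered.
	sorted := make([]MCV, len(mcvs))
	copy(sorted, mcvs)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Count > sorted[j].Count
	})

	// With counts in descending order, "count <= threshold" is false for a
	// prefix and true for the rest, so a binary search finds the cut-off.
	cut := sort.Search(len(sorted), func(i int) bool {
		return float64(sorted[i].Count) <= threshold
	})
	return sorted[:cut]
}

func main() {
	// 100 rows over 10 distinct values: uniform frequency 10, threshold 20.
	mcvs := []MCV{{"a", 15}, {"b", 40}, {"c", 21}, {"d", 20}}
	for _, m := range truncateMCVs(mcvs, 100, 10) {
		fmt.Println(m.Value, m.Count)
	}
}
```

With 100 rows over 10 distinct values the threshold is 20 rows, so this sketch prints `b 40` and `c 21`; `d` (exactly 20) and `a` (15) are dropped. Sorting first means the cut-off is found with a single binary search instead of a check on every value.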

max-hoffman commented 3 weeks ago

benchmark

github-actions[bot] commented 3 weeks ago

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9602745041

coffeegoddd commented 3 weeks ago

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| 0bc232f | ok | 5937457 |

| version | total_tests |
|---|---|
| 0bc232f | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

coffeegoddd commented 3 weeks ago
@max-hoffman DOLT

| test_name | from_latency_p95 | to_latency_p95 | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 74.46 | 74.46 | 0 |

| test_name | from_server_name | from_server_version | from_tps | to_server_name | to_server_version | to_tps | is_faster |
|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | b4dc47360dc0d5235634c3e5e219110d5d1a1ddf | 33.54 | dolt | 0bc232f5392a6fed47f6f7d6cc3d143b428cdc79 | 32.85 | 0 |

coffeegoddd commented 3 weeks ago

@coffeegoddd DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| 39bcc83 | ok | 5937457 |

| version | total_tests |
|---|---|
| 39bcc83 | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

coffeegoddd commented 3 weeks ago
@max-hoffman DOLT

| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 2.81 | 2.81 | 0 |
| groupby_scan | 17.01 | 17.32 | 0 |
| index_join | 5.28 | 5.28 | 0 |
| index_join_scan | 3.25 | 2.52 | 1 |
| index_scan | 53.85 | 53.85 | 0 |
| oltp_point_select | 0.46 | 0.46 | 0 |
| oltp_read_only | 7.56 | 7.56 | 0 |
| select_random_points | 0.75 | 0.75 | 0 |
| select_random_ranges | 0.9 | 0.9 | 0 |
| table_scan | 54.83 | 54.83 | 0 |
| types_table_scan | 139.85 | 139.85 | 0 |

| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 6.09 | 6.09 | 0 |
| oltp_insert | 3.02 | 3.02 | 0 |
| oltp_read_write | 13.95 | 13.95 | 0 |
| oltp_update_index | 3.07 | 3.07 | 0 |
| oltp_update_non_index | 3.02 | 3.02 | 0 |
| oltp_write_only | 6.43 | 6.43 | 0 |
| types_delete_insert | 6.67 | 6.67 | 0 |

coffeegoddd commented 3 weeks ago

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| c9875b3 | ok | 5937457 |

| version | total_tests |
|---|---|
| c9875b3 | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

coffeegoddd commented 3 weeks ago

@max-hoffman DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| ba8e7f4 | ok | 5937457 |

| version | total_tests |
|---|---|
| ba8e7f4 | 5937457 |

| correctness_percentage |
|---|
| 100.0 |