dolthub / dolt

Dolt – Git for Data
Apache License 2.0

Optimize JSON_SET and JSON_REPLACE on `IndexedJsonDocument` #8107

Closed nicktobey closed 1 month ago

nicktobey commented 1 month ago

This PR adds a new implementation of the JSON_SET and JSON_REPLACE functions that leverages the new indexed JSON storage format.
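For readers less familiar with the two functions, the sketch below illustrates how they differ, following MySQL's documented semantics: JSON_SET replaces existing values and adds missing ones, while JSON_REPLACE only replaces existing values. It is a toy model on a single top-level key of a Go map, not the code in this PR; `setKey` and `replaceKey` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// setKey mimics JSON_SET for one top-level key (hypothetical helper):
// it overwrites an existing value and inserts the key when it is absent.
func setKey(doc map[string]any, key string, val any) {
	doc[key] = val
}

// replaceKey mimics JSON_REPLACE for one top-level key (hypothetical helper):
// it overwrites an existing value and leaves the document untouched when the
// key is absent. It reports whether the document changed.
func replaceKey(doc map[string]any, key string, val any) bool {
	if _, ok := doc[key]; !ok {
		return false
	}
	doc[key] = val
	return true
}

func main() {
	doc := map[string]any{"a": 1}

	setKey(doc, "b", 2)                // "b" is missing, so it is added
	changed := replaceKey(doc, "c", 3) // "c" is missing, so nothing happens

	fmt.Println(len(doc), changed)
}
```

Real JSON paths can address nested objects and array elements; the single-key case is enough to show the difference in behavior when the target does not exist.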

For JSON documents that span multiple chunks, only the affected chunks need to be loaded and modified, allowing operations to scale with the size of the modified value instead of the size of the entire document.
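The chunk-locality idea can be sketched roughly as follows. This is a simplified toy model under stated assumptions, not Dolt's `IndexedJsonDocument`: the `chunk` and `chunkedDoc` types, the `loads` counter, and the flat string keys are all hypothetical. Each chunk records the smallest key it holds, so a binary search picks the one chunk that could contain the target and the rest are never touched.

```go
package main

import (
	"fmt"
	"sort"
)

// chunk is a hypothetical stand-in for one stored piece of a large document.
type chunk struct {
	firstKey string            // smallest key stored in this chunk
	values   map[string]string // key/value pairs held by this chunk
}

// chunkedDoc is a toy model of a document split across chunks.
type chunkedDoc struct {
	chunks []chunk // sorted by firstKey
	loads  int     // how many chunks were touched, for illustration only
}

// replace overwrites the value for key if it exists, touching only the one
// chunk whose key range could contain it.
func (d *chunkedDoc) replace(key, val string) bool {
	// Find the first chunk that starts after key, then step back one chunk.
	i := sort.Search(len(d.chunks), func(i int) bool {
		return d.chunks[i].firstKey > key
	}) - 1
	if i < 0 {
		return false // key sorts before every chunk
	}
	d.loads++ // only this chunk is "loaded"
	if _, ok := d.chunks[i].values[key]; !ok {
		return false
	}
	d.chunks[i].values[key] = val
	return true
}

func main() {
	doc := &chunkedDoc{chunks: []chunk{
		{firstKey: "a", values: map[string]string{"a": "1", "c": "2"}},
		{firstKey: "m", values: map[string]string{"m": "3", "n": "4"}},
		{firstKey: "t", values: map[string]string{"t": "5"}},
	}}

	ok := doc.replace("n", "new")
	fmt.Println(ok, doc.loads, doc.chunks[1].values["n"])
}
```

In this model the cost of an update depends on the chunk being rewritten rather than on the number of chunks in the document, which is the scaling property the description refers to.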

coffeegoddd commented 1 month ago

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000

| version | result | total |
| --- | --- | --- |
| d7cb874 | ok | 5937457 |

| version | total_tests |
| --- | --- |
| d7cb874 | 5937457 |

| correctness_percentage |
| --- |
| 100.0 |

github-actions[bot] commented 1 month ago
@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
| --- | --- | --- | --- | --- | --- | --- |
| batching | LOAD DATA | 10000 | 1 | 0.05 | 1.8 | |
| batching | batch sql | 10000 | 1 | 0.07 | 1.86 | |
| batching | by line sql | 10000 | 1 | 0.07 | 2 | |
| blob | 1 blob | 200000 | 1 | 0.89 | 3.96 | 3.88 |
| blob | 2 blobs | 200000 | 1 | 0.87 | 4.57 | 4.55 |
| blob | no blob | 200000 | 1 | 0.9 | 2.48 | 2.14 |
| col type | datetime | 200000 | 1 | 0.82 | 3.01 | 2.87 |
| col type | varchar | 200000 | 1 | 0.68 | 3.57 | 3.06 |
| config width | 2 cols | 200000 | 1 | 0.78 | 2.6 | 2.26 |
| config width | 32 cols | 200000 | 1 | 1.84 | 2.02 | 2.49 |
| config width | 8 cols | 200000 | 1 | 0.96 | 2.46 | 2.27 |
| pk type | float | 200000 | 1 | 0.84 | 2.7 | 2.1 |
| pk type | int | 200000 | 1 | 0.82 | 2.48 | 2.26 |
| pk type | varchar | 200000 | 1 | 1.64 | 1.77 | 1.38 |
| row count | 1.6mm | 1600000 | 1 | 5.69 | 2.96 | 2.55 |
| row count | 400k | 400000 | 1 | 1.41 | 2.94 | 2.5 |
| row count | 800k | 800000 | 1 | 2.8 | 2.98 | 2.56 |
| secondary index | four index | 200000 | 1 | 3.46 | 1.47 | 1.13 |
| secondary index | no secondary | 200000 | 1 | 0.88 | 2.52 | 2.18 |
| secondary index | one index | 200000 | 1 | 1.13 | 2.43 | 2.12 |
| secondary index | two index | 200000 | 1 | 1.96 | 1.82 | 1.49 |
| sorting | shuffled 1mm | 1000000 | 0 | 5.27 | 2.79 | 2.5 |
| sorting | sorted 1mm | 1000000 | 1 | 5.2 | 2.84 | 2.52 |

github-actions[bot] commented 1 month ago
@coffeegoddd DOLT

| name | detail | mean_mult |
| --- | --- | --- |
| dolt_blame_basic | system table | 1.24 |
| dolt_blame_commit_filter | system table | 3.42 |
| dolt_commit_ancestors_commit_filter | system table | 0.87 |
| dolt_commits_commit_filter | system table | 0.97 |
| dolt_diff_log_join_from_commit | system table | 2.05 |
| dolt_diff_log_join_to_commit | system table | 2.01 |
| dolt_diff_table_from_commit_filter | system table | 1.04 |
| dolt_diff_table_to_commit_filter | system table | 1.14 |
| dolt_diffs_commit_filter | system table | 0.95 |
| dolt_history_commit_filter | system table | 1.19 |
| dolt_log_commit_filter | system table | 0.92 |

github-actions[bot] commented 1 month ago
@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
| --- | --- | --- | --- | --- |
| adds_only | 60000 | 0 | 0 | 1.38 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 4.47 |
| deletes_only | 0 | 60000 | 0 | 2.45 |
| updates_only | 0 | 0 | 60000 | 3.06 |