dolthub / dolt

Dolt – Git for Data
Apache License 2.0
17.93k stars 509 forks source link

When removing from a secondary index during merging, use the pre-merge ordinal mapping, not the merged mapping, to identify the secondary key to be removed. #8154

Closed nicktobey closed 3 months ago

nicktobey commented 3 months ago

Basically, we have a bug where dropping a column on the remote side of a merge can interfere with updating secondary indexes.

Example:

Base Schema: (pk INT PRIMARY KEY, a TINYINT, b INT, UNIQUE KEY b_idx (b))

"theirs" drops column a

Merged Schema: (pk INT PRIMARY KEY, b INT, UNIQUE KEY b_idx (b))

In effect, the merger would see that in the final table, b is the second column. Then, when updating b_idx for each resolved row, it would use the second column of "ours" to find the index entry to remove. But this is incorrect, because the second column of "ours" is a.

If the user is lucky, these two columns will be different sizes and the merger will panic. But if the two columns are the same size, merge proceeds with an incorrect value. This will cause it to either fail to remove the old row from the secondary index, or remove a different row. Either way, the secondary index is now incorrect.

coffeegoddd commented 3 months ago

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
c189d9e ok 5937457
version total_tests
c189d9e 5937457
correctness_percentage
100.0
coffeegoddd commented 3 months ago

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000
version result total
bf0e938 ok 5937457
version total_tests
bf0e938 5937457
correctness_percentage
100.0
coffeegoddd commented 3 months ago

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000
version result total
b1d6ae0 ok 5937457
version total_tests
b1d6ae0 5937457
correctness_percentage
100.0