dolthub / dolt

Dolt – Git for Data
https://www.dolthub.com
Apache License 2.0
18k stars 516 forks source link

Creating two tables with the same schema and database, dropping one of them, and renaming the second table to the dropped table creates an entry in status but not diff #8607

Open tbantle22 opened 4 days ago

tbantle22 commented 4 days ago

My guess is that because the underlying table hashes of these tables are different that it's creating an entry in status, but because there no schema or data changes nothing is showing up in any of the diff functions (although the user did see the table show up as modified in the diff on the dolt workbench without showing and data/schema changes but I could not reproduce that).

Here's a repro:

mysql> create table t1 (id int primary key);
Query OK, 0 rows affected (0.06 sec)

mysql> insert into t1 values (1);
Query OK, 1 row affected (0.05 sec)

mysql> call dolt_commit('-Am', 'create table t1');
+----------------------------------+
| hash                             |
+----------------------------------+
| cd355lu9gkd7p8ihdcb1e72dafel6o6u |
+----------------------------------+
1 row in set (0.05 sec)

mysql> select dolt_hashof_table('t1');
+----------------------------------+
| dolt_hashof_table('t1')          |
+----------------------------------+
| u5jjak18av7dm38b296vecl4mvju9oru |
+----------------------------------+
1 row in set (0.04 sec)

mysql> create table temp__t1 (id int primary key);
Query OK, 0 rows affected (0.05 sec)

mysql> insert into temp__t1 values (1);
Query OK, 1 row affected (0.05 sec)

mysql> select dolt_hashof_table('temp__t1');
+----------------------------------+
| dolt_hashof_table('temp__t1')    |
+----------------------------------+
| j48bj0o517b22fa720rb6sgp238fltof |
+----------------------------------+
1 row in set (0.04 sec)

mysql> drop table t1;
Query OK, 0 rows affected (0.04 sec)

mysql> rename table temp__t1 to t1;
Query OK, 0 rows affected (0.05 sec)

mysql> select dolt_hashof_table('t1');
+----------------------------------+
| dolt_hashof_table('t1')          |
+----------------------------------+
| j48bj0o517b22fa720rb6sgp238fltof |
+----------------------------------+
1 row in set (0.04 sec)

mysql> select * from dolt_status;
+------------+--------+----------+
| table_name | staged | status   |
+------------+--------+----------+
| t1         |      0 | modified |
+------------+--------+----------+
1 row in set (0.04 sec)

mysql> select * from dolt_diff('HEAD..WORKING', 't1');
Empty set (0.04 sec)

mysql> select * from dolt_schema_diff('HEAD..WORKING', 't1');
Empty set (0.04 sec)

mysql> select * from dolt_diff_summary('HEAD..WORKING');
Empty set, 1 warning (0.04 sec)

For context, this user is using dbt to do data ETL work, they may delete the original table while ensuring that the new temporary data is created successfully, and then rename the new table to the original table, creating the same table over and over again (which may or may not change at all), and this kind of "modified" can cause confusion

nicktobey commented 4 days ago

I was curious why two tables with the same schema and same data would have different hashes, so I reproduced it and inspected the chunks:

It's because of tags (again). Every column in the db is currently assigned a unique tag, and this tag is encoded in the schema (and thus affects the schema hash)