datajoint / datajoint-python

Relational data pipelines for the science lab
https://datajoint.com/docs
GNU Lesser General Public License v2.1
168 stars 84 forks source link

`unite_master_parts` breaks topological order of descendants and thus breaks cascading of table drops #1057

Closed ZhuokunDing closed 1 week ago

ZhuokunDing commented 1 year ago

Bug Report

Description

unite_master_parts does not always preserve topological order when reordering the list of tables and causes issues when dropping tables.

Reproducibility

schema = dj.Schema('zhuokun_test')

@schema class A(dj.Manual): definition = """ a: int """

@schema class M1(dj.Manual): definition = """ -> A """

@schema class M(dj.Manual): definition = """ -> A """

class Part(dj.Part):
    definition = """
    -> master
    -> M1
    """

A.drop()

 - Logs:

[2022-09-30 21:17:15,746][INFO]: Connecting zhuokun@tutorial-db.datajoint.io:3306 [2022-09-30 21:17:15,864][INFO]: Connected zhuokun@tutorial-db.datajoint.io:3306 zhuokun_test.a (0 tuples) zhuokun_test.m (0 tuples) zhuokun_test.m__part (0 tuples) zhuokun_test.m1 (0 tuples) Proceed? [yes, No]: yes

OperationalError Traceback (most recent call last) Input In [1], in <cell line: 28>() 23 class Part(dj.Part): 24 definition = """ 25 -> master 26 -> M1 27 """ ---> 28 A.drop()

File /opt/conda/lib/python3.9/site-packages/datajoint/table.py:646, in Table.drop(self) 644 if do_drop: 645 for table in reversed(tables): --> 646 FreeTable(self.connection, table).drop_quick() 647 print("Tables dropped. Restart kernel.")

File /opt/conda/lib/python3.9/site-packages/datajoint/table.py:605, in Table.drop_quick(self) 603 if self.is_declared: 604 query = "DROP TABLE %s" % self.full_table_name --> 605 self.connection.query(query) 606 logger.info("Dropped table %s" % self.full_table_name) 607 self._log(query[:255])

File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:340, in Connection.query(self, query, args, as_dict, suppress_warnings, reconnect) 338 cursor = self._conn.cursor(cursor=cursor_class) 339 try: --> 340 self._execute_query(cursor, query, args, suppress_warnings) 341 except errors.LostConnectionError: 342 if not reconnect:

File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:296, in Connection._execute_query(cursor, query, args, suppress_warnings) 294 cursor.execute(query, args) 295 except client.err.Error as err: --> 296 raise translate_query_error(err, query)

File /opt/conda/lib/python3.9/site-packages/datajoint/connection.py:294, in Connection._execute_query(cursor, query, args, suppress_warnings) 291 if suppress_warnings: 292 # suppress all warnings arising from underlying SQL library 293 warnings.simplefilter("ignore") --> 294 cursor.execute(query, args) 295 except client.err.Error as err: 296 raise translate_query_error(err, query)

File /opt/conda/lib/python3.9/site-packages/pymysql/cursors.py:148, in Cursor.execute(self, query, args) 144 pass 146 query = self.mogrify(query, args) --> 148 result = self._query(query) 149 self._executed = query 150 return result

File /opt/conda/lib/python3.9/site-packages/pymysql/cursors.py:310, in Cursor._query(self, q) 308 self._last_executed = q 309 self._clear_result() --> 310 conn.query(q) 311 self._do_get_result() 312 return self.rowcount

File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:548, in Connection.query(self, sql, unbuffered) 546 sql = sql.encode(self.encoding, "surrogateescape") 547 self._execute_command(COMMAND.COM_QUERY, sql) --> 548 self._affected_rows = self._read_query_result(unbuffered=unbuffered) 549 return self._affected_rows

File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:775, in Connection._read_query_result(self, unbuffered) 773 else: 774 result = MySQLResult(self) --> 775 result.read() 776 self._result = result 777 if result.server_status is not None:

File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:1156, in MySQLResult.read(self) 1154 def read(self): 1155 try: -> 1156 first_packet = self.connection._read_packet() 1158 if first_packet.is_ok_packet(): 1159 self._read_ok_packet(first_packet)

File /opt/conda/lib/python3.9/site-packages/pymysql/connections.py:725, in Connection._read_packet(self, packet_type) 723 if self._result is not None and self._result.unbuffered_active is True: 724 self._result.unbuffered_active = False --> 725 packet.raise_for_error() 726 return packet

File /opt/conda/lib/python3.9/site-packages/pymysql/protocol.py:221, in MysqlPacket.raise_for_error(self) 219 if DEBUG: 220 print("errno =", errno) --> 221 err.raise_mysql_exception(self._data)

File /opt/conda/lib/python3.9/site-packages/pymysql/err.py:143, in raise_mysql_exception(data) 141 if errorclass is None: 142 errorclass = InternalError if errno < 1000 else OperationalError --> 143 raise errorclass(errno, errval)

OperationalError: (3730, "Cannot drop table 'm1' referenced by a foreign key constraint 'mpart_ibfk_2' on table 'mpart'.")

### Expected Behavior
Notice that datajoint was trying to drop `M1` before dropping `M.Part`. The expected behavior should be dropping `M.Part`, `M`, and then `M1`.

### Additional Research and Context
I dig into the code a little bit and realized datajoint would first collect all descendants of the table to drop, sort them by topological order, reorder the sorted list to unite master and part tables (`unite_master_parts`, which should not disrupt the topological sort), and then drop the tables in topological order.
However, `unite_master_parts` does not actually preserve topological order.
In the above example, the topologically sorted list of tables happens to be:

['zhuokun_test.a', 'zhuokun_test.m', 'zhuokun_test.m1', 'zhuokun_test.m_part']

and `unite_master_parts` would reorder the list to be:

['zhuokun_test.a', 'zhuokun_test.m', 'zhuokun_test.m_part' 'zhuokun_test.m1',]


which is not topologically sorted and would cause issues when dropping tables.
This issue is probably also related to the behavior mentioned in https://github.com/datajoint/datajoint-python/issues/927#issuecomment-854385872
dimitri-yatsenko commented 1 year ago

Great catch!

dimitri-yatsenko commented 1 year ago

Thank you for reproducing the problem so well. Yes, the current unite_master_part is based on a faulty assumption that failed in this case. The algorithm meant to bring parts closer to their masters for transaction processing without breaking topological sorting. Fixing...