datajoint / datajoint-matlab

Relational data pipelines for the science lab
MIT License
42 stars 38 forks source link

incorrect cascading deletes in cases of non-primary foreign keys #4

Closed dimitri-yatsenko closed 11 years ago

dimitri-yatsenko commented 11 years ago

Currently, DataJoint implements its own cascading deletes instead of relying on MySQL's native cascading deletes.

This solution

dj.BaseRelvar/del generates the lists of all dependent tables and deletes from them, starting from the bottom. Each delete is restricted by the top relation. This rule works well when all foreign key fields are also primary key fields in the tables involved but can result in deleting non-dependent tuples if one of the tables along the cascade includes non-primary fields in its foreign key.

This issue may be fixed by simply not cascading down from any table that does not include all its foreign key fields in their primary key.

aecker commented 11 years ago

Can you give an example? I don't understand how something can have a foreign key constraint but be non-dependent.

Does your solution imply that under certain conditions the delete will not cascade properly and one will have to issue multiple deletes starting from different tables?

dimitri-yatsenko commented 11 years ago

Here is an example with three tables x.A, x.B, and x.C.

%{
x.A (manual)   # test dummy
a  : int  #  A's id
%}

%{
x.B (manual)  #  test dummy
b  : int  #  B's id
----
-> x.A
%}

%{
x.C (manual)  # test dummy
-> x.B
%}

Currently, del(x.A & 'a=1') will first recurse down the hierarchy to delete x.C & (x.A & 'a=1') and x.B & (x.A & 'a=1'). Since x.A and x.C do not share any fields in common, deleting x.C & (x.A & 'a=1') will delete everything in x.C.

The problem arises because x.B makes a non-primary reference to x.A. New users rarely use non-primary references, so the problem hasn't come up despite being there all along.

Proposed fix

I will fix this by making the delete explicitly refuse to cascade across non-primary references. I just need to come up with a very clear message, something like "Table x.B has dependent tuples. Please delete them first before deleting from x.A".

An alternative solution is to rely on MySQLs cascading delete functionality. Unfortunately, it does not seem to provide a way to list the deleted contents before committing. It also has problems resolving multiple dependence paths. So for now we are forced to use our own cascading mechanisms.

aecker commented 11 years ago

Can we not use the correct restriction for cascading in such a case? Wouldn't x.C & (x.B & (x.A & 'a=1')) work? In other words, every time we encounter a non-PK reference, we use that table for any restrictions down the hierarchy. Potential caveat: restrictions can become quite complex if the hierarchy is deep. However, the complexity should depend on the number of non-PK references, which are usually small. Am I missing something?

dimitri-yatsenko commented 11 years ago

Yes, that solution would work correctly but in many cases could be prohibitively slow. We do provide non-cascading delete dj.Relvar/delQuick, which could be used in cases where the overhead associated with cascading deletes is excessive.

So we have three alternatives:

  1. A solution that relies on MySQL's cascade functionality. Fast and correct but has issues. Requires upgrading existing schema to use ON DELETE CASCADE.
  2. Refuse to cascade across non-PK references. Prompts the user to explicitly delete from the dependent table.
  3. Cascade across non-PK reference by adjusting the restriction. Correct but potentially slow.

Does option 3 make the most sense?

aecker commented 11 years ago

Yes I think so. We can issue a warning that this may be slow but I would argue it's the logical solution since it doesn't remove any functionality.

Also, think about what you would do if DJ told you it can't cascade the delete. You'd manually issue the exact query option 3 would do, in which case there would be no speed benefit for option 2 anyway.

I'm pretty sure option 1 isn't going to work in pretty much any of our schemas.

dimitri-yatsenko commented 11 years ago

okay, implementing solution 3.

dimitri-yatsenko commented 11 years ago

Fixed.