SoftwareIntrospectionLab / MininGit

Repository for UC Santa Cruz's work on Libresoft's CVSAnalY
http://tools.libresoft.es/cvsanaly
GNU General Public License v2.0

UNIQUE constraint on Hunks not always working #112

Open apepper opened 13 years ago

apepper commented 13 years ago

There is a UNIQUE constraint on table 'hunks' to make sure that each hunk is inserted only once:

UNIQUE (file_id, commit_id, old_start_line, old_end_line, new_start_line, new_end_line)

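Sadly, MySQL treats NULL values as distinct from each other in UNIQUE constraints, so two rows that match except for containing NULLs are never considered duplicates (see http://bugs.mysql.com/bug.php?id=8173 ). As a result, the following is possible: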

insert (3, 4, 1, 1, 1, 4) -> row gets inserted
insert (3, 4, 1, 1, 1, 4) -> Duplicate Exception (CORRECT)
insert (1, 2, NULL, NULL, 1, 5) -> row gets inserted
insert (1, 2, NULL, NULL, 1, 5) -> row gets inserted (WRONG!!)
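The sequence above can be reproduced without a MySQL server. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in, on the assumption that it behaves like MySQL here (SQLite also treats NULLs as distinct from each other in UNIQUE constraints). The table is trimmed to the six constrained columns; the real hunks table has more.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: same NULL
# semantics for UNIQUE constraints). Only the constrained columns are modeled.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hunks (
        file_id INTEGER,
        commit_id INTEGER,
        old_start_line INTEGER,
        old_end_line INTEGER,
        new_start_line INTEGER,
        new_end_line INTEGER,
        UNIQUE (file_id, commit_id, old_start_line, old_end_line,
                new_start_line, new_end_line)
    )
""")

def try_insert(row):
    """Insert a row and report whether the UNIQUE constraint rejected it."""
    try:
        conn.execute("INSERT INTO hunks VALUES (?, ?, ?, ?, ?, ?)", row)
        return "inserted"
    except sqlite3.IntegrityError:
        return "duplicate rejected"

rows = [(3, 4, 1, 1, 1, 4), (3, 4, 1, 1, 1, 4),
        (1, 2, None, None, 1, 5), (1, 2, None, None, 1, 5)]
results = [try_insert(row) for row in rows]
print(results)

# Both NULL-containing rows ended up in the table.
count = conn.execute("SELECT COUNT(*) FROM hunks WHERE file_id = 1").fetchone()[0]
print("rows for file_id 1:", count)
```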

So whenever NULL values are present (which happens quite often with hunks), the duplicate protection fails. One workaround would be to use a sentinel default value such as "-1" or "-999999" instead of NULL. But then every query that works with hunks would have to check whether a value is "-1", which would probably clutter the code.
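For comparison, a sketch of the sentinel workaround, again with sqlite3 standing in for MySQL. NULL line numbers are replaced by -1 (the arbitrary value mentioned above) before writing, so the constraint now catches the duplicate. The helpers `to_db` and `from_db` are hypothetical; they show the translation every reader and writer of the table would need.

```python
import sqlite3

SENTINEL = -1  # hypothetical "no line number" marker replacing NULL

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hunks (
        file_id INTEGER NOT NULL,
        commit_id INTEGER NOT NULL,
        old_start_line INTEGER NOT NULL DEFAULT -1,
        old_end_line INTEGER NOT NULL DEFAULT -1,
        new_start_line INTEGER NOT NULL DEFAULT -1,
        new_end_line INTEGER NOT NULL DEFAULT -1,
        UNIQUE (file_id, commit_id, old_start_line, old_end_line,
                new_start_line, new_end_line)
    )
""")

def to_db(row):
    """Replace None with the sentinel in the four line-number columns."""
    return row[:2] + tuple(SENTINEL if v is None else v for v in row[2:])

def from_db(row):
    """Every reader must undo the sentinel: the clutter the issue mentions."""
    return row[:2] + tuple(None if v == SENTINEL else v for v in row[2:])

results = []
for row in [(1, 2, None, None, 1, 5), (1, 2, None, None, 1, 5)]:
    try:
        conn.execute("INSERT INTO hunks VALUES (?, ?, ?, ?, ?, ?)", to_db(row))
        results.append("inserted")
    except sqlite3.IntegrityError:
        results.append("duplicate rejected")

stored = [from_db(r) for r in conn.execute("SELECT * FROM hunks")]
print(results, stored)
```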

Another way would be to check manually before each insert that the row is not already present. This would slow down the Hunks extension, because every insert needs an additional SELECT query. On the other hand, no extra logic would be required later on.
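A minimal sketch of the check-before-insert approach, again using sqlite3 as a stand-in. A plain `=` would never match NULL, so the lookup needs a NULL-safe comparison: in SQLite that is `IS`; in MySQL the equivalent is the `<=>` operator. The helper name `insert_hunk_if_absent` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hunks (
        file_id INTEGER, commit_id INTEGER,
        old_start_line INTEGER, old_end_line INTEGER,
        new_start_line INTEGER, new_end_line INTEGER,
        UNIQUE (file_id, commit_id, old_start_line, old_end_line,
                new_start_line, new_end_line)
    )
""")

# NULL-safe lookup: in SQLite "a IS b" is true when both sides are NULL.
# (In MySQL the same query would use "<=>" instead of "IS".)
EXISTS_SQL = """
    SELECT 1 FROM hunks
    WHERE file_id IS ? AND commit_id IS ?
      AND old_start_line IS ? AND old_end_line IS ?
      AND new_start_line IS ? AND new_end_line IS ?
    LIMIT 1
"""

def insert_hunk_if_absent(row):
    """Hypothetical helper: run the extra SELECT, INSERT only if no match.
    Returns True if the row was inserted."""
    if conn.execute(EXISTS_SQL, row).fetchone() is not None:
        return False
    conn.execute("INSERT INTO hunks VALUES (?, ?, ?, ?, ?, ?)", row)
    return True

results = [insert_hunk_if_absent((1, 2, None, None, 1, 5)) for _ in range(2)]
print(results)
```

The check and the insert are two separate statements, so they are not atomic; if several processes ran Hunks concurrently, they would need to be serialized somehow.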

What do you think about this? Currently this is not that bad for me, because my analyses are quite static, so I'm okay with running the Hunks extension just once. But as soon as Hunks is run more than once, it becomes a problem.

cflewis commented 13 years ago

Oh, yuck. Just when I thought I had seen all the MySQL weirdness, there's always something else (this is why I prefer PostgreSQL).

Ideologically, I'm opposed to using -1 just to work around MySQL being dumb, as it's really a NULL value use case. The manual check is a pain, but possible.

I'm going to put this one in the queue, but if it's not a current problem for you or anyone else (I, too, am only doing one-pass mining), I'm going to mark it Wishlist.