Closed btbonval closed 10 years ago
Actually, it might be way easier to just toss on a unique constraint on slug
and usde_id
. Whatever crashes from the unique constraint clearly causes the problem, no need for guesswork.
For now, going to delete the MIT dupe by hand to proceed with #68. There are plenty of other dupes on the VM to toy with if desired:
karmaworld=> SELECT count(*) FROM courses_school;
count
-------
14427
(1 row)
karmaworld=> SELECT count(distinct(usde_id)) FROM courses_school;
count
-------
7484
(1 row)
I'm guessing the import schools thing got run twice or something and doesn't check for dupes. The VM is working with a newer set of schools from US DE, which is probably why it has 7484 unique schools while Beta has only 7001.
No more VM now, so can we close this?
There will be a new VM, but this ticket might not suffer the same problem. I'm concerned that the problem existed at all and what its implications have for even Heroku, but let's open a ticket when the problem expresses itself in a novel environment.
In order to input MIT notes from #68 , we need to find MIT.
On my VM, I found two instances of MIT which differed only by their internal id.
I did not find a repeat of MIT on Beta, nor any repeated school:
Turns out the difference was due to
No School
andHarvard Business School
both having NULLusde_id
, where NULL entries are left out of distinct entirely.I'm not sure how this happened. We want to ensure that the school import code will not import schools that already exist, and possibly we should have the sanitize search for dupes. The problem with dupes is if both are being used as a FK. If none are being used, it is arbitrary which is removed and if one is being used, that is the one to keep.