MIT-LCP / mimic-omop

Mapping the MIMIC-III database to the OMOP schema
MIT License
128 stars 48 forks source link

ERROR: duplicate key value violates unique constraint "xpk_measurement" #49

Closed smondet closed 6 years ago

smondet commented 6 years ago

After trying the fix for #48, I got:

...
INSERT 0 11656610
Time: 1585618.783 ms
INSERT 0 630856
Time: 81014.760 ms
psql:etl/StandardizedClinicalDataTables/MEASUREMENT/etl.sql:804: ERROR:  duplicate key value violates unique constraint "xpk_measurement"
DETAIL:  Key (measurement_id)=(301930857) already exists.
Time: 5491441.549 ms
parisni commented 6 years ago

@smondet thanks for reporting those. But remember, you should not add the indexes before loading the tables. The ETL might be far longer in this way. I suggest you to remove them. (updating indexes way more complex than building them from scratch)

spfohl commented 6 years ago

Do you mean that we should not build the indexes with "psql "$OMOP" -f "omop/build-omop/postgresql/OMOP CDM postgresql indexes.txt"" until after running the ETL with "psql "$MIMIC" -f "etl/etl.sql""? I'm just trying to clarify because the order implied by the README suggests that we should build the indexes first.

alistairewj commented 6 years ago

That was probably written by me - and probably it is a lot slower than it could be by doing it in my suggested order!

parisni commented 6 years ago

Yes exactly: indexes, integrity constraints shall be added after the ETL step.

Let me modify the REAME accordingly.

parisni commented 6 years ago

well @alistairewj looks ready to update his great documentation

spfohl commented 6 years ago

Awesome, that's good to know. I have also been having similar issues with duplicate keys, but I'll try re-running with the indexes at the end.

alistairewj commented 6 years ago

Yeah no problem - I think I do mention it about the constraints but I added it about the indexes. The constraints are clearly useful though - wouldn't have found these bugs without them :)