coursera / dataduct

DataPipeline for humans.

fix duplicate primary check to not return 1 row per duplicate #150

Closed cliu587 closed 9 years ago

cliu587 commented 9 years ago

Currently the duplicate primary key check returns 1 row per duplicated primary key. For large tables, this can make the script run out of memory (OOM) and fail with an unhelpful 'Script returned with exit status 137' error.

This change makes the primary key check script return only a single row containing the total count of duplicate primary keys.
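
To illustrate the difference between the two query shapes, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the actual dataduct script: the `events` table, its `id` column, and the helper names `duplicate_key_rows` and `duplicate_key_count` are all hypothetical, and the real check runs against a different database.

```python
import sqlite3


def duplicate_key_rows(conn, table, key_columns):
    """Old shape (hypothetical): one result row per duplicated key.

    The result grows with the number of duplicated keys, so the caller
    may have to hold a very large list in memory.
    """
    keys = ", ".join(key_columns)
    sql = (
        f"SELECT {keys}, COUNT(*) FROM {table} "
        f"GROUP BY {keys} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


def duplicate_key_count(conn, table, key_columns):
    """New shape (hypothetical): a single row holding the number of
    duplicated keys, regardless of how many there are."""
    keys = ", ".join(key_columns)
    sql = (
        "SELECT COUNT(*) FROM ("
        f"SELECT 1 FROM {table} GROUP BY {keys} HAVING COUNT(*) > 1"
        ") AS duplicates"
    )
    return conn.execute(sql).fetchone()[0]


# Small in-memory table: ids 1 and 3 are duplicated, id 2 is unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

per_key = duplicate_key_rows(conn, "events", ["id"])
total = duplicate_key_count(conn, "events", ["id"])
```

In this sketch the first helper returns one tuple per duplicated `id`, while the second always returns a single integer, so the client-side memory cost stays constant however many keys are duplicated.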

sb2nov commented 9 years ago

Needs to be part of larger migration