biocodellc / geome-db

Need better error message for hash-based identifiers #34

Open jdeck88 opened 4 years ago

jdeck88 commented 4 years ago

When a spreadsheet is loaded and a hash-based ID is duplicated, the resulting message can be confusing to the user. For example, a spreadsheet for amphibian disease with a duplicate materialSampleID and duplicate diagnostic data will return a message like:

"diagnosticID" column is defined as unique but some values used more than once: "d23e4f1868753a14c82ec1ddcd778ba7"

I guess we can keep the above message in this case, but it would be better to provide some more information, like:

"diagnosticID" column is defined as unique but some values used more than once: "d23e4f1868753a14c82ec1ddcd778ba7". If this column is a hash-based identifier, check for duplicate rows.

Another, perhaps better, approach: during the unique value check, see whether the column is a hash-based identifier and, if so, provide an alternate message, like:

Duplicate data in entity "Diagnostics" with the same parent ID.  See materialSampleID = 1603000533
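
A rough sketch of how that second approach might look, written in Python for illustration only and not taken from the geome-db code. The function name, its parameters, and the `hash_based` flag are all hypothetical; it also assumes each row is a plain dict and that the validator already knows which column holds the parent ID:

```python
from collections import defaultdict


def unique_check_messages(rows, column, entity, parent_column, hash_based):
    """Return one message per duplicated value in `column`.

    Hypothetical helper: `hash_based` is assumed to be known from the
    entity configuration; nothing here reflects the real geome-db API.
    """
    # Group rows by the value that is supposed to be unique.
    rows_by_value = defaultdict(list)
    for row in rows:
        rows_by_value[row[column]].append(row)

    messages = []
    for value, dupes in rows_by_value.items():
        if len(dupes) < 2:
            continue  # value is unique, nothing to report
        if hash_based:
            # The hash means nothing to the user, so point at the parent ID(s).
            parent_ids = sorted({str(r[parent_column]) for r in dupes})
            messages.append(
                f'Duplicate data in entity "{entity}" with the same parent ID. '
                f'See {parent_column} = {", ".join(parent_ids)}'
            )
        else:
            # Ordinary identifiers keep the existing style of message.
            messages.append(
                f'"{column}" column is defined as unique but some values '
                f'used more than once: "{value}"'
            )
    return messages
```

Calling it with `hash_based=True` on two rows that share a diagnosticID and a materialSampleID would report the materialSampleID instead of the hash, while `hash_based=False` falls back to the current wording.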

bd-duplicaterecordexample.xlsx