CDLUC3 / dmphub

Simple metadata repository for networked DMPs
MIT License
3 stars 1 forks source link

Nasty bug in API #54

Open briri opened 3 years ago

briri commented 3 years ago

We encountered a strange issue that was causing a DMP to pick up an old orphaned identifier. It was attaching the identifier as descriptors 1 and 13 (is_identifier_for and is_metadata_for), basically one for each legit identifier. So for example if we had

1 - https://doi.org/10.1234/abcd 13 - https://dmptool.org/plans/1234/export?format=pdf

then it was magically adding 1 - https://magic.orphaned.rec/99999 13 - https://magic.orphaned.rec/99999

The odd part is that an inspection of ALL identifiers associated with the DMP and its children (e.g. datasets, contributors) at the point of saving had no reference to these ids.

The magic identifiers appear though when inspecting the reloaded DMP after the save: reloading the record

The subsequent call to contextualize errors correctly identifies the magic orphaned records as duplicates.

There must be logic somewhere that is picking up these orphans and doing a .first or .last. Perhaps through an all query. I seem to recall a colleague mentioning once that sometimes scopes will unexpectedly return ALL records if there is an issue.

The DB had about 70 orphaned identifiers with the 'is_related' descriptor (id - 30). I corrected this issue and once the orphans were removed the problem resolved.

We should A) do some investigation into how they because orphaned and B) find the culprit and correct in the event that we see orphans again.

briri commented 3 years ago

This has a lot to do with the sharing of logic between create and update.

The dmp.datasets.clear method for example actually deletes the entries from the DB ... this is inherent within the way Rails ActiveRecord works. We instead need to loop through the original and compare to the deserialized to determine if it should be deleted.

This is doing weird things like deleting the original dataset (but NOT its dependencies ... creating some orphans) and then recreating the dataset along with all of its dependencies. :/

We need to continue using the PersistenceService for create, but need something else for updates that get passed the deserialized JSON and the original object.

mariapraetzellis commented 2 years ago

This was temporarily addressed thanks to work on the new Related Identifiers field. Some documentation on this problem/fix is here: https://docs.google.com/document/d/1WsNTRdNfWcq5_HhBBE_5UriTQHItag-LeiiOe_v0G6A/edit

We will need to revisit this issue as part of versioning.