Open edkeeble opened 1 year ago
the
cmr_data_formats
column contains inconsistently formatted data. In some cases, it stores an array of strings and in others it stores a single string.
Do you feel like you have a solid understanding of how we are getting ourselves into these types of situations? Shouldn't we also add some logic into the application to ensure that we don't find ourselves back in this same situation in the future?
Yeah, that's a good point. I haven't updated the DOI Matching process to alter the data we insert into the DB. At a glance, the DOI Matcher still serializes all the data. I'll check if that matters in this context or if that's to be expected, since the DOI Matcher creates Change objects and stores everything in the update
field.
The only references I see to cmr_data_formats
seems to suggest that they are always stored as an array/list, so I'm not sure how we ended up with plain strings in the DB. We could check the CMR API and see if the issue is at the source.
My recommendation is to restore the staging DB from a dump of the production DB prior to deploying this and manually update that record, since it appears to be an anomaly:
d = DOI.objects.get(uuid='6a8e809b-c522-4029-a6b0-e28348a6bcf1') d.cmr_dates '[{\\"RangeDateTimes\\": [{\\"EndingDateTime\\": \\"1998-09-20T17:46:43.000Z\\", \\"BeginningDateTime\\": \\"1998-08-04T16:43:12.000Z\\"}], \\"EndsAtPresentFlag\\": false}]' d.cmr_dates='[{\\"RangeDateTimes\\": [{\\"EndingDateTime\\": \\"1998-09-20T17:46:43.000Z\\", \\"BeginningDateTime\\": \\"1998-08-04T16:43:12.000Z\\"}], \\"EndsAtPresentFlag\\": False}]' d.save()
This confuses me a bit. What is this encoding? false
seems like a fine JSON boolean, but then all the keys are double-escaped. Any idea of how we got here?
Any idea of how we got here?
The double escaping is the result of the issue around the model form passing a string to json.dumps (https://github.com/NASA-IMPACT/admg-backend/issues/389). I made a mistake in thinking I had to correct that false
value, though. What I actually need to do is remove the escape characters prior to attempting to load the json string in the migration. That allows this value to convert properly.
Also reconvert data after switching columns back to JSONField.
This PR addresses a couple of issues with the previous attempt to store CMR data as JSON:
I've run this on a dump of the production DB and it works correctly, with the exception of one record, which has an unparseable "false" value. My recommendation is to restore the staging DB from a dump of the production DB prior to deploying this and manually update that record, since it appears to be an anomaly:
Also of note: the
cmr_data_formats
column contains inconsistently formatted data. In some cases, it stores an array of strings and in others it stores a single string. The updated migration will force a single string to be stored as an array, which I think makes sense given the name of the column.