NASA-IMPACT / admg-backend

Apache License 2.0
2 stars 0 forks source link

Restore JSONField columns on DOI model. #523

Closed edkeeble closed 1 year ago

edkeeble commented 1 year ago

As documented in https://github.com/nasa-IMPACT/admg-casei/issues/538, the CMR fields on DOI models no longer store valid JSON, so they cannot be deserialized during the frontend build. The reason for this goes back to #459, where we converted the CMR fields from JSONFields to TextFields, due to a serialization bug in the MI which was causing the JSONField data to be serialized and escaped each time it was saved, resulting in excessively large field values over time (escape characters being escaped, etc).

This was not a bug in JSONField, but rather in the way we serialize data prior to saving a Change object when users edit Drafts in the MI. Rather than using the form's cleaned_data, which correctly deserializes the JSON string submitted by the user, we were storing the string directly. To illustrate:

What we want to store: {"a": "b"} What we were storing: '{"a": "b"}'

This is all valid, because update is a JSONField and you can store a string as a value, of course:

{
  "cmr_projects": "{\"a\": \"b\"}"
}

However, when the user later wants to edit this data, they access the ChangeUpdateView, which loads the Change model data via the ChangeModelFormMixin. The mixin generates a ModelForm for the target model (e.g. DOI), which is confusingly named DOIForm, even though we have a separately defined DOIForm, which has nothing to do with this process and good luck figuring that out in your debugger, but I digress...

The generated ModelForm correctly makes use of Django's JSONField to bind the data from the cmr fields to the form as a string. BUT when presented with a string instead of a deserialized object, what do you suppose JSONField does with it? Does it parse it the string first and then serialize it? Does it pass it along untouched, assuming it's already serialized? Nope. It json.dumps it again. And what happens if we JSON dump a serialized JSON object?

>>> s = '{"a": "b"}'
>>> json.dumps(s)
'"{\\"a\\": \\"b\\"}"'

Ooof. So, we could use a custom JSONField in our form to handle cases where we get JSON strings rather than JSON objects, but it's probably better to just store the correct data in the DB to begin with, which is what this PR does.


Update: Issues below are resolved

Initial investigation suggests the form fields in the MI are being treated as strings (understandably--they're text fields), which JSONField (on Django v4.1.5) processes directly with json.dumps: https://github.com/django/django/blob/eba81c83906fd966204b9853aaa793f0516ccd4b/django/db/models/fields/json.py#L93

>>> s = '{"a": "b"}'
>>> json.dumps(s)
'"{\\"a\\": \\"b\\"}"'

It looks like this may be fixed in Django 4.2.2, which first checks if the provided value is a string and attempts to json.loads it first, if so. https://github.com/django/django/blob/6218ed34541bfc1e49c0f169c5b3a216a63cd985/django/db/models/fields/json.py#L101

Effectively:

>>> s = '{"a": "b"}'
>>> json.dumps(json.loads(s))
'{"a": "b"}'

Update:

Django 4.2.2 doesn't resolve the issue on its own. The model does seem to be storing the correct, unescaped string, but when the value is presented in the update form, it is escaped. Looking into crispy forms and django form fields to see if that's where the issue occurs.