NASA-IMPACT / admg-casei

ADMG Inventory
https://impact.earthdata.nasa.gov/casei/
Apache License 2.0

WIP: catch errors if JSON parsing encounters JS Object #539

Open naomatheus opened 1 year ago

naomatheus commented 1 year ago

First pass at addressing #538.

See the issue for details.

naomatheus commented 1 year ago

Checking that the CI tests pass.

edkeeble commented 1 year ago

Doesn't this make the keywords object unusable if the JSON parser fails? I feel like allowing the build to pass with invalid data is going to give us the appearance of a successful build, but the actual output will be unpredictable. If we can't get the parser to handle the provided data, then we should fix the problem in the backend and ensure we're always storing valid JSON in that field.
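To make the fail-loudly alternative concrete, here is a minimal Python sketch (the real parsing happens with JSON.parse in the frontend build, so this is only an illustration). It assumes records are dicts with a `doi` key and a JSON string stored under a field named `gcmd_keywords`; that shape, `parse_keywords_strict`, and `InvalidKeywordData` are hypothetical. Instead of swallowing parse errors, it collects every offending DOI and aborts.

```python
import json


class InvalidKeywordData(ValueError):
    """Raised when one or more records hold keyword strings that are not valid JSON."""


def parse_keywords_strict(records, field="gcmd_keywords"):
    """Parse the keyword field of every record, failing loudly on bad data.

    `records` is assumed (hypothetically) to be a list of dicts with a "doi"
    key and a JSON string stored under `field`.
    """
    parsed = {}
    bad = []
    for record in records:
        raw = record.get(field)
        try:
            parsed[record["doi"]] = json.loads(raw)
        except (json.JSONDecodeError, TypeError) as exc:
            # Record which DOI failed and why, instead of silently continuing.
            bad.append((record["doi"], str(exc)))
    if bad:
        details = "; ".join(f"{doi}: {msg}" for doi, msg in bad)
        raise InvalidKeywordData(f"{len(bad)} record(s) with invalid JSON: {details}")
    return parsed
```

A build that calls this either gets fully parsed keywords or stops with a list of the DOIs to fix, so a green build never hides unusable data.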

Tammo-Feldmann commented 1 year ago

I agree with @edkeeble. It seems that something has changed in the backend, and we are now storing and serving invalid JSON data for doi.keywords. I'm not sure we can, or should, try to resolve this in the frontend.

Tammo-Feldmann commented 1 year ago

Here is a list of the DOIs with invalid JSON in their keyword strings:

Valid CMR keyword strings contain escaped double quotes, as in this example:

[Screenshot from 2023-06-13: example of a valid keyword string]

Invalid examples look like this:

"[{'Term': 'CLOUDS', 'Topic': 'ATMOSPHERE', 'Category': 'EARTH SCIENCE', 'VariableLevel1': 'CLOUD MICROPHYSICS', 'VariableLevel2': 'PARTICLE SIZE DISTRIBUTION'}, {'Term': 'CLOUDS', 'Topic': 'ATMOSPHERE', 'Category': 'EARTH SCIENCE'}]"
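A quick standard-library check shows why the second form is rejected: `json.loads` requires double-quoted strings and keys, so the single-quoted text fails, while `ast.literal_eval` (which reads Python literals, not JSON) accepts it. The sample strings are shortened versions of the example above, and `is_valid_json` is just an illustrative helper.

```python
import ast
import json

# Shortened versions of the keyword strings discussed above.
valid = '[{"Term": "CLOUDS", "Topic": "ATMOSPHERE", "Category": "EARTH SCIENCE"}]'
invalid = "[{'Term': 'CLOUDS', 'Topic': 'ATMOSPHERE', 'Category': 'EARTH SCIENCE'}]"


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(valid))    # True
print(is_valid_json(invalid))  # False: JSON keys and strings need double quotes

# The invalid form is a valid Python literal, so ast.literal_eval can read it.
print(ast.literal_eval(invalid) == json.loads(valid))  # True
```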

I believe the next step is to figure out where in the backend we started generating these invalid JSON strings for gcmd_keywords in the past week or so.

cc: @edkeeble

naomatheus commented 1 year ago

Yeah, that makes sense, @Tammo-Feldmann. Fixing it there is also preferable to just letting invalid JSON stay.

> Doesn't this make the keywords object unusable if the JSON parser fails?

But @edkeeble, my understanding was that the JSON parser fails when the value it receives has already been deserialized. Is that the case for the invalid examples listed above?

naomatheus commented 1 year ago

> we should fix the problem in the backend and ensure we're always storing valid JSON in that field.

Does anyone know offhand where we're writing these objects to the DB? I'll investigate.

Tammo-Feldmann commented 1 year ago

Thank you for looking into it @naomatheus.

edkeeble commented 1 year ago

> Yea that makes sense @Tammo-Feldmann . Preferable to just letting invalid JSON stay as well.
>
> > Doesn't this make the keywords object unusable if the JSON parser fails?
>
> But @edkeeble my understanding was that the JSON parser fails when the object encountered has already been deserialized. Is that the case with this list of invalid JSON examples above?

JSON.parse is the function that deserializes the string, and I don't believe we have any other logic that processes these fields before that step.
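The two failure modes are different. In JavaScript, calling JSON.parse on a value that is already a plain object coerces it to the string "[object Object]" and throws a SyntaxError, whereas the records above are strings that simply aren't JSON. Python's `json.loads` separates the cases by exception type (TypeError for non-strings, JSONDecodeError for malformed text). The sketch below uses that as an illustrative analogue for classifying a stored value; it is not the frontend code, and `classify_keyword_value` is a hypothetical helper.

```python
import json


def classify_keyword_value(value):
    """Describe whether a keyword value is already deserialized, invalid, or valid JSON."""
    try:
        json.loads(value)
    except TypeError:
        # json.loads only accepts str, bytes or bytearray; a list or dict
        # means the value has already been deserialized.
        return "already deserialized"
    except json.JSONDecodeError:
        # A string, but not JSON (e.g. a Python repr with single quotes).
        return "invalid JSON string"
    return "valid JSON string"


print(classify_keyword_value([{"Term": "CLOUDS"}]))    # already deserialized
print(classify_keyword_value("[{'Term': 'CLOUDS'}]"))  # invalid JSON string
print(classify_keyword_value('[{"Term": "CLOUDS"}]'))  # valid JSON string
```

The bad records fall into the second case, which supports fixing the data at its source rather than guarding against already-deserialized objects.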

edkeeble commented 1 year ago

> I believe the next step would be to figure out where in the backend we started generating these non-valid JSON strings for gcmd_keywords in the past week or so.

1.5 weeks ago, we deployed the ADMG backend to production for the first time in about a year, so the answer is probably somewhere in the 171 files changed in this PR: https://github.com/NASA-IMPACT/admg-backend/pull/414/files

edkeeble commented 1 year ago

From the look of the bad strings, we're storing the Python string representation (str()) of each field's dictionaries instead of serializing them to JSON. For example:

>>> import json
>>> a = {"test": "something"}
>>> str(a)
"{'test': 'something'}"
>>> json.dumps(a)
'{"test": "something"}'
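A sketch of the corresponding fix, assuming the keywords reach the database layer as Python lists of dicts: serialize with `json.dumps` on write, and repair rows already stored in the `str()` form by reading them back with `ast.literal_eval` and re-serializing. `to_json_field` and `repair_keyword_string` are hypothetical helper names; where the write actually happens in admg-backend is still the open question in this thread.

```python
import ast
import json


def to_json_field(value):
    """Serialize a keyword structure to JSON before storing it (instead of str(value))."""
    return json.dumps(value)


def repair_keyword_string(text):
    """Return a valid JSON string for `text`, converting a Python-repr string if needed."""
    try:
        json.loads(text)
        return text  # already valid JSON; leave it untouched
    except json.JSONDecodeError:
        pass
    # literal_eval only evaluates Python literals (no arbitrary code) and raises
    # an exception if the text is not a literal either.
    return json.dumps(ast.literal_eval(text))


keywords = [{"Term": "CLOUDS", "Topic": "ATMOSPHERE"}]
print(to_json_field(keywords))
# [{"Term": "CLOUDS", "Topic": "ATMOSPHERE"}]
print(repair_keyword_string(str(keywords)))
# [{"Term": "CLOUDS", "Topic": "ATMOSPHERE"}]
```

Running the repair over existing rows and switching the write path to `json.dumps` would address both the stored data and its source.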