CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

Core2 Library JSON Inconsistencies (Object vs Array vs Value) #617

Open terrywbrady opened 3 years ago

terrywbrady commented 3 years ago

The Core2 approach for generating JSON can return different JSON types for the same key. The library converts XML to JSON without schema awareness. When an element occurs only once, it is returned as a single object or value. When it occurs two or more times, it is returned as an array.
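
Until the output is consistent, a client has to accept either shape for the same key. A minimal Java sketch of that tolerance, using plain `Map`/`List` values as stand-ins for whatever a JSON parser returns. The `OneOrMany` class and `asList` helper are hypothetical names for illustration, not part of Core2 or Merritt:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical client-side helper; not part of Core2 or Merritt.
class OneOrMany {

    /**
     * Returns the parsed value as a list regardless of how it was serialized:
     * an array is copied as-is, a single object or value becomes a
     * one-element list, and a missing (null) value becomes an empty list.
     */
    static List<Object> asList(Object value) {
        if (value == null) {
            return Collections.emptyList();
        }
        if (value instanceof List) {
            return new ArrayList<>((List<?>) value);
        }
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        // One occurrence: the converter emits a bare value.
        System.out.println(asList("jid-222222-000000").size());
        // Several occurrences: the converter emits an array.
        System.out.println(asList(Arrays.asList("jid-222221-000000", "jid-000000-000000")).size());
    }
}
```

This only hides the inconsistency from one caller; every client still has to know which keys can repeat.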

If Merritt publishes a public-facing API, each key in the published JSON should always map to the same type: object, array, or value.

I recommend that we revisit this when we start our API work. In the past I have used GSON: https://github.com/google/gson

I have a question about the Core2 library formatter for JSON.
Please take a look at the following two responses. Notice that in the SINGLE entry response the array bracket is missing, but it appears when there are multiple job entries.
The bracket is in the line "fil:batchFile": [
Is this a function of the JSON library you are using? Terry asked if it could be made consistent, and I do not know if it is a lot of work.
We can discuss this later in the week, as I know you have a lot going on with Audit currently.
Thanks
------ MULTIPLE -----
{
  "fil:batchFileState": {
    "xmlns:fil": "http://uc3.cdlib.org/ontology/mrt/store/file",
    "fil:jobFile": {
      "fil:batchFile": [
        {
          "fil:file": "jid-222221-000000"
        },
        {
          "fil:file": "jid-000000-000000"
        },
        {
          "fil:file": "jid-111111-000000"
        }
      ]
    }
  }
}
------ SINGLE -----
{
  "fil:batchFileState": {
    "xmlns:fil": "http://uc3.cdlib.org/ontology/mrt/store/file",
    "fil:jobFile": {
      "fil:batchFile": {
        "fil:file": "jid-222222-000000"
      }
    },
    "fil:batchManifest": ""
  }
}

Interesting. Unfortunately, the conversion from XML to JSON is a single call, which is why the xmlns namespace attribute is included. I think the problem is that XML has nothing to indicate that a sub-element can repeat unless a schema is involved. With this approach I am not sure there is a solution, even with a new jar or a different converter.
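
One way to supply the missing schema knowledge is a post-processing pass over the converted tree, driven by a list of keys known to be repeatable. The sketch below is only an illustration under stated assumptions: the tree is modeled with plain `Map`/`List` values, the `RepeatableKeyNormalizer` class is a hypothetical name, and the set of repeatable keys would have to be maintained by hand. It is not how Core2 works today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical post-processing step; not part of Core2.
class RepeatableKeyNormalizer {

    /**
     * Walks a parsed JSON tree (Map = object, List = array, anything else = value)
     * and wraps the value of every key listed in repeatableKeys in a List
     * when the converter did not already emit an array for it.
     */
    static Object normalize(Object node, Set<String> repeatableKeys) {
        if (node instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                Object value = normalize(e.getValue(), repeatableKeys);
                if (repeatableKeys.contains(key) && !(value instanceof List)) {
                    List<Object> wrapped = new ArrayList<>();
                    wrapped.add(value);
                    value = wrapped;
                }
                out.put(key, value);
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(normalize(item, repeatableKeys));
            }
            return out;
        }
        return node; // scalar value: leave unchanged
    }

    public static void main(String[] args) {
        // Mirrors the SINGLE response: "fil:batchFile" is an object, not an array.
        Map<String, Object> file = new LinkedHashMap<>();
        file.put("fil:file", "jid-222222-000000");
        Map<String, Object> jobFile = new LinkedHashMap<>();
        jobFile.put("fil:batchFile", file);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("fil:jobFile", jobFile);

        // After normalization "fil:batchFile" holds a one-element list.
        System.out.println(normalize(root, Collections.singleton("fil:batchFile")));
    }
}
```

The cost of this design is that the key list duplicates information that already lives in the XML schema, so the two can drift apart.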
terrywbrady commented 3 years ago

Mark and I discovered another issue. JSON clients expect newlines inside string values to be serialized as the escape sequence "\n" rather than as literal newline characters. Currently, newlines are being treated as spaces.
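
For reference, the JSON grammar requires quotation marks, backslashes, and control characters below U+0020 to be escaped inside string values. A minimal stdlib-only Java sketch of that escaping follows. The `JsonStringEscaper` class is a hypothetical name used for illustration, and in practice a JSON library would normally do this work:

```java
// Hypothetical helper for illustration; a JSON library would normally handle this.
class JsonStringEscaper {

    /**
     * Escapes text for use inside a JSON string literal
     * (the surrounding double quotes are not added).
     */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "line one\nline two";
        // The newline is emitted as the two characters backslash and n.
        System.out.println("{\"note\": \"" + escape(value) + "\"}");
    }
}
```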

terrywbrady commented 11 months ago

This issue recently created some challenges for @mreyescdl , so we should discuss this at the next team meeting.

JSON can be validated against a schema. We make use of this in the Merritt Object Health prototype. A JSON schema can be written in either JSON or YAML. The YAML form is easier to read, but it may need to be converted to JSON before a validator can use it.

Here are a couple of sample schemas.

For more information, see https://json-schema.org/

terrywbrady commented 11 months ago

Example workaround for this issue. https://github.com/CDLUC3/mrt-ingest/blob/main/ingest-it/src/test/java/org/cdlib/mrt/ingest/ServiceDriverIT.java#L143-L185