cellannotation / cell-annotation-schema

General, open-standard schema for cell annotations

Problem with representing key-value pairs in the linkml schema #123

Open hkir-dev opened 1 month ago

hkir-dev commented 1 month ago

Fields typed as object without any properties or $ref declarations are causing issues when converting CAS to LinkML using schema-automator.

A recent example is author_annotation_fields, which is intended to hold a dictionary/map as its value:

"author_annotation_fields": {
  "type": "object",
  "description": "A set of author defined key value pairs annotating the cell set. The names and aims of these fields MUST not clash with annotationThis schema accepts author defined fields."
}

Schema-automator raises the error: Cannot translate type object in author_annotation_fields

Consequently, the value of author_annotation_fields is being represented as a JSON object string.

To resolve this, we can define this field explicitly as follows:

"Author_annotation": {
  "properties": {
    "annotation_name": {
      "type": "string",
      "description": "Name of the author annotation"
    },
    "annotation_value": {
      "type": "string",
      "description": "Value of the author annotation"
    }
  },
  "required": [
    "annotation_name",
    "annotation_value"
  ]
}
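
To illustrate what this change would mean for existing data, here is a minimal Python sketch that converts a free-form key-value mapping into a list of records shaped like the proposed Author_annotation definition. The function names are hypothetical and not part of CAS or schema-automator, and it assumes every value can be stringified, since the proposal types annotation_value as a string.

```python
from typing import Any, Dict, List


def to_author_annotations(fields: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn a free-form mapping into explicit name/value records.

    Hypothetical helper: values are coerced to str because the proposed
    definition declares annotation_value as a string.
    """
    return [
        {"annotation_name": str(name), "annotation_value": str(value)}
        for name, value in fields.items()
    ]


def from_author_annotations(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Inverse of the above: rebuild a mapping from name/value records."""
    return {r["annotation_name"]: r["annotation_value"] for r in records}


legacy = {"broad_type": "immune", "cluster_id": 7}
records = to_author_annotations(legacy)
print(records)
```

Note that the round trip is lossy for non-string values (7 comes back as "7"), which is a consequence of typing annotation_value as a string.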

dosumis commented 4 days ago

I think you're probably correct on the solution, as:

(a) This will support output of user annotations in the graph, rather than storing JSON blobs.
(b) It will allow us to include extra info about user fields.

OTOH:

If we do go down this route, then in parallel with implementing it, we will need to improve reporting. I suggest flattening these fields completely in the annotation table report, with the reporting method allowing author categories to be included or excluded.
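
A rough Python sketch of that flattening idea, under stated assumptions: each annotation is a dict whose author fields live in a list under a key called author_annotations (a hypothetical name), and the function name and its include/exclude parameters are illustrative only, not an existing CAS tools API.

```python
from typing import Any, Dict, Iterable, Optional


def flatten_annotation(
    annotation: Dict[str, Any],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Flatten name/value author records into top-level report columns.

    include: if given, only these author categories are kept.
    exclude: if given, these author categories are dropped.
    """
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude) if exclude is not None else set()

    # Start from the core fields; "author_annotations" is an assumed key name.
    row = {k: v for k, v in annotation.items() if k != "author_annotations"}

    for record in annotation.get("author_annotations", []):
        name = record["annotation_name"]
        if include_set is not None and name not in include_set:
            continue
        if name in exclude_set:
            continue
        if name in row:
            # Author field names must not clash with core columns.
            raise ValueError(f"author field {name!r} clashes with a core column")
        row[name] = record["annotation_value"]
    return row


annotation = {
    "cell_label": "T cell",
    "author_annotations": [
        {"annotation_name": "broad_type", "annotation_value": "immune"},
        {"annotation_name": "marker", "annotation_value": "CD3E"},
    ],
}
print(flatten_annotation(annotation, exclude=["marker"]))
```

Raising on a name clash is one possible design choice; prefixing author columns would be another.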