airr-knowledge / issues

Issues and project management for the AKC

Odd behavior using LinkML json_dumper #65

Open bcorrie opened 1 week ago

bcorrie commented 1 week ago

I have found some odd behaviour when using json_dumper. I do the following, modelled on what @jamesaoverton did in his code to dump a LinkML object:

from linkml_runtime.dumpers import yaml_dumper, json_dumper, tsv_dumper
[code deleted]
                                akc_object[value['akc_field']] = value['value']
                                print(akc_object[value['akc_field']])
                                print(json_dumper.dumps(akc_object))

This generates the following in the depths of the conversion:

{'label': None, 'id': None}
{
  "akc_id": "f46bd0bf-e571-4be7-be89-7dcfae968b6c",
  "adc_link_tag": "ImmuneExposure_HD_018",
  "adc_repertoire_id": "671a7a712820f084c2be9b81",
  "adc_sample_processing_id": "671a7a712820f084c2be9b81",
  "adc_data_processing_id": "671a7a712820f084c2be9b81",
  "adc_study_id": "IR-T1D-000003",
  "adc_subject_id": "HD_018",
  "adc_sample_id": "HD_018_GAD",
  "@type": "ImmuneExposure"
}

So I essentially set a field (disease) in the AKC LinkML object akc_object. Its value is a dictionary with two keys, each of which has a null value (the first line of the output above). When I use json_dumper to dump the LinkML Python class, it seems to omit the field entirely when all of its values are null: there is no "disease" key in the JSON. This is certainly not what I want.

For an ImmuneExposure that has a disease (label and id not null), it works as expected:

{'label': 'type 1 diabetes mellitus', 'id': 'DOID:9744'}
{
  "akc_id": "25f4bec2-a91d-4721-94bc-c55004eb1167",
  "disease": {
    "label": "type 1 diabetes mellitus",
    "id": "DOID:9744"
  },
  "adc_link_tag": "ImmuneExposure_T1D_024",
  "adc_repertoire_id": "671a7a722820f084c2be9b85",
  "adc_sample_processing_id": "671a7a722820f084c2be9b85",
  "adc_data_processing_id": "671a7a722820f084c2be9b85",
  "adc_study_id": "IR-T1D-000003",
  "adc_subject_id": "T1D_024",
  "adc_sample_id": "T1D_024_GAD",
  "@type": "ImmuneExposure"
}

@jamesaoverton @schristley any insights? The LinkML json_dumper seems to be choosing which fields to dump.

bcorrie commented 1 week ago

To be clear this is what I would expect in the null case:

{'label': None, 'id': None}
{
  "akc_id": "f46bd0bf-e571-4be7-be89-7dcfae968b6c",
  "disease": {
    "label": null,
    "id": null
  },
  "adc_link_tag": "ImmuneExposure_HD_018",
  "adc_repertoire_id": "671a7a712820f084c2be9b81",
  "adc_sample_processing_id": "671a7a712820f084c2be9b81",
  "adc_data_processing_id": "671a7a712820f084c2be9b81",
  "adc_study_id": "IR-T1D-000003",
  "adc_subject_id": "HD_018",
  "adc_sample_id": "HD_018_GAD",
  "@type": "ImmuneExposure"
}
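
As a point of comparison, Python's standard json module writes None as null and does not drop keys. The sketch below uses plain dataclasses as hypothetical stand-ins for the generated LinkML classes (Disease and ImmuneExposure here are illustrative, not the real AKC classes, and only a few fields are included), so it only shows the output I am after. It is not a fix for json_dumper.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Disease:
    # Hypothetical stand-in for the ontology term used by the disease field.
    label: Optional[str] = None
    id: Optional[str] = None


@dataclass
class ImmuneExposure:
    # Hypothetical stand-in with only a subset of the fields shown above.
    akc_id: str
    disease: Disease = field(default_factory=Disease)
    adc_subject_id: Optional[str] = None


exposure = ImmuneExposure(
    akc_id="f46bd0bf-e571-4be7-be89-7dcfae968b6c",
    adc_subject_id="HD_018",
)

# asdict() converts nested dataclasses to dicts; json.dumps keeps None as null.
dumped = json.dumps(asdict(exposure), indent=2)
print(dumped)
```

With this approach the "disease" key is still present, with "label" and "id" both written as null.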

bcorrie commented 1 week ago

I also notice that the LinkML documentation doesn't mention json_dumper; it mentions JSONDumper.

https://linkml.io/linkml/developers/loaders-and-dumpers.html#linkml_runtime.dumpers.JSONDumper

bcorrie commented 1 week ago

Hmm, I found this code: https://linkml.io/linkml/_modules/linkml_runtime/dumpers/json_dumper.html

And it has code that says:

            if isinstance(o, BaseModel):
                return remove_empty_items(o.dict(), hide_protected_keys=True)

But why??? And what do you do if you want to have an empty item? Does this mean in LinkML you can't have a field with a null value?
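
For what it's worth, the output I am seeing is consistent with a recursive pruning step. The sketch below is not the linkml_runtime implementation: prune_empty is a hypothetical stand-in, written only to illustrate how dropping None values and then the resulting empty containers would make the whole "disease" field vanish.

```python
def prune_empty(value):
    """Hypothetical stand-in: recursively drop None values and empty containers."""
    if isinstance(value, dict):
        pruned = {key: prune_empty(item) for key, item in value.items()}
        # Drop keys whose pruned value is None, an empty dict, or an empty list.
        return {
            key: item
            for key, item in pruned.items()
            if item is not None and item != {} and item != []
        }
    if isinstance(value, list):
        pruned = [prune_empty(item) for item in value]
        return [item for item in pruned if item is not None and item != {} and item != []]
    return value


record = {
    "akc_id": "f46bd0bf-e571-4be7-be89-7dcfae968b6c",
    "disease": {"label": None, "id": None},
    "adc_subject_id": "HD_018",
}

pruned_record = prune_empty(record)
print(pruned_record)
```

Here the inner dictionary first loses both of its null entries, becomes empty, and is then removed from the parent, which matches the missing "disease" key in my output.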

jamesaoverton commented 1 week ago

I don't know anything about this issue specifically, but you could ask about it on the #linkml channel of the OBO Community Slack, or on their GitHub issue tracker.

schristley commented 1 week ago

I suppose this is slightly better than the previous behaviour, which was to assign blank strings '' to fields that didn't have a value. I saw a PR dealing with null, but it wasn't exactly clear to me what was added. However, I updated our LinkML not too long ago to bring in those changes. I think they are primarily there to support parsing/validating fields with null values.

I don't think LinkML currently supports an attribute like nullable: true, so we might not be able to support AIRR standards exactly in output generation. Also, nullable: true is an OpenAPI 3 attribute rather than a JSON Schema one, which may complicate matters.

I feel the AIRR standards practice of requiring the field even with a null value (e.g. "label": null) was a bit "non-standard" for the JSON world. Python is also a pain because you need to use the get function to avoid errors with missing keys, while I feel JavaScript is much cleaner in that respect because it returns undefined if the key is missing.
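
To illustrate the Python side of this: indexing a missing key raises KeyError, while dict.get returns None (or a supplied default). The record below is a made-up fragment based on the output earlier in this thread, with the "disease" key omitted.

```python
record = {
    "akc_id": "f46bd0bf-e571-4be7-be89-7dcfae968b6c",
    "adc_subject_id": "HD_018",
}

# Direct indexing fails when the key was omitted from the JSON.
try:
    record["disease"]
    raised = False
except KeyError:
    raised = True

# get() returns None for a missing key instead of raising.
disease = record.get("disease")

# Chained lookups need a fallback dict so the second get() has something to call.
label = (record.get("disease") or {}).get("label")

print(raised, disease, label)
```

The fallback with "or {}" also covers the case where the key is present but explicitly null, so the same reader code works for both styles of output.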