ga4gh-beacon / beacon-v2

Unified repository for the GA4GH Beacon v2 API standard
Creative Commons Zero v1.0 Universal
27 stars 22 forks source link

complexValue required type fixed #89

Closed costero-e closed 1 year ago

costero-e commented 1 year ago

This is the fix for the issue opened by @redmitry here #87 . Please @mbaudis and @redmitry review it to move this forward. Thank you.

costero-e commented 1 year ago

Thank you @redmitry to spot that! I added another commit to fix the position of the required. Can you review it again and confirm it's correct now? Thanks.

redmitry commented 1 year ago

The last piece is to return back:

            "required": [
                "quantityType",
                "quantity"
            ]

😄

costero-e commented 1 year ago

ok, done @redmitry , I am starting to feel the boomerang effect with this PR jaja, can you review it again? thank you!

redmitry commented 1 year ago

It looks correct now. There is also an error in the the original CINECA Individuals data.

"measurementValue" : {                                                                                                                                                                                          
    "quantity" : {                                                                                                                                                                                                  
        "unit" : {                                                                                                                                                                                                      
            "id" : "NCIT:C49671",                                                                                                                                                                                        
            "label" : "Kilogram per Square Meter"                                                                                                                                                                     
        },                                                                                                                                                                                                           
        "value" : 26.63838307                                                                                                                                                                                     
    }
}

while correct one must be like:

"measurementValue": {
    "unit": {
        "id": "NCIT:C49671",
        "label": "Kilogram per Square Meter"
    },
    "value": 26.63838307
}

I have a fixed version: https://gitlab.bsc.es/inb/ga4gh/beacon-v2-docker-demo/-/tree/master/beacon-data

mrueda commented 1 year ago

Just a note, if we want to be Phenopacket friendly, I suggest changing our schema to align with theirs. When in trouble, I follow theirs...

See: https://phenopacket-schema.readthedocs.io/en/latest/value.html

and:

https://github.com/EGA-archive/beacon2-ri-tools/issues/3

Cheers,

m

costero-e commented 1 year ago

Ok, let's close this PR and discuss the individuals schema question in issues. @mbaudis can you please review and accept this PR? Thanks.

mbaudis commented 1 year ago

@mrueda

Just a note, if we want to be Phenopacket friendly, I suggest changing our schema to align with theirs. When in trouble, I follow theirs...

Tru dat. But PXF is inconsistent itself, at least when looking at the examples (and hampered by the "let's write this in YAML starting from a protobuf definition" documentation...):

Value:

value:
    quantity:
        unit:
            id: "UO:0000316"
            label: "cells per microliter"
        value: 24000.0
        referenceRange:
            unit:
                id: "UO:0000316"
                label: "cells per microliter"
            low: 150000.0
            high: 450000.0

But example in Measurement:

measurement:
    assay:
        id: "LOINC:26515-7"
        label: "Platelets [#/volume] in Blood"
    value:
        quantity:
            unit:
                id: "UO:0000316"
                label: "cells per microliter"
            value: 24000.0
    referenceRange:
        unit:
            id: "UO:0000316"
            label: "cells per microliter"
        low: 150000.0
        high: 450000.0

... places the referenceRange wrong.

costero-e commented 1 year ago

we need @redmitry approval here to close this PR

costero-e commented 1 year ago

yes @redmitry , I still have to take a look carefully at individuals issue before validating it.

costero-e commented 1 year ago

@redmitry I just checked the CINETICA individuals schema. Correct me if I'm wrong but I think the correction you made is because the value is not complex? Also, even if the value was complex, quantity should be an array not an object? @mrueda said that phenopackets (https://phenopacket-schema.readthedocs.io/en/latest/covid19-example.html) are written like this:

Screenshot 2023-06-19 at 16 49 33

Here I see the example value is not complex and written with quantity. I can correct the individuals schema but wouldn't be good for beacon to respect the phenopackets schema? I'm not saying I'm not going to do it, I just want to understand know why we are not following the same logic as for phenopackets schema. Thank you @redmitry !

redmitry commented 1 year ago

can correct the individuals schema but wouldn't be good for beacon to respect the phenopackets schema? IMHO phenopackets is not a schema, but a format that can be represented in json.

Beacon's schema != phenopackets. We may create an entry type "phenopackets" and provide a schema for it.

Correct me if I'm wrong but I think the correction you made is because the value is not complex? The correction to the CINECA data was done because it didn't comply with beacon v2 schema.

Cheers,

D.

mrueda commented 1 year ago

In my view, the difference between Beacon v2 and Phenopackets v2 schemas, specifically in the individuals entry type, likely stems from the reverse engineering of Phenopackets due to its Protobuff schema. If a JSON schema for Phenopackets v2 was available, this discrepancy wouldn't exist. As a developer working with BFF/PXF, these minor differences in defining identical terms/properties are impractical, especially since Beacon v2 was derived from Phenopackets and the goal is future interoperability.

The ideal scenario would be Phenopackets folks serializing their schema to JSON (if anybody from Phenopackets is reading this, please listen to us!! :smile:). However, if this isn't feasible, I propose forming a small working group of two/three people (@mbaudis) to identify and gradually address these differences. I would be eager to participate, given the impact on my work.

Cheers,

m

costero-e commented 1 year ago

Ok, then at the moment we will stick to specifications and give the CINECA dataset the schema suggested by @redmitry for measurementValues until we find a solution for the adaptability to Phenopackets.

mrueda commented 1 year ago

@costero-e, IMO the key consideration here is whether the exclusion of the hierarchy 'quantity' from the Phenopacket schema was intentional or simply due to the challenges associated with reverse engineering the Protobuf schema. It should be noted that as of 2021, there were not many Phenopacket schemas available for 'Value'.

https://phenopacket-schema.readthedocs.io/en/master/value.html

If the downstream of the object 'Value' in Beacon v2 is identical to that in Phenopackets, it would be more reasonable to maintain consistency upstream as well. Instead of attempting to modify all replicas of CINECA's 'individuals.json', which is likely one of the few, if not the only, real examples that people have of Beacon v2 Models 😅 , it would be preferable to align the schema upstream and move on. What do you think @mbaudis?

Cheers,

Manu

redmitry commented 1 year ago

Ok, then at the moment we will stick to specifications and give the CINECA dataset the schema suggested by @redmitry for measurementValues until we find a solution for the adaptability to Phenopackets.

The problem is that "phenopackets-schema" is not a JSON schema. Phenopackets lib provides json serialization/deserialization, but not with defined (via schema) format.

The question is if it is feasible to develop "phenopackets-like" json schema, because in this case phenopackets would need to provide serializer/deserializer for it.

The choices are:

  1. Leave it as is - Beacon uses it's own "phenopacket-like" format.
  2. Beacon use phenopackes serialization format and would need to craft JSON schema that is in agreement with phenpackets format.
  3. Develop complete JSON schema for phenopackets (incompatible with current phenopackets serialization). 3.1 use traditional JSON schema pattern with oneOf() 3.2 use VRS pattern where the type is explicitly set (e.g. "type": "Age") for each object

Cheers,

Dmitry

costero-e commented 1 year ago

Hi @mrueda and @redmitry. I agree with both of you and as we need to find a solution for CINECA individuals schema respecting both specifications and future phenopackets interoperability I would propose two solutions (if I am wrong or is not feasible what I propose, correct me, please):

  1. Leave CINECA schema like it is and make specifications more flexible. Maybe we could specify that measurementValues can adopt both solutions so it also accepts the "phenopackets-like format" for a quantity measurement.
  2. Attach to @redmitry's solution but adding quantity elsewhere in the schema (and also in the specifications), that the value refers to a quantitative measurement. Maybe like adding a "measurementType" or something like that. Have nice afternoon!