clingen-data-model / VCI-transformation

Scripts for transforming VCI JSON-LD into DMWG Interpretation JSON-LD

Evaluations which are not-met should result in EvidenceLines with evidenceStrength na #4

Open bpow opened 7 years ago

bpow commented 7 years ago

VCI evaluations are transformed to an EvidenceLine containing a CriterionAssessment. If the evaluation's criteriaStatus is not-met, then the transformed EvidenceLine should have an evidenceStrength of CG-evidence-strength:na.

cbizon commented 7 years ago

OK, sounds right. What is it doing now? I assume it's just using the defaultStrength?

larrybabb commented 7 years ago

I did not create the evidenceStrength attributes for "non-met" evidence lines in the examples in our spreadsheet. My recollection was that we were considering (at one point) whether non-met evidence lines would need an explicit value as @bpow suggests above.

One of those classic "should something be required, even when it is not applicable?" questions.

I think the primary issue is that the VCI-interpretation python script is putting in the default evidence strength codes for "non-met" criterion outcomes.

Can @bpow clarify if he wants both the VCI-interp script to put "na" in for the non-met evidence strengths as well as requiring that "evidenceStrength" exists in our examples?

bpow commented 7 years ago

I think it is an issue with the VCI-transformation, not with the examples in the datamodel repo.

For example, looking at evaluation /evaluations/e3072a80-0495-4000-b332-ea3527cdbacf/ (in vci1.json and dmwg1.json), the VCI model has (some extraneous details removed for brevity):

{
    "@id": "/evaluations/e3072a80-0495-4000-b332-ea3527cdbacf/",
    "evidence_type": "Population",
    "criteriaStatus": "not-met",
    "@type": [
        "evaluation",
        "item"
    ],
    "criteria": "BA1",
    "submitted_by": {},
    "population": {},
    "variant": "/variants/f82df8c3-6369-4ed4-9f17-34b4d67275f0/",
    "uuid": "e3072a80-0495-4000-b332-ea3527cdbacf",
    "status": "in progress",
    "modifier": ""
}

The transformation gives:

{
    "@context": "http://datamodel.clinicalgenome.org/interpretation/json/context",
    "clinicalSignificance": {
        "code": "LA6668-3",
        "display": "Pathogenic",
        "id": "http://loinc.org/LA6668-3",
        "system": "http://loinc.org/",
        "type": "Coding"
    },
    "condition": [],
    "contribution": [],
    "evidence": [
        {
            "evidenceStrength": {
                "coding": [
                    {
                        "code": "ba",
                        "display": "Benign Stand Alone",
                        "id": "http://clinicalgenome.org/datamodel/criterion-evidence-strength/ba",
                        "system": "http://clinicalgenome.org/datamodel/criterion-evidence-strength/",
                        "type": "Coding"
                    }
                ],
                "type": "CodeableConcept"
            },
            "information": [
                {
                    "contribution": [],
                    "criterion": {
                        "defaultStrength": {
                            "coding": [
                                "http://clinicalgenome.org/datamodel/criterion-evidence-strength/ba"
                            ],
                            "type": "CodeableConcept"
                        },
                        "description": "Allele frequency is >5% in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium",
                        "id": "BA1",
                        "shortDescription": "Present in population databases at greater than 5%",
                        "type": "Criterion"
                    },
                    "id": "/evaluations/e3072a80-0495-4000-b332-ea3527cdbacf/",
                    "outcome": {
                        "code": "not-met",
                        "display": "Not Met",
                        "id": "http://clinicalgenome.org/datamodel/criterion-assertion-outcome/not-met",
                        "system": "http://clinicalgenome.org/datamodel/criterion-assertion-outcome/",
                        "type": "Coding"
                    },
                    "type": "CriterionAssessment",
                    "variant": "http://reg.genome.network/allele/CA010360"
                }
            ]
        }
    ]
}

It looks like the 'ba' is being passed up to the EvidenceLine's evidenceStrength even though the criterion was not met. Does that make more sense?
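To make the requested behavior concrete, here is a minimal Python sketch of how the strength could be chosen during transformation. The function names are hypothetical and not taken from the VCI-transformation script, the sketch ignores the VCI "modifier" field, and it assumes that CG-evidence-strength:na expands to the same criterion-evidence-strength URL prefix seen in the output above.

```python
# Hypothetical helpers; none of these names come from the actual script.
# Assumption: the "na" code lives under the same system URL as "ba".
STRENGTH_SYSTEM = "http://clinicalgenome.org/datamodel/criterion-evidence-strength/"


def evidence_strength_code(criteria_status, default_strength_code):
    """Pick the strength code for the EvidenceLine wrapping one evaluation."""
    if criteria_status == "not-met":
        # A criterion that was not met contributes no strength.
        return "na"
    return default_strength_code


def strength_coding(code):
    """Build a Coding dict shaped like the ones in the transformed output."""
    return {
        "code": code,
        "id": STRENGTH_SYSTEM + code,
        "system": STRENGTH_SYSTEM,
        "type": "Coding",
    }
```

With this, the BA1 evaluation above (criteriaStatus "not-met") would get "na" rather than "ba" on its EvidenceLine.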

larrybabb commented 7 years ago

That is definitely a bug/problem. But, to be clear, if we want to generate JSON output like the above from our examples, I will still need to explicitly add the "na" evidence strength records in the proper sheet for the CriterionAssessments that are not met.

Just let me know and I will do it.

bpow commented 7 years ago

Hmm... I may have just happened to look at the only example where this was done (EvLn110).

I think our convention is that if there is no evidenceStrength on the EvidenceLine for a criterion, then we assume its default strength... so the evidenceStrength can be used as an "override" for expert modification of strength in that case. I can think of three possibilities:

  1. We could use the logic that the evidenceStrength, if absent, is the default strength for that criterion if the criterion is met, or "n/a" if the criterion is not met.
  2. We could say that the evidenceStrength defaults to the criterion's default strength regardless of outcome, such that "n/a" is an override just like increasing/decreasing strength would be.
  3. We could require that all EvidenceLines that wrap Criteria have an evidenceStrength defined (even if it is the usual strength for that criterion).

I think these ways are all justifiable, but we should specify. The last option would be more verbose, but might make the clinvar submitter processing easier.

But... deciding among these options is really for the interpretation model repo, and the question here is how to deal with the evidenceStrength being set to ba in this case when the Criterion underneath was not met.
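The three options above differ only in what happens when evidenceStrength is absent, which a short Python sketch can make explicit. This is an illustration under simplifying assumptions: strengths and outcomes are plain strings rather than Coding objects, each EvidenceLine wraps a single CriterionAssessment, and the function name and `policy` parameter are hypothetical.

```python
def resolve_strength(evidence_line, policy):
    """Return the effective strength of an EvidenceLine.

    policy is 1, 2, or 3, matching the numbered options in this comment.
    """
    explicit = evidence_line.get("evidenceStrength")
    if explicit is not None:
        # An explicit value always wins under every option.
        return explicit
    if policy == 3:
        # Option 3: evidenceStrength is mandatory, so absence is an error.
        raise ValueError("evidenceStrength is required on every EvidenceLine")
    assessment = evidence_line["information"][0]
    if policy == 1 and assessment["outcome"] == "not-met":
        # Option 1: an absent strength means "na" when the criterion is not met.
        return "na"
    # Option 1 (criterion met) and option 2: fall back to the criterion default.
    return assessment["criterion"]["defaultStrength"]
```

Under option 2 the not-met BA1 line would resolve to "ba" unless "na" is written explicitly, which is the behavior the current output implies.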

larrybabb commented 7 years ago

I think we should be explicit and pick #3.

We should "require" evidenceStrength on every EvidenceLine to be crystal clear, and it should always reflect the final strength. So for not-met it should be n/a. When the strength is unchanged from the default, it should carry the defaultStrength value.

Since our strength value set includes directionality (Ben v Path) as well as weight (supporting, moderate, strong, very strong, stand alone), we would need to convey that a message should be considered invalid (or at least flagged) if it is produced or consumed with a "defaultStrength" in the opposite direction from the "evidenceStrength", or with an "evidenceStrength" other than "n/a" when the "outcome" is not-met.
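Those two consistency rules could be checked roughly as follows. This is only a sketch: it assumes strength codes are short strings whose first letter encodes direction ("b" for benign, as in "ba" above, and "p" for pathogenic), which is an assumption about the value set and not something confirmed in this thread. The function names are hypothetical.

```python
def direction(code):
    """Return "benign", "pathogenic", or None for a strength code.

    Assumption: the first letter of the code encodes direction.
    """
    if code == "na":
        return None
    return {"b": "benign", "p": "pathogenic"}.get(code[:1])


def check_evidence_line(outcome, default_code, strength_code):
    """Return a list of human-readable problems (empty if consistent)."""
    problems = []
    if outcome == "not-met" and strength_code != "na":
        problems.append("outcome is not-met but evidenceStrength is not na")
    d_default = direction(default_code)
    d_strength = direction(strength_code)
    if d_default and d_strength and d_default != d_strength:
        problems.append("evidenceStrength direction differs from defaultStrength")
    return problems
```

Returning a list of problems rather than raising leaves it to the consumer to decide whether to reject the message or just warn.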

I am currently filtering out all "non-met" evidence lines before loading the PMIDs and explanation set of selected ACMG rule strengths into the ClinVar submission record. If one of these non-met guys had a legit strength, then I assume it should be ignored, but then maybe I should warn?
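A filter that ignores such lines but still warns might look like the sketch below. It is not the actual ClinVar submitter code: the function name is hypothetical, and evidence lines are simplified to flat dicts with string "outcome" and "evidenceStrength" values.

```python
import warnings


def met_evidence_lines(evidence_lines):
    """Drop not-met evidence lines, warning if one carries a real strength."""
    kept = []
    for line in evidence_lines:
        if line["outcome"] == "not-met":
            strength = line.get("evidenceStrength", "na")
            if strength != "na":
                # The line is still dropped; the warning only surfaces the oddity.
                warnings.warn(
                    f"ignoring strength {strength!r} on not-met "
                    f"evidence line {line.get('id')}"
                )
            continue
        kept.append(line)
    return kept
```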

I think we will ultimately need a validator or schema tool to define the base requirements for structure and content around these messages. I just don't want to raise the bar too high for adopters.