ga4gh / gks-clinvar

A submission tool for submitting to the ClinVar Submission API using GA4GH GKS data
Apache License 2.0

VarCat Data Dump and Submission through the ClinVar submission API #5

Open wesleygoar opened 2 months ago

wesleygoar commented 2 months ago

@korikuzma we need a dump of the VarCat curation data that I have been working on so that we can test the ClinVar Submission tool. This will let me check the data on the test server and see what feedback, if any, we need to provide to ClinVar.

korikuzma commented 2 months ago

@wesleygoar I will focus only on the assertions that are "Awaiting Review" (n=21).

korikuzma commented 2 months ago

@wesleygoar How did you want the onco evidence code data to be structured in the comment? See this comment on how the VCI team structures this.

korikuzma commented 1 month ago

> @wesleygoar How did you want the onco evidence code data to be structured in the comment? See this comment on how the VCI team structures this.

comment = f"{significance_statement} The total score for this classification is {score} and the evidence supporting this classification are as follows. {statement} (ClinGen/CGC/VICC 2022: {evidence_code})"

This is the format we will use.
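A minimal sketch of how that template could be wrapped in a helper. The function name `build_comment` and the example values (score, statement text, and the `OVS1` code) are illustrative assumptions, not taken from the VarCat data or the tool's actual code; only the template string itself comes from this thread.

```python
def build_comment(
    significance_statement: str,
    score: int,
    statement: str,
    evidence_code: str,
) -> str:
    """Fill in the comment template proposed in this thread (hypothetical helper)."""
    return (
        f"{significance_statement} The total score for this classification is {score} "
        "and the evidence supporting this classification are as follows. "
        f"{statement} (ClinGen/CGC/VICC 2022: {evidence_code})"
    )


# Illustrative values only; not real VarCat output.
example_comment = build_comment(
    significance_statement="This variant is classified oncogenic in Schwannoma.",
    score=11,
    statement="This is a null variant in a TSG.",
    evidence_code="OVS1",
)
print(example_comment)
```

How several evidence codes on one assessment should be joined is not settled here, so the sketch handles a single statement and code.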

korikuzma commented 1 month ago

@wesleygoar Did you want to include localID and localKey as described here? If so, did you want to structure them as localID = varcat.variant:{id} and localKey = varcat.overall_assessment:{id}, or something else?

wesleygoar commented 1 month ago

> @wesleygoar Did you want to include localID and localKey as described here. If so, did you want to structure it as localID = varcat.variant:{id} and localKey = varcat.overall_assessment:{id} or something else?

@korikuzma this sounds good to me.
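For reference, a small sketch of the agreed identifier format. The helper name `build_local_ids` and the example ids are assumptions for illustration; where these two fields sit inside the ClinVar submission payload is not covered in this thread, so the sketch only builds the two strings.

```python
def build_local_ids(variant_id: int, overall_assessment_id: int) -> dict[str, str]:
    """Build localID and localKey in the format agreed above (hypothetical helper)."""
    return {
        "localID": f"varcat.variant:{variant_id}",
        "localKey": f"varcat.overall_assessment:{overall_assessment_id}",
    }


# Example ids are made up for illustration.
ids = build_local_ids(variant_id=12, overall_assessment_id=34)
print(ids)
```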

korikuzma commented 1 month ago

> @wesleygoar How did you want the onco evidence code data to be structured in the comment? See this comment on how the VCI team structures this.
>
> comment = f"{significance_statement} The total score for this classification is {score} and the evidence supporting this classification are as follows. {statement} (ClinGen/CGC/VICC 2022: {evidence_code})"
>
> Is what we will use

I'm realizing this doesn't make sense for places where a user manually overrides the classification. For example, in 22-29642211-C-T/Schwannoma we'd have the following statement: "This variant is classified oncogenic in Schwannoma. The total score for this classification is 3".

Without being able to see the assessment, I was confused about why it was classified as oncogenic with a score of only 3. It wasn't until I saw the following information that I understood you had manually changed the classification:

  "description": "This is a null variant in a TSG. The score should be 11 which is an oncogenic classification. ",
  "rationale_description": "OncoKB cancer genes modal is broken but this should have a classification of oncogenic. ",

@wesleygoar how should we capture this information in the comment? There is no guarantee that a user will provide text like yours, which makes it clear that the classification was manually changed.

wesleygoar commented 1 month ago

@korikuzma yeah, I really don't like this either. We should update VarCat to give the users the ability to manually add and override the computed score.

wesleygoar commented 1 month ago

@korikuzma although, thinking about this more, once the manual onco evidence PR is in, there really shouldn't be a reason for anyone to manually override the overall oncogenicity. For now, we should avoid submitting the records where I had to manually override the overall classification.
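One possible way to hold those records back, sketched under stated assumptions: it treats an assessment as manually overridden when the recorded classification disagrees with the classification implied by its score. The point ranges below are my reading of the ClinGen/CGC/VICC 2022 oncogenicity guidelines and should be checked against the SOP; the field names `score` and `classification` and the function names are hypothetical, not VarCat's actual schema.

```python
def classify_from_score(score: int) -> str:
    """Map a total score to a classification (assumed point ranges; verify against the SOP)."""
    if score >= 10:
        return "Oncogenic"
    if score >= 6:
        return "Likely Oncogenic"
    if score >= 0:
        return "Uncertain Significance"
    if score >= -6:
        return "Likely Benign"
    return "Benign"


def is_manual_override(assessment: dict) -> bool:
    """True when the recorded classification does not match the computed one."""
    computed = classify_from_score(assessment["score"])
    return computed.lower() != assessment["classification"].lower()


def submittable(assessments: list[dict]) -> list[dict]:
    """Keep only assessments whose classification matches their score."""
    return [a for a in assessments if not is_manual_override(a)]


# Illustrative records; the first mirrors the score-3 "oncogenic" case above.
examples = [
    {"id": 1, "score": 3, "classification": "Oncogenic"},
    {"id": 2, "score": 11, "classification": "Oncogenic"},
]
kept_ids = [a["id"] for a in submittable(examples)]
print(kept_ids)
```

This check cannot catch an override that happens to agree with the computed score, and it would also flag any data-entry mismatch, so it is a conservative filter and not a record of user intent.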