ga4gh / gks-clinvar

A submission tool for submitting to the ClinVar Submission API using GA4GH GKS data
Apache License 2.0

Evaluate ClinVar Test API #6

Open korikuzma opened 2 months ago

korikuzma commented 2 months ago

GitHub repo: https://github.com/ncbi/clinvar/tree/master/submission_api_schema
Test endpoint: https://submit.ncbi.nlm.nih.gov/apitest/v1/submissions
Submission API documentation: https://www.ncbi.nlm.nih.gov/clinvar/docs/api_http/

@ahwagner said:

  • Focus should be on what submission / response looks like; keeping in mind how we will track submissions and reference them for status / revision requests.
  • We also want to check that the data appears in the test instance as expected following submission.
    • @wesleygoar to review first and once approved, we'll send along to @ahwagner for secondary review
  • If time allows, we should try sending malformed data to see what failed validation looks like.
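For the first point, here is a minimal sketch of what a request to the test endpoint might look like, built with Python's standard library but not sent. The `SP-API-KEY` header, the `actions` / `AddData` envelope, and the `dry-run` query parameter are recalled from the ClinVar submission API documentation linked above and should be verified against it. `build_submission_request` and the empty placeholder content are hypothetical.

```python
import json
import urllib.parse
import urllib.request

TEST_ENDPOINT = "https://submit.ncbi.nlm.nih.gov/apitest/v1/submissions"


def build_submission_request(content, api_key, dry_run=True):
    """Build (but do not send) a POST request for the ClinVar test endpoint."""
    url = TEST_ENDPOINT
    if dry_run:
        # Assumption: dry runs are requested with a `dry-run=true` query parameter.
        url += "?" + urllib.parse.urlencode({"dry-run": "true"})
    # Assumption: the submission content is wrapped in an `actions` envelope.
    body = {
        "actions": [
            {"type": "AddData", "targetDb": "clinvar", "data": {"content": content}}
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API key is passed in an `SP-API-KEY` header.
        headers={"Content-Type": "application/json", "SP-API-KEY": api_key},
        method="POST",
    )


# Placeholder content; a real submission must satisfy the apitest schema.
request = build_submission_request({}, api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to save the exact payload next to whatever submission identifier comes back, which should help with later status and revision requests.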

Additional notes:

korikuzma commented 2 months ago

"May be the name of a specific drug, e.g. ruxolitinib, or a drug class, e.g. JAK inhibitors. Multiple terms are allowed only to represent combination therapies, e.g. cisplatin and vinorelbine for non-small cell lung cancer. Separate multiple terms with a semi-colon."

https://github.com/ncbi/clinvar/blob/master/submission_api_schema/submission_apitest_schema.json#L530-L533

For drugForTherapeuticAssertion in clinicalImpactClassification: it looks like you can only represent a single therapy or combination therapy. In GKS we have a TherapeuticSubstituteGroup, which we use in CIViC.

korikuzma commented 2 months ago

On a related note, you are only required to provide the name of the drug. However, in other places such as providing disease/gene it allows you to provide database identifiers (array of items) OR name as a string. I'm curious why they didn't follow a similar approach here.

wesleygoar commented 2 months ago

> "May be the name of a specific drug, e.g. ruxolitinib, or a drug class, e.g. JAK inhibitors. Multiple terms are allowed only to represent combination therapies, e.g. cisplatin and vinorelbine for non-small cell lung cancer. Separate multiple terms with a semi-colon."
>
> https://github.com/ncbi/clinvar/blob/master/submission_api_schema/submission_apitest_schema.json#L530-L533
>
> For drugForTherapeuticAssertion in clinicalImpactClassification: it looks like you can only represent a single therapy or combination therapy. In GKS we have a TherapeuticSubstituteGroup, which we use in CIViC.
>
> On a related note, you are only required to provide the name of the drug. However, in other places such as providing disease/gene it allows you to provide database identifiers (array of items) OR name as a string. I'm curious why they didn't follow a similar approach here.

That's odd that they didn't apply a consistent approach everywhere.

korikuzma commented 2 months ago

It also appears that the spreadsheets allow for more info to be submitted. For instance, in the spreadsheets there's a Variant - more section with information such as Variation identifiers, Alternate designations, and URL but I cannot seem to find this information in the apitest jsonschema.

jsstevenson commented 2 months ago

> I cannot seem to find this information in the apitest jsonschema.

Is this because "Apparently, the NCBI ClinVar server has no tight coupling to the JSON schemas and these schemas are mostly for informative purposes."?

korikuzma commented 2 months ago

> > I cannot seem to find this information in the apitest jsonschema.
>
> Is this because "Apparently, the NCBI ClinVar server has no tight coupling to the JSON schemas and these schemas are mostly for informative purposes."?

Ugh

korikuzma commented 2 months ago

I was just hoping I was blind

ahwagner commented 2 months ago

> "May be the name of a specific drug, e.g. ruxolitinib, or a drug class, e.g. JAK inhibitors. Multiple terms are allowed only to represent combination therapies, e.g. cisplatin and vinorelbine for non-small cell lung cancer. Separate multiple terms with a semi-colon."
>
> https://github.com/ncbi/clinvar/blob/master/submission_api_schema/submission_apitest_schema.json#L530-L533
>
> For drugForTherapeuticAssertion in clinicalImpactClassification: it looks like you can only represent a single therapy or combination therapy. In GKS we have a TherapeuticSubstituteGroup, which we use in CIViC.
>
> On a related note, you are only required to provide the name of the drug. However, in other places such as providing disease/gene it allows you to provide database identifiers (array of items) OR name as a string. I'm curious why they didn't follow a similar approach here.

We should compile a list of questions like this and send along to our ClinVar contacts for clarification.

korikuzma commented 2 months ago

@ahwagner sounds good. Wanted to run this by y'all before doing so at least

korikuzma commented 2 months ago

I don't think there's a way to submit our evidence / evidence line data (using ClinGen/CGC/VICC SOP onco codes) at the moment. I'm only seeing a citation field and it's pretty limited.

@ahwagner had said "The VCI folks have ended up encoding pathogenicity codes as free-text string comments"

korikuzma commented 2 months ago

By the way, I have not been testing the API directly yet. I was manually creating submission requests for a CIViC EID and VarCat assertion for test fixtures. I sent a request for a service account this morning. Once approved, we'll get our API key and we will be able to test the submission API.

ahwagner commented 2 months ago

> I don't think there's a way to submit our evidence / evidence line data (using ClinGen/CGC/VICC SOP onco codes) at the moment. I'm only seeing a citation field and it's pretty limited.
>
> @ahwagner had said "The VCI folks have ended up encoding pathogenicity codes as free-text string comments"

To be clear, the VCI folks flatten their data structure to accommodate the ClinVar model in this way; they natively encode a rich ev/prov structure similar to VarCat. Here's an example record from ClinGen/VCI in ClinVar: [image]
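A rough sketch of that kind of flattening, assuming evidence lines are available as simple code / description pairs and that the target is a single free-text comment string. The input shape, the output format, and `flatten_evidence` are all hypothetical, and the example codes are illustrative rather than taken from a real record.

```python
def flatten_evidence(evidence_lines):
    """Render structured evidence lines as a single free-text comment."""
    parts = []
    for line in evidence_lines:
        text = line["code"]
        # Append the optional description in parentheses when present.
        if line.get("description"):
            text += f" ({line['description']})"
        parts.append(text)
    if not parts:
        return ""
    return "Evidence codes met: " + "; ".join(parts)


# Illustrative input only; not a real ClinGen/VCI or VarCat record.
evidence = [
    {"code": "OS1"},
    {"code": "OP4", "description": "example note"},
]
print(flatten_evidence(evidence))
```

The trade-off is that the structure is lost on the ClinVar side: anyone reading the record back would have to parse the comment string to recover individual codes.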

korikuzma commented 2 months ago

> "May be the name of a specific drug, e.g. ruxolitinib, or a drug class, e.g. JAK inhibitors. Multiple terms are allowed only to represent combination therapies, e.g. cisplatin and vinorelbine for non-small cell lung cancer. Separate multiple terms with a semi-colon."
>
> https://github.com/ncbi/clinvar/blob/master/submission_api_schema/submission_apitest_schema.json#L530-L533
>
> For drugForTherapeuticAssertion in clinicalImpactClassification: it looks like you can only represent a single therapy or combination therapy. In GKS we have a TherapeuticSubstituteGroup, which we use in CIViC.
>
> On a related note, you are only required to provide the name of the drug. However, in other places such as providing disease/gene it allows you to provide database identifiers (array of items) OR name as a string. I'm curious why they didn't follow a similar approach here.

How the CIViC team handles this: "We just turn substitutes into separate AIDs"
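A minimal sketch of that approach, assuming a simplified therapy representation: substitutes fan out into one `drugForTherapeuticAssertion` value per record, while a combination is joined with semicolons as the schema description asks. The dict shapes (`type`, `name`, `components`, `substitutes`) and the function name are illustrative assumptions, not the actual GKS or CIViC models, and whether ClinVar expects whitespace after the semicolon is unknown.

```python
def drug_strings(therapy):
    """Return one drugForTherapeuticAssertion string per ClinVar record needed."""
    kind = therapy["type"]
    if kind == "Therapy":
        return [therapy["name"]]
    if kind == "CombinationTherapy":
        # One record: all components joined with semicolons.
        return [";".join(c["name"] for c in therapy["components"])]
    if kind == "TherapeuticSubstituteGroup":
        # One record per substitute, since there is no "either/or" grouping.
        return [s["name"] for s in therapy["substitutes"]]
    raise ValueError(f"Unsupported therapy type: {kind}")


combo = {
    "type": "CombinationTherapy",
    "components": [{"name": "cisplatin"}, {"name": "vinorelbine"}],
}
# Placeholder drug names; not a real substitute group.
subs = {
    "type": "TherapeuticSubstituteGroup",
    "substitutes": [{"name": "drug A"}, {"name": "drug B"}],
}
print(drug_strings(combo))
print(drug_strings(subs))
```

Each string returned for a substitute group would then go into its own submission record, mirroring the "separate AIDs" approach.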

korikuzma commented 1 month ago

> If time allows, we should try sending malformed data to see what failed validation looks like.

As I have been performing dry runs on the test API, I've been testing validation both by accident and on purpose. Here are some examples of the output (I haven't been saving the malformed input data):

 'errors': [{'message': "'recordStatus' is a required property",
   'code': None,
   'identifier': None},
  {'message': "Unevaluated properties are not allowed ('description', 'direction', 'id', 'isReportedIn', 'predicate', 'qualifiers', 'specifiedBy', 'strength', 'therapeutic', 'tumorType', 'type', 'variant' were unexpected)",
   'code': None,
   'identifier': None}]}
{'message': 'Validation failed, see errors for detailed description',
 'errors': [{'message': "5233 is not of type 'string'",
   'code': None,
   'identifier': None}]}
{'message': 'Validation failed, see errors for detailed description',
 'errors': [{'message': "{'db': 'PubMed', 'id': '25265492'} is not of type 'array'",
   'code': None,
   'identifier': None},
  {'message': "{'gene': [{'id': 2778}]} is not valid under any of the given schemas",
   'code': None,
   'identifier': None},
  {'message': "'Tier I - strong' is not one of ['Tier I - Strong', 'Tier II - Potential', 'Tier III - Unknown', 'Tier IV - Benign/Likely benign']",
   'code': None,
   'identifier': None}]}
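
Since `code` and `identifier` are `None` in every error shown above, the `message` strings are the only usable signal. A small sketch for pulling them out, assuming the response has already been parsed into a Python dict shaped like these examples; `summarize_errors` is a hypothetical helper.

```python
def summarize_errors(response):
    """Return the individual validation messages from a failed response."""
    return [error.get("message", "") for error in response.get("errors", [])]


# Copied from one of the example responses above.
response = {
    "message": "Validation failed, see errors for detailed description",
    "errors": [
        {"message": "5233 is not of type 'string'", "code": None, "identifier": None}
    ],
}
for line in summarize_errors(response):
    print("-", line)
```

Because the messages do not say which field failed, saving the malformed input alongside each response would make failures easier to trace.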

korikuzma commented 1 month ago

> We also want to check that the data appears in the test instance as expected following submission. @wesleygoar to review first and once approved, we'll send along to @ahwagner for secondary review

@ahwagner since the test instance doesn't appear to let you see the data, I just added you as a reviewer for #8 and #9 (@wesleygoar approved both) to review the data we would submit. Once approved, I can submit to the test instance

korikuzma commented 1 month ago

#8, #9, #11 will close this