ga4gh / va-spec

An information model for representing variant annotations.
Apache License 2.0

'Disease' Value Set recommendation for VA #65

Open mbrush opened 4 years ago

mbrush commented 4 years ago

For the Variant Annotation (VA) model, we want to provide a value set for diseases to support standard capture of disease references in VA Statement types, where we need to capture the disease for which a variant is interpreted (e.g., in Variant Pathogenicity interpretations and Therapeutic Response predictions).

We are of course looking at the MONDO disease ontology for this purpose, as it is the most comprehensive and includes mappings to most other widely used disease ontologies and terminologies. MONDO is also a recommendation of the GA4GH ClinPheno WG, but we want to confirm this with them and ask about any limitations or considerations for using MONDO as our disease value set.

Looking to @mellybelly and others from CPDC to weigh in here if they have any thoughts or advice. (I posted this issue here because I didn't see much activity on issues in the ClinPheno repo at https://github.com/ga4gh-cp, but I'm happy to post this elsewhere if advised.)

One issue that was raised is that the cancer community currently uses NCIT for its disease terminology. While MONDO does map to NCIT, there are concerns that it may not preserve all the information and nuance in native NCIT, and that there may be pushback from the cancer community about using MONDO here. (@ahwagner please clarify/extend my characterization here as needed.)

ahwagner commented 4 years ago

Thanks @mbrush. I think you stated it well. The only clarification I would make is that the "concern" here was more a set of questions (that I do not know the answers to):

  1. Regarding terminology: does MONDO contain the full NCI-T disease term set, i.e. do MONDO disease terms map many:1 or 1:1 to each NCI-T term? Presumably this is a moving target, as NCI-T is continually refined with input from the community. If ingesting NCI-T is automated, how frequently does MONDO capture it, and what branches are covered?
  2. Regarding semantics: Are the inter-disease relationships found in NCI-T preserved / completely concordant with those in MONDO? If not, what information is available to explain discrepancies?
  3. Regarding VA recommendation (@mbrush): is there a particular reason we want to limit our recommendation to one ontology? I would find it very agreeable to recommend MONDO plus NCI-T for cancer, but maybe there's a key use case / requirement that requires these terms to all fall under one namespace?
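For illustration only: the many:1 versus 1:1 question in point 1 could be answered mechanically once the MONDO-to-NCIt cross-references have been extracted as (MONDO ID, NCIt ID) pairs. The Python sketch below assumes that input shape; `classify_mappings` is a hypothetical helper, and the identifiers are placeholders, not real MONDO or NCIt terms.

```python
from collections import defaultdict


def classify_mappings(pairs):
    """Classify MONDO -> NCIt cross-references by cardinality.

    pairs: iterable of (mondo_id, ncit_id) tuples.
    Returns a dict mapping each (mondo_id, ncit_id) pair to one of
    "1:1", "many:1", "1:many" or "many:many" (read from MONDO's side).
    """
    ncit_for_mondo = defaultdict(set)
    mondo_for_ncit = defaultdict(set)
    for mondo_id, ncit_id in pairs:
        ncit_for_mondo[mondo_id].add(ncit_id)
        mondo_for_ncit[ncit_id].add(mondo_id)

    result = {}
    for mondo_id, ncit_ids in ncit_for_mondo.items():
        for ncit_id in ncit_ids:
            many_mondo = len(mondo_for_ncit[ncit_id]) > 1  # several MONDO terms share this NCIt term
            many_ncit = len(ncit_ids) > 1                   # this MONDO term maps to several NCIt terms
            if many_mondo and many_ncit:
                kind = "many:many"
            elif many_mondo:
                kind = "many:1"
            elif many_ncit:
                kind = "1:many"
            else:
                kind = "1:1"
            result[(mondo_id, ncit_id)] = kind
    return result


# Hypothetical placeholder IDs, not real MONDO/NCIt mappings.
example = [
    ("MONDO:A", "NCIT:X"),
    ("MONDO:B", "NCIT:Y"),
    ("MONDO:C", "NCIT:Y"),
]
print(classify_mappings(example))
```

Cardinality is reported from MONDO's side, so "many:1" means several MONDO terms share one NCIt term.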

Again, these are genuine questions that I do not know the answers to, but the answers would be relevant to a discussion about what we use for our standardized vocabulary.

Would definitely appreciate insight from @mellybelly, @cmungall or other active contributors to MONDO.

mellybelly commented 4 years ago

I think that you should allow more than one terminology here; NCIt is most appropriate for cancer, Mondo is more appropriate for genetic diseases (not that cancer isn't genetic) ;-). As for creating a value set that is the de-duped and equivalency-determined union of possible choices from both, I think that is a great use case for the CCDH work. @cmungall @wdduncan @balhoff
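One way to read "de-duped and equivalency-determined union" is as union-find over asserted equivalences: terms from both value sets are grouped into clusters wherever a mapping says they are equivalent, and one representative is kept per cluster. A minimal Python sketch under those assumptions; `deduplicated_union`, the placeholder IDs, and the rule of preferring a MONDO CURIE as the representative are all hypothetical choices, not part of CCDH or VA.

```python
from collections import defaultdict


def deduplicated_union(terms_a, terms_b, equivalences, prefer="MONDO"):
    """Union two value sets, collapsing terms asserted to be equivalent.

    terms_a, terms_b: iterables of CURIEs.
    equivalences: iterable of (curie, curie) pairs asserted equivalent.
    Returns one representative CURIE per equivalence cluster, preferring
    a CURIE whose prefix equals `prefer` (an assumed policy).
    """
    parent = {}

    def find(term):
        # Walk up to the cluster root, halving the path as we go.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    all_terms = set(terms_a) | set(terms_b)
    for term in all_terms:
        parent[term] = term

    # Merge clusters for every asserted equivalence between known terms.
    for x, y in equivalences:
        if x in parent and y in parent:
            root_x, root_y = find(x), find(y)
            if root_x != root_y:
                parent[root_y] = root_x

    clusters = defaultdict(list)
    for term in all_terms:
        clusters[find(term)].append(term)

    representatives = set()
    for members in clusters.values():
        members.sort()
        preferred = [m for m in members if m.split(":", 1)[0] == prefer]
        representatives.add(preferred[0] if preferred else members[0])
    return representatives


# Hypothetical placeholder IDs and equivalence, not real mappings.
merged = deduplicated_union(
    ["MONDO:A", "MONDO:B"],
    ["NCIT:X", "NCIT:Y"],
    [("MONDO:A", "NCIT:X")],
)
print(sorted(merged))
```

Only pairs asserted as equivalent are merged here; broader/narrower mappings would need separate handling.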

mbaudis commented 4 years ago

@mellybelly yes, +1! Always follow the prefix:class model, but never limit what could be used (just recommend in documentation, not in schema). 2022 won’t be 2019.
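The "prefix:class, recommend in documentation, not in schema" approach can be sketched as a check that rejects only malformed CURIEs and merely warns on prefixes outside the documented recommendation. In this Python sketch the recommended set {MONDO, NCIT} is taken from this discussion, while `check_disease_curie`, the loose CURIE regex, and the `ICDO3` prefix in the usage note are illustrative assumptions.

```python
import re
import warnings

# Prefixes recommended in documentation (per this discussion); not enforced.
RECOMMENDED_DISEASE_PREFIXES = {"MONDO", "NCIT"}

# Loose, assumed CURIE pattern: prefix, colon, non-empty local identifier.
CURIE_RE = re.compile(r"^([A-Za-z][A-Za-z0-9_.-]*):(\S+)$")


def check_disease_curie(curie):
    """Return (prefix, local_id); raise only for malformed CURIEs.

    An unrecommended prefix produces a warning, not an error, so the
    schema stays open to other terminologies.
    """
    match = CURIE_RE.match(curie)
    if match is None:
        raise ValueError(f"not a prefix:class CURIE: {curie!r}")
    prefix, local_id = match.groups()
    if prefix.upper() not in RECOMMENDED_DISEASE_PREFIXES:
        warnings.warn(f"prefix {prefix!r} is allowed but not recommended")
    return prefix, local_id
```

For example, `check_disease_curie("NCIT:C2991")` returns `("NCIT", "C2991")` silently, while a hypothetical `ICDO3:8140/3` is accepted with a warning.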

wdduncan commented 4 years ago

We might also want to consider making use of ICDO v3. There may be cancers in it that aren't in NCIT (and vice versa). I'm not sure of the coverage overlap between the two. I do know that not every ICDO v3 term has been mapped to an NCIT class.

Also, how relevant are lab values and pathology reports here? The LOINC mappings might need to be expanded. And do you already have coverage for pathology findings (e.g., the tumor is poorly differentiated and tested negative for HER2)?

Thanks, Bill

(Sent by email in reply to Michael Baudis's comment of Thu, May 14, 2020 at 4:19 PM.)

mellybelly commented 4 years ago

@balhoff @vasilevsky @mbaudis can you comment on the status of the ICDO-to-NCIt mappings? It was our hope to be able to fully support ICDO-to-NCIt mappings due to NCIt's more robust semantics, among other advantages. ICDO may also be more difficult to use as a value set.

mbaudis commented 4 years ago

@mellybelly @wdduncan We had mapped a subset of ICD-O 3 morphology+site combinations based on encountered diagnoses from Progenetix / arrayMap. The mappings are accessible through the ICDontologies project, including an API etc. However, they are incomplete and sometimes contentious (e.g., there are many more new codes now in NCIt compared to the set we started with ...), so @nicolevasilevsky had worked with us to correct and/or request missing terms from NCI in Feb/Mar. Would love to pick this up again, with external help in content and format of representation, so please push/help! Also, @paulacarrio signaled (limited) availability...

(And ICDO itself isn’t good for referencing: there is no open ontology service, and only morphology+site together make sense, but then it's rather good.)

wdduncan commented 4 years ago

Thanks Michael!

Do you know if there have been efforts to:

Also, it would be good to have mappings from ICDO tumor grading to NCIT, e.g.: https://training.seer.cancer.gov/coding/guidelines/rule_g.html

NCIT has concepts for tumor grading; e.g.: https://ncit.nci.nih.gov/ncitbrowser/pages/concept_details.jsf?dictionary=NCI_Thesaurus&version=20.05a&code=C14167&ns=ncit&type=mapping&key=1379098292&b=1&n=0&vse=null

But I could not (in my short browsing time) find a mapping between the NCIT grading codes and ICDO. The closest thing I could find was some mappings in the UMLS Metathesaurus: https://ncim.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCI%20Metathesaurus&type=synonym&code=C0205617

Bill

On Sat, May 16, 2020 at 11:54 AM Michael Baudis wrote:

@mellybelly @wdduncan We had mapped a subset of ICD-O 3 morphology+site combinations based on encountered diagnoses from Progenetix / arrayMap (http://progenetix.org). The mappings are accessible through the ICDontologies project (https://github.com/progenetix/ICDOntologies), including an API etc. However, they are incomplete and sometimes contentious (e.g., there are many more new codes now in NCIt compared to the set we started with ...), so @vasilevsky had worked with us to correct and/or request missing terms from NCI in Feb/Mar. This was interrupted by a) COVID19 cutting my Berkeley stay short (sad), and b) my involved student becoming a mother (happy).

Would love to pick this up again, with external help in content and format of representation, so please push/help! Also, @paulacarrio signaled (limited) availability...

(And ICDO itself isn’t good for referencing: there is no open ontology service, and only morphology+site together make sense, but then it's rather good.)


mbaudis commented 4 years ago

The clinical/grading concept does not really translate to the categorical concepts in ICD-O & NCIt. Sure, /3 means that it is invasive ... but mostly the information is a) not fine-grained enough or b) part of the general disease classification (an “anaplastic ...” is poorly differentiated). But mostly grading is an addition to classification (like staging), until we’re at the “ontology class of one” state ;-)

mbrush commented 4 years ago

@ahwagner which is the most appropriate NCIt term to root the recommended value set? My best guess is 'Disease or Disorder' (http://purl.obolibrary.org/obo/NCIT_C2991), but I wanted to see if there are straggler terms in other NCIt hierarchies that might be relevant for inclusion.
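Rooting a value set at a term typically means the set is that term plus all of its descendants. A minimal Python sketch of that closure, assuming the hierarchy is available as a parent-to-children dictionary; only `NCIT:C2991` comes from the comment above, and the other identifiers and the `value_set` helper are hypothetical.

```python
from collections import deque

# Hypothetical toy fragment of an is-a hierarchy (parent -> children).
# Only NCIT:C2991 ("Disease or Disorder") is a real code; the rest are placeholders.
CHILDREN = {
    "NCIT:C2991": ["NCIT:EXAMPLE_A", "NCIT:EXAMPLE_B"],
    "NCIT:EXAMPLE_A": ["NCIT:EXAMPLE_A1"],
}


def value_set(roots, children):
    """Return the root terms plus all of their descendants (breadth-first)."""
    members = set(roots)
    queue = deque(members)
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in members:
                members.add(child)
                queue.append(child)
    return members


print(sorted(value_set(["NCIT:C2991"], CHILDREN)))
```

"Straggler" terms outside the Disease or Disorder hierarchy would not be reached from this root; they would have to be passed in as additional roots.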

vasilevsky commented 4 years ago

Hi guys, I am probably the wrong @vasilevsky here :)

(Sent by email in reply to Matthew Brush's comment of Mon, Jun 1, 2020 at 11:57 PM.)

mbaudis commented 4 years ago

@vasilevsky thanks for the note. Oh well, corrected to @nicolevasilevsky; apologies to both of you!

mbaudis commented 4 years ago

@wdduncan Re. ICDO <=> UBERON mappings: this has been discussed on various occasions and would be very nice. Have you found a source?

wdduncan commented 4 years ago

@mbaudis No, I haven't found such a source, but I haven't been looking... Such a source would be great! Any way to fund such an effort in this grant?

nicolevasilevsky commented 4 years ago

Ha! Hi @vasilevsky! Nice to meet another Vasilevsky in the world. :)

Here is the related ticket re: NCIt - ICDO mappings: https://github.com/monarch-initiative/mondo/issues/1148

nicolevasilevsky commented 4 years ago

Here are responses to the questions above:

Regarding terminology: does MONDO contain the full NCI-T disease term set, i.e. do MONDO disease terms map many:1 or 1:1 to each NCI-T term? Presumably this is a moving target, as NCI-T is continually refined with input from the community. If ingesting NCI-T is automated, how frequently does MONDO capture it, and what branches are covered?

Mondo does not contain the full NCIt disease term set; we mainly focus on the neoplasm branch. The mapping is always 1:1 for NCIt. The ingest of NCIt is not automated, and NCIt has not been ingested again since the initial ingest when we created Mondo, but we intend to update the ingest and hope to create a more regular process/schedule.

Regarding semantics: Are the inter-disease relationships found in NCI-T preserved / completely concordant with those in MONDO? If not, what information is available to explain discrepancies?

There are some cases in Mondo where we aren’t concordant with NCIt; there are tickets for these in GitHub, tagged with the NCIt label: https://github.com/monarch-initiative/mondo/issues?q=is%3Aopen+is%3Aissue+label%3Ancit

mbaudis commented 4 years ago

This is especially for @wdduncan: thanks to work by @qingyao, we now have a (limited) set of ICD-O Topo <-> UBERON mappings for everybody to peruse (https://github.com/baudisgroup/icdot2uberon). Comments/feedback/additions are welcome!

wdduncan commented 4 years ago

Thanks @mbaudis !