SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

Enrich Disease Xrefs with Mapping Properties #75

Open stuppie opened 6 years ago

stuppie commented 6 years ago

Disease cross-references from DO are encoded as hasDbXref, which is typically taken to mean that the two entities have a 1-to-1 relationship (an exact match). This is often not true.

An example:

3-methylglutaconic aciduria type 3 has an xref to ICD10CM:E71.1 which is "Other disorders of branched-chain amino-acid metabolism". This is a catch-all category for certain metabolic disorders that aren't explicitly defined by other codes in ICD10. This is not an exact match; it is a narrower -> broader term relationship.

One source of disease data that captures this distinction is Orphanet, which lists ICD-10:E71.1 for 3-methylglutaconic aciduria type 3 with the qualifier NTBT (narrower term maps to a broader term).

I'm not sure how we should interpret the legal disclaimer, or whether we can use these xref qualifiers in Wikidata (@andrewsu).

Note: This is not the best example because both DO and Orphanet have the ICD10 code wrong. There is a more specific ICD term, E71.111, which is "3-methylglutaconic aciduria". That mapping should still be qualified with NTBT/skos:broadMatch.

stuppie commented 6 years ago

From: http://www.orphadata.org/cgi-bin/docs/userguide.pdf

Characterization of the alignments between disorders and external terminologies or resources: OMIM, ICD10, MeSH, UMLS, MedDRA and GARD (NIH-NCATS Genetic and Rare Diseases Center): these alignments are characterized in order to indicate if the terms are perfectly equivalent (exact mapping) or not. Possible values:

  • E (exact mapping - the terms and the concepts are equivalent)
  • NTBT (narrower term maps to a broader term)
  • BTNT (broader term maps to a narrower term)
  • NTBT/E (narrower term maps to a broader term because of an exact mapping with a synonym in the target terminology)
  • BTNT/E (broader term maps to a narrower term because of an exact mapping with a synonym in the target terminology)
  • ND (not yet decided/unable to decide)
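As a rough illustration of how these Orphanet codes could line up with SKOS mapping relations, here is a minimal Python sketch. Only the NTBT to skos:broadMatch pairing is stated in this thread; the other pairings (E to exactMatch, BTNT to narrowMatch, and folding the "/E" variants into the same direction) are assumptions, and the names `ORPHANET_TO_SKOS` and `orphanet_to_skos` are hypothetical.

```python
from typing import Optional

# Orphanet alignment codes from the user guide excerpt above.
# Only NTBT -> skos:broadMatch comes from this thread; every other
# pairing here is an assumption made for illustration.
ORPHANET_TO_SKOS = {
    "E": "skos:exactMatch",
    "NTBT": "skos:broadMatch",
    "BTNT": "skos:narrowMatch",
    "NTBT/E": "skos:broadMatch",   # assumption: keep the direction, drop the "/E" nuance
    "BTNT/E": "skos:narrowMatch",  # assumption: same as above
    "ND": None,                    # not yet decided: no relation can be asserted
}


def orphanet_to_skos(code: str) -> Optional[str]:
    """Return the assumed SKOS relation for an Orphanet alignment code.

    Returns None for ND and raises ValueError for unknown codes.
    """
    key = code.strip().upper()
    if key not in ORPHANET_TO_SKOS:
        raise ValueError(f"unknown Orphanet alignment code: {code!r}")
    return ORPHANET_TO_SKOS[key]


# The example from this issue: type 3 (narrower) maps to ICD-10 E71.1 (broader).
relation = orphanet_to_skos("NTBT")
```

Whether NTBT/E and BTNT/E deserve a distinct treatment (for example skos:closeMatch) is a design question this sketch does not settle.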

Furthermore, for ICD10 a precision is added to indicate if a specific code exists in ICD10 for a disorder, or if it is listed in the tabular list or in the index in ICD10 without having a specific code, or if Orphanet has attributed the code together with the validation status for the attribution. Possible values:

  • Specific code (the term has its own code in the ICD10)
  • Inclusion term (the term is included under an ICD10 category and does not have its own code)
  • Index term (the term is included in ICD10 index pointing to a code that is not specific for the term)
  • Attributed (the term does not appear at all in ICD10 and a code was attributed according to these rules)

See also: http://www.orpha.net/orphacom/cahiers/docs/GB/Orphanet_ICD10_coding_rules.pdf

stuppie commented 6 years ago

Wikidata now has a mechanism to qualify external ID mappings with a "mapping relation type". The agreed-upon enumerated values for this property are discussed here and listed below.

SKOS:

  • broadMatch
  • narrowMatch
  • relatedMatch
  • closeMatch
  • exactMatch
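To make the qualifier pattern concrete, the following Python sketch models a cross-reference that carries one of the five SKOS relations listed above and rejects anything else. The `QualifiedXref` class and its field names are hypothetical, and the sketch does not model how Wikidata actually stores the qualifier; it only shows the shape of the data.

```python
from dataclasses import dataclass

# The enumerated SKOS mapping relations listed in this comment.
SKOS_RELATIONS = frozenset(
    {"broadMatch", "narrowMatch", "relatedMatch", "closeMatch", "exactMatch"}
)


@dataclass(frozen=True)
class QualifiedXref:
    """A disease cross-reference plus its mapping relation (hypothetical model)."""

    source_label: str   # disease the xref is attached to
    target_prefix: str  # external vocabulary, e.g. "ICD10CM"
    target_id: str      # identifier within that vocabulary
    relation: str       # one of SKOS_RELATIONS

    def __post_init__(self) -> None:
        # Refuse relations outside the agreed enumeration.
        if self.relation not in SKOS_RELATIONS:
            raise ValueError(f"not a SKOS mapping relation: {self.relation!r}")

    def as_triple(self) -> tuple:
        """Return (subject, predicate, object) as plain strings."""
        return (
            self.source_label,
            f"skos:{self.relation}",
            f"{self.target_prefix}:{self.target_id}",
        )


# The example from this issue, qualified as a broad match.
xref = QualifiedXref(
    "3-methylglutaconic aciduria type 3", "ICD10CM", "E71.1", "broadMatch"
)
```

Calling `xref.as_triple()` yields the disease label, `skos:broadMatch`, and `ICD10CM:E71.1`, which is the information an unqualified hasDbXref leaves out.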


andrewsu commented 6 years ago

On the subject of licenses: given the license terms you linked, we should not use Orphanet in any mass import of xrefs. I think this ticket is about establishing the patterns for indicating exact/broad/narrow match. From there, we can notify the editors who like to edit disease items and leave it up to them to (re)derive those mappings.