EBIvariation / CMAT

ClinVar Mapping and Annotation Toolkit
Apache License 2.0

Investigate use of all trait names vs. preferred trait names in OT evidence generation #384

Open apriltuesday opened 1 year ago

apriltuesday commented 1 year ago

Context: trait mapping (both automated and manual) uses only preferred names, but annotation attempts to use all names. Because we retain previous mappings even when their trait names no longer appear among the preferred names in current ClinVar, obsolete mappings are not just retained but can also be used in annotation without ever being updated.

Example - in ClinVar:

    <TraitSet Type="Disease" ID="6307">
      <Trait ID="4675" Type="Disease">
        <Name>
          <ElementValue Type="Preferred">Malignant tumor of urinary bladder</ElementValue>
          <XRef ID="Bladder+cancer/7822" DB="Genetic Alliance"/>
          <XRef ID="399326009" DB="SNOMED CT"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Urinary bladder cancer</ElementValue>
          <XRef ID="MONDO:0001187" DB="MONDO"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Urinary Bladder Neoplasms</ElementValue>
          <XRef ID="D001749" DB="MeSH"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Bladder cancer</ElementValue>
        </Name>
        <AttributeSet>
          <Attribute Type="keyword">Hereditary cancer syndrome</Attribute>
        </AttributeSet>
        <XRef ID="MONDO:0001187" DB="MONDO"/>
        <XRef ID="C0005684" DB="MedGen"/>
        <XRef Type="MIM" ID="109800" DB="OMIM"/>
      </Trait>
    </TraitSet>

In latest mappings:

    # preferred name yields up-to-date mapping
    $ grep -i '^Malignant tumor of urinary bladder' latest_mappings.tsv
    malignant tumor of urinary bladder  http://purl.obolibrary.org/obo/MONDO_0004986    urinary bladder carcinoma

    # alternate name yields obsolete mapping
    $ grep -i '^Urinary bladder cancer' latest_mappings.tsv
    urinary bladder cancer  http://www.ebi.ac.uk/efo/EFO_0000292    bladder carcinoma
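
To make the discrepancy concrete, here is a minimal Python sketch of the two lookup strategies applied to a trimmed copy of the trait above. It is not CMAT's actual code: `trait_names`, `mapped_terms` and the in-memory `MAPPINGS` dict (standing in for `latest_mappings.tsv`) are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the ClinVar trait shown above (XRefs omitted for brevity).
TRAIT_XML = """
<Trait ID="4675" Type="Disease">
  <Name>
    <ElementValue Type="Preferred">Malignant tumor of urinary bladder</ElementValue>
  </Name>
  <Name>
    <ElementValue Type="Alternate">Urinary bladder cancer</ElementValue>
  </Name>
  <Name>
    <ElementValue Type="Alternate">Bladder cancer</ElementValue>
  </Name>
</Trait>
"""

# Assumption: a toy stand-in for latest_mappings.tsv,
# keyed by lowercased trait name, holding only the two rows quoted above.
MAPPINGS = {
    "malignant tumor of urinary bladder": "http://purl.obolibrary.org/obo/MONDO_0004986",
    "urinary bladder cancer": "http://www.ebi.ac.uk/efo/EFO_0000292",
}


def trait_names(trait, preferred_only):
    """Return lowercased trait names, optionally restricted to Type="Preferred"."""
    names = []
    for element_value in trait.findall("./Name/ElementValue"):
        if preferred_only and element_value.get("Type") != "Preferred":
            continue
        names.append(element_value.text.strip().lower())
    return names


def mapped_terms(trait, preferred_only):
    """Return the set of ontology URIs reachable from the selected names."""
    return {
        MAPPINGS[name]
        for name in trait_names(trait, preferred_only)
        if name in MAPPINGS
    }


trait = ET.fromstring(TRAIT_XML)
preferred = mapped_terms(trait, preferred_only=True)
all_names = mapped_terms(trait, preferred_only=False)

print("preferred only:", sorted(preferred))
print("extra terms from alternate names:", sorted(all_names - preferred))
```

With these two rows, the preferred name resolves only to the MONDO term, while using all names additionally pulls in the EFO term that this issue describes as obsolete.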

In #383 we modified annotated XML generation to use only preferred names, observing that this reduced trait coverage only slightly while significantly reducing the number of obsolete EFO terms used.

The goal of this issue is to assess the impact of making a similar change to OT evidence string generation (which is more complicated because of how it groups and explodes traits) and, if the impact is acceptable, to make the change.
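
One way the comparison could be framed is sketched below in Python, under loud assumptions: the `Trait` dataclass, the `summarize` helper, the second trait and its `example.org` term are all hypothetical, and the sketch deliberately ignores the grouping and exploding of traits that the real evidence string pipeline performs. It only shows the two quantities of interest (traits covered, obsolete terms used) computed under both naming strategies.

```python
from dataclasses import dataclass, field


@dataclass
class Trait:
    preferred: str
    alternates: list = field(default_factory=list)


def summarize(traits, mappings, obsolete, preferred_only):
    """Count covered traits and distinct obsolete terms used for one strategy."""
    covered = 0
    used_obsolete = set()
    for trait in traits:
        names = [trait.preferred] if preferred_only else [trait.preferred, *trait.alternates]
        terms = {mappings[n.lower()] for n in names if n.lower() in mappings}
        covered += bool(terms)  # a trait is covered if any name maps to a term
        used_obsolete |= terms & obsolete
    return {"covered": covered, "total": len(traits), "obsolete_terms": len(used_obsolete)}


# Toy inputs: the first trait comes from the example in this issue;
# the second trait and its example.org term are invented for illustration.
traits = [
    Trait("Malignant tumor of urinary bladder", ["Urinary bladder cancer"]),
    Trait("Hypothetical trait B", ["Hypothetical alias B"]),
]
mappings = {
    "malignant tumor of urinary bladder": "http://purl.obolibrary.org/obo/MONDO_0004986",
    "urinary bladder cancer": "http://www.ebi.ac.uk/efo/EFO_0000292",
    "hypothetical alias b": "http://example.org/TERM_B",
}
obsolete = {"http://www.ebi.ac.uk/efo/EFO_0000292"}  # treated as obsolete, per this issue

all_stats = summarize(traits, mappings, obsolete, preferred_only=False)
pref_stats = summarize(traits, mappings, obsolete, preferred_only=True)
print("all names:     ", all_stats)
print("preferred only:", pref_stats)
```

On this toy input, switching to preferred names drops one covered trait (the one mapped only through an alternate name) and removes the one obsolete term, which is the trade-off this issue asks to quantify on real data.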

apriltuesday commented 1 year ago

See also #210