EBIvariation / CMAT

ClinVar Mapping and Annotation Toolkit
Apache License 2.0

Manual curation for 2024.03 release #410

Closed apriltuesday closed 8 months ago

apriltuesday commented 8 months ago

Refer to documentation for full description of steps.

Checklist:

apriltuesday commented 8 months ago

Hello @tcezard @M-casado, here is the curation spreadsheet for this round. Some notes on this one:

tcezard commented 8 months ago

Ready for review: 112 DONE, 227 IMPORT, 1 NEW, 4 UNSURE.

Couple of extra points:

  1. A few lines have been duplicated, which means the total count is actually lower than this.
  2. I focused on the OBSOLETE terms, so there aren't that many new curations.
  3. The UNSURE rows are terms where the ClinVar label points to a group of several specific terms, but not quite to the more generic term. For example, in "stickler syndrome, dominant" the "dominant" aspect encompasses 3 possible Stickler syndrome types (1,2 IIa6) but not the other types (4, 5). We could use the "stickler syndrome" term, which would be correct but is not quite specific enough.
  4. I quickly settled into a pattern in my curation and wanted to post the "algorithm":
    • Search for the term in EFO looking for a perfect or close match
    • Search for the term in MONDO looking for a perfect or close match
    • Search for the term in HP looking for a perfect or close match
    • Search for the term in MEDGEN looking for a synonym or a definition that I could use to repeat the EFO/MONDO/HP searches above

I think we could have these searches precomputed before the spreadsheet is made. That would make the manual curation much faster.
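As a rough illustration of what such a precomputation might look like, here is a minimal sketch of the search cascade. It is not how CMAT works: the function names, the in-memory label indexes, the placeholder term IDs and the 0.9 similarity cutoff are all assumptions made for the example, and a real version would need to query the ontologies themselves rather than small dictionaries.

```python
from difflib import get_close_matches

# Order in which ontologies are tried, following the steps listed above.
SEARCH_ORDER = ("EFO", "MONDO", "HP")


def find_match(label, index, cutoff=0.9):
    """Return (term_id, matched_label) for an exact or close match, else None.

    `index` is a hypothetical dict mapping lowercase ontology labels to term IDs.
    """
    key = label.strip().lower()
    if key in index:
        return index[key], key
    # Fall back to a fuzzy comparison for near-identical labels.
    close = get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        return index[close[0]], close[0]
    return None


def suggest_mapping(clinvar_label, ontologies, medgen_synonyms):
    """Try the label in EFO, MONDO, HP, then retry with each MedGen synonym."""
    queries = [(clinvar_label, "label")]
    queries += [(synonym, "medgen synonym")
                for synonym in medgen_synonyms.get(clinvar_label.strip().lower(), [])]
    for query, source in queries:
        for name in SEARCH_ORDER:
            hit = find_match(query, ontologies.get(name, {}))
            if hit:
                term_id, matched = hit
                return {"ontology": name, "term_id": term_id,
                        "matched_label": matched, "via": source}
    return None  # nothing found: the row still needs manual curation


# Illustrative data only; the term IDs are placeholders, not real accessions.
ontologies = {
    "EFO": {"stickler syndrome": "EFO_PLACEHOLDER_1"},
    "MONDO": {"watson syndrome": "MONDO_PLACEHOLDER_2"},
    "HP": {},
}
medgen_synonyms = {
    "café-au-lait macules with pulmonary stenosis": ["Watson syndrome"],
}

print(suggest_mapping("Café-au-lait macules with pulmonary stenosis",
                      ontologies, medgen_synonyms))
```

The suggestion could be written into an extra spreadsheet column so the curator only confirms or corrects it. Returning the first hit keeps the sketch short; listing every candidate per ontology might be more useful in practice.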

tcezard commented 8 months ago

@apriltuesday pointed me to the documentation on multiple mappings, so I changed the UNSURE curations by duplicating the rows and adding terms that cover different portions of the ClinVar description. In some cases that creates a mixture of IMPORT and DONE terms, which might be confusing.

apriltuesday commented 8 months ago

café-au-lait macules with pulmonary stenosis is confusing, but based on that Medgen page I changed it to Watson syndrome... let me know if you disagree.

For the others I'm not really sure whether we should recreate in EFO the intermediate grouping that Mondo obsoleted (so make these NEW terms), or map to the multiple terms as you've done... Mondo seems to have a lot of exclusion rules that I'm not sure apply to us (or EFO, or Open Targets...). But there might be something problematic about the intermediate grouping that I'm not seeing.... Maybe we should wait for @M-casado's input.

M-casado commented 8 months ago

UNSURE ClinVar labels

We could use the "stickler syndrome" term which would be correct but is not quite specific enough

I think that, based on our rubric, we either: (1) import/create a new parent term that distinguishes the inheritance mode; (2) import/create all subtypes to map them exhaustively; or (3) resort to mapping a subtype to a parent type. My experience tells me we ought to do the last one. This is my fear regarding mappings from parent types to subtypes: unless the duplicated mappings are exhaustive, we would end up with an incorrect and incomplete association.

I also changed severe myoclonic epilepsy in infancy to UNSURE, given that the associated term to be IMPORTED (MONDO_0014960) was unrelated as far as I could tell. I added a comment regarding which EFO term we could map it to, although it's a parent type and has a note regarding possible obsoletion.

tcezard commented 8 months ago

All the UNSURE curations are now resolved. Thank you @M-casado @apriltuesday. I've made a copy of the spreadsheet that we can use during the upcoming KT session without risking modifying the one we use for submission.

apriltuesday commented 8 months ago

Thanks all, export done and EFO issue created.