apriltuesday closed this issue 6 months ago.
@Dona094 @tcezard The curation spreadsheet is now ready here.
There are a ton of terms like ttn-related condition; these seem to be from a single large submission in March (example). I'm not sure if there's anything we should do besides curate these as usual, but let me know if you have a suggestion.
I've performed the curation with limited success:
17 DONE
125 IMPORT
5 NEW
4 SKIP
2 UNSURE
I have some reservations about curating the gene-related conditions:
I've looked in detail at the first of these conditions (ttn-related condition) to get an idea of how to annotate these types of traits. "ttn-related condition" means that we're looking at any condition related to the TTN gene. Looking at the TTN gene on MedlinePlus, it seems that the associated conditions are all myopathies or dystrophies. TTN-related myopathy could be a good term to associate it with, but it only refers to myopathy, although some of its children are dystrophies as well. This term also does not include all the conditions related to TTN among its children. For example, MedlinePlus mentions "Hereditary myopathy with early respiratory failure", which looks more like congenital myopathy 21 with early respiratory failure, a sibling of TTN-related myopathy.
The next one, rai1-related condition, is even more complicated: MedlinePlus (https://medlineplus.gov/genetics/gene/rai1/) indicates three conditions related to this gene.
We could create a new term that includes the three conditions, but again we can't be sure we have captured all the potential conditions associated with that gene.
The process could also be somewhat automated:
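One possible shape for such automation, sketched under assumptions: a hypothetical gene-to-conditions table stands in for a lookup against a source such as MedlinePlus Genetics, and a heuristic regular expression (not an agreed rule) recognises labels like "ttn-related condition". The names `GENE_CONDITIONS` and `candidate_conditions` are illustrative only.

```python
import re
from typing import Dict, List, Optional

# Hypothetical gene -> conditions table. In practice it might be filled from a
# source such as MedlinePlus Genetics; the single TTN entry is the example quoted
# in this thread and is deliberately incomplete.
GENE_CONDITIONS: Dict[str, List[str]] = {
    "TTN": ["Hereditary myopathy with early respiratory failure"],
}

# Heuristic pattern for labels like "ttn-related condition" or "cftr-related disorders".
GENE_RELATED = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+)-related\s+(?:condition|disorder|disease)s?$",
    re.IGNORECASE,
)


def candidate_conditions(label: str) -> Optional[List[str]]:
    """Return the conditions listed for the gene named in a gene-related label.

    Returns None if the label does not look gene-related, and an empty list if
    the gene is not in the table. A non-empty result may still be incomplete.
    """
    match = GENE_RELATED.match(label.strip())
    if match is None:
        return None
    gene = match.group("gene").upper()
    return list(GENE_CONDITIONS.get(gene, []))


for label in ["ttn-related condition", "rai1-related condition", "cystic fibrosis"]:
    print(label, "->", candidate_conditions(label))
```

The sketch only proposes candidates for a curator to review rather than mapping automatically, since, as noted above, no such list can guarantee that every condition associated with the gene has been captured.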
Stepping back, I'm also not convinced of the value these annotations will bring to Open Targets. The point of ClinVar is to associate variants with conditions/diseases. Here we only have an association between variants and a gene, which we would get anyway using VEP. Finding or creating the right ontology term would associate the variant with a very high-level ontology term, providing very little information to Open Targets.
I suggest that we review this with Open Targets and potentially with EFO.
@apriltuesday & @tcezard & @Dona094: I have taken a look at these gene-related labels, and I believe it all comes down to poor term curation at the source and the lack of a parent ontology term.
In essence, these labels are a combination of (e.g.) CFTR mutation carrier status (EFO:0021794) and disease (EFO:0000408). They simply tell us that there is a phenotype and a gene variant, but neither the relationship between them nor what the disease is. Without this part, I think the value of these annotations drops dramatically.
We cannot, and should not, infer subtypes from parent terms. In terms of knowledge representation, as @tcezard already mentioned above, we cannot be sure whether the modifier "related" implies anything beyond the presence of the mutation. The problem here is that we are missing the source's point of view when "related" is used. Why would a disease be "related" to a gene unless the submitter knows that the gene variation is causing the phenotype? How would they assess that without a very thorough examination and sequencing? The easiest route would be to observe a common consequence of a gene variant (e.g. myopathy) alongside a mutation in that gene (e.g. titin), but if you (the source) know the phenotype, it's no longer a plain "condition". I feel that somewhere along the curation of the data these details are lost and never submitted to ClinVar. There is a convergence point that just groups everything as "condition" and "related" and dumps everything into ClinVar.
The way I see it, we have two alternatives:
There is a third option, which I am strongly against: mapping parent terms to subtypes. I noticed this was the case with ttn-related condition and cftr-related disorders. We should be careful about creating patches that would lead to semantic errors in the future and that may pass unnoticed in future manual curation rounds, given that they were once accepted by a curator.
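The "don't map a parent concept to one of its subtypes" criterion can be expressed as an ancestor check over the ontology's is-a hierarchy. This is a minimal sketch assuming the curator has already identified the broad term that best fits the source label; the toy hierarchy holds only the one edge described in this thread (MONDO:0016191 directly above MONDO:0100175), whereas a real check would load the full ontology. Function names are hypothetical.

```python
from typing import Dict, Set

# Toy is-a hierarchy: child term -> set of direct parents.
# Only the edge mentioned in this thread is included.
PARENTS: Dict[str, Set[str]] = {
    # TTN-related myopathy is_a qualitative or quantitative defects of titin
    "MONDO:0100175": {"MONDO:0016191"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return all transitive ancestors of a term, excluding the term itself."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def maps_parent_to_subtype(best_fit: str, proposed: str, parents: Dict[str, Set[str]]) -> bool:
    """True if the proposed term is a strict descendant of the best-fitting broad term."""
    return best_fit in ancestors(proposed, parents)


# Mapping a label whose best fit is the broad titin term onto the myopathy is flagged.
print(maps_parent_to_subtype("MONDO:0016191", "MONDO:0100175", PARENTS))
```

A flagged mapping would be a candidate for UNSURE or SKIP rather than DONE; the check cannot decide which broad term is the right fit, which remains a curator's judgement.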
Below are some terms that I looked at specifically, but the trend is easy to spot in all the others.
ttn-related condition
TTN-related myopathy (MONDO:0100175): too specific, and based on my comment above, would also be wrong.
qualitative or quantitative defects of titin (MONDO:0016191): even though it's directly above the myopathy, its only child class is the myopathy.
cftr-related disorders
..."the most well-characterized" of the cftr-related disorders, we are mapping a parent term to a subtype, contrary to our criteria. I would advise changing that from DONE to UNSURE or SKIP.
pkd1-related condition
This one is interesting because the gene is named PKD1, after polycystic kidney disease 1, possibly hinting at the disease. I wouldn't assume it has to be that disease alone, and thus I would again advise that "not doing something" is better than "doing something wrong".
Thank you @M-casado. What you're saying is concordant with our conclusions. Our current course of action is to
We might actively filter those traits out in the future, and OT will investigate whether these associations can be used anywhere.
I've also changed cftr-related disorders to SKIP.
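Filtering these traits out could start as a simple name-based partition. This is a sketch only: the regular expression is a heuristic assumption (it may miss differently worded gene-related labels or catch unintended ones), and `split_gene_related` is a hypothetical helper, not part of any existing pipeline.

```python
import re
from typing import Iterable, List, Tuple

# Heuristic pattern (an assumption, not an agreed rule) for gene-related trait names.
GENE_RELATED = re.compile(
    r"\b[A-Za-z0-9]+-related\s+(?:condition|disorder|disease)s?\b",
    re.IGNORECASE,
)


def split_gene_related(traits: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition trait names into (kept, filtered_out), preserving input order."""
    kept: List[str] = []
    filtered: List[str] = []
    for trait in traits:
        if GENE_RELATED.search(trait):
            filtered.append(trait)
        else:
            kept.append(trait)
    return kept, filtered


kept, filtered = split_gene_related(
    ["ttn-related condition", "Cystic fibrosis", "cftr-related disorders"]
)
print(kept)      # ['Cystic fibrosis']
print(filtered)  # ['ttn-related condition', 'cftr-related disorders']
```

Keeping the filtered-out list, rather than discarding it, would leave those traits available if Open Targets finds a use for them.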
Thanks all, I'll email ClinVar about these, but for this round we'll ignore them.
Export done and EFO issue created.
Refer to the documentation for a full description of the steps.
Checklist: