Closed: colleenXu closed this issue 5 months ago
@colleenXu can you provide an example please?
I've shown the missing data issue using this notebook to compare the partially-processed data to what's in mydisease right now.
The code that causes some of the data to be missed (keeping annotations from only one mapped ID) is probably the if-elif-else here
However, as shown in the last section of my notebook, the solution isn't as simple as merging the records, because each disease-phenotype annotation has different references, evidence type, biocuration, frequency, etc. included with it.
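To illustrate why a plain merge falls short, here is a minimal sketch that groups annotations by phenotype while keeping each source's metadata side by side instead of collapsing them. The field names (`hpo_id`, `evidence`, `frequency`, `source`), the HPO IDs, and the values are all placeholders made up for this example; they are not the parser's actual schema or real data.

```python
from collections import defaultdict

# Made-up annotations for one disease, keyed by the ID type they came from.
# Field names and values are placeholders, not the parser's real schema.
annotations_by_source = {
    "omim": [
        {"hpo_id": "HP:0000001", "evidence": "TAS", "frequency": None},
        {"hpo_id": "HP:0000002", "evidence": "PCS", "frequency": "1/3"},
    ],
    "orphanet": [
        {"hpo_id": "HP:0000001", "evidence": "TAS", "frequency": "frequent"},
        {"hpo_id": "HP:0000003", "evidence": "TAS", "frequency": "occasional"},
    ],
}


def group_by_phenotype(by_source):
    """Group annotations by HPO term, tagging each one with its source ID type."""
    grouped = defaultdict(list)
    for source, annotations in by_source.items():
        for annotation in annotations:
            # Copy the annotation so its own metadata is preserved untouched.
            grouped[annotation["hpo_id"]].append({**annotation, "source": source})
    return dict(grouped)


merged = group_by_phenotype(annotations_by_source)
for hpo_id, entries in merged.items():
    print(hpo_id, [(e["source"], e["frequency"]) for e in entries])
```

In this toy data the same phenotype appears under both sources with different frequency values, so both entries are kept rather than one overwriting the other. Whether the real fix should nest annotations like this or keep them as a flat list is a design choice this sketch does not settle.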
@everaldorodrigo @andrewsu @newgene
This would be a useful issue to address, but I don't know if it's in-scope for Everaldo to work on
Assigned to @DylanWelzel to confirm with @colleenXu whether this is still an issue.
Update: confirmed with Dylan last Friday (2/16) that this is still an issue. Dylan is working on a fix, and we discussed it more on 2/22.
@DylanWelzel asked me to review and close this issue.
Based on our convos and a quick check of the deployed API, I think this has been successfully addressed. Here's an example: Temtamy syndrome in MyDisease
- it has both OMIM and orphanet IDs (`hpo.omim` and `hpo.orphanet` fields)
- `phenotype_related_to_disease` now contains phenotypes from both the OMIM and orphanet data

I also see the extra adjustments:
- the other fields (`clinical_course` / `inheritance` / `clinical_modifier`) have been adjusted to provide all info, similar to the pheno fields.

So I think this is good and I'm closing the issue.
This was noticed in the HPO parser (and it's unclear if this is an issue with other parsers). The HPO parser has disease-phenotype data, with diseases having OMIM, orphanet, or decipher IDs.
HPO annotations do not appear to have ID resolution, so the "same" disease can have different annotations under its OMIM ID than under its orphanet ID or its decipher ID.
However, when the mondo ID resolving step is done, only one ID's annotations are kept (the priority order is omim first, then orphanet, then decipher). This means the annotations from the other IDs are lost and missing from the API's output.
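A toy sketch of the behavior described above, to make the loss concrete. This is not the parser's actual code: the function names, the data layout, and the HPO IDs are hypothetical, and only the priority order (omim, orphanet, decipher) comes from the description.

```python
# Priority order as described in this issue.
PRIORITY = ["omim", "orphanet", "decipher"]

# Made-up phenotype IDs for one disease that resolves to a single mondo ID.
phenotypes_by_id_type = {
    "orphanet": ["HP:0000002", "HP:0000003"],
    "omim": ["HP:0000001", "HP:0000002"],
}


def keep_one_source(by_id_type):
    """Mimic the reported behavior: keep only the highest-priority ID's data."""
    for id_type in PRIORITY:
        if id_type in by_id_type:
            return {id_type: by_id_type[id_type]}
    return {}


def keep_all_sources(by_id_type):
    """Alternative: keep every ID type's data, ordered by priority."""
    return {t: by_id_type[t] for t in PRIORITY if t in by_id_type}


def all_terms(by_id_type):
    """Flatten to the set of phenotype IDs present across all ID types."""
    return {term for terms in by_id_type.values() for term in terms}


kept = keep_one_source(phenotypes_by_id_type)
lost = all_terms(phenotypes_by_id_type) - all_terms(kept)
print("kept sources:", list(kept))
print("phenotypes lost:", sorted(lost))
print("sources if all are kept:", list(keep_all_sources(phenotypes_by_id_type)))
```

With this toy input, only the omim entry survives the priority selection, so any phenotype annotated solely under the orphanet ID is dropped.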