Closed by d0choa 2 years ago.
Hi @d0choa, I believe this is due to the gradual replacement of Orphanet terms with Mondo. Each of these duplicates will eventually become an obsoleted Orphanet term with a "replaced by" link to the Mondo term. I will try to prioritise removing some of these in time for the July (18th) release.
The Orphanet terms have now been obsoleted and replaced with Mondo terms, which should fix this duplication after the July release. Please let me know if it persists.
I have checked the latest release (3.44.0) and we no longer have Orphanet/MONDO duplication. Thanks @zoependlington!
However, there are still 49 examples with an identical name after converting them to lowercase. Some of them, like arterial occlusion, might be coming from the disease vs. phenotype conundrum.
Three of these have already been fixed in #1698.
Many others remain genuine duplications (e.g. polycystic kidney disease).
I will add mappings for the following:
http://purl.obolibrary.org/obo/MONDO_0021184 http://www.ebi.ac.uk/efo/EFO_1001303 deltaretrovirus infections deltaretrovirus infections
http://purl.obolibrary.org/obo/MONDO_0011014 http://www.ebi.ac.uk/efo/EFO_0009052 Pleuropulmonary blastoma Pleuropulmonary blastoma
http://purl.obolibrary.org/obo/MONDO_0700092 http://www.ebi.ac.uk/efo/EFO_0010642 neurodevelopmental disorder neurodevelopmental disorder
http://purl.obolibrary.org/obo/MONDO_0012368 http://www.ebi.ac.uk/efo/EFO_1001981 aminoacylase 1 deficiency aminoacylase 1 deficiency
http://purl.obolibrary.org/obo/MONDO_0020642 http://www.ebi.ac.uk/efo/EFO_0008620 Polycystic Kidney Disease Polycystic Kidney Disease
http://purl.obolibrary.org/obo/MONDO_0019165 http://www.ebi.ac.uk/efo/EFO_0009029 Central precocious puberty Central precocious puberty
http://purl.obolibrary.org/obo/MONDO_0014776 http://www.ebi.ac.uk/efo/EFO_0009059 Spinocerebellar ataxia type 42 Spinocerebellar ataxia type 42
The rest have either been taken care of (the measurement terms pointed out above) or are phenotype vs. disease duplicates.
These mappings have now been added, so the only remaining duplicates should be between disease and phenotype terms. Please let me know if that isn't the case.
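Once mappings like these are in place, a follow-up check could skip label clashes that an explicit mapping already explains, leaving only the disease/phenotype cases to review. The sketch below is a minimal illustration, assuming the duplicate groups have already been computed and the mappings are available as (IRI, IRI) pairs. The function name `unexplained_duplicates` and the `EX:` placeholder IDs are hypothetical; only the MONDO/EFO IRIs come from the list above.

```python
def unexplained_duplicates(groups, mappings):
    """Return duplicate-label groups that are not fully linked by explicit mappings.

    groups:   dict of lower-cased label -> list of term IRIs sharing that label
    mappings: iterable of (iri_a, iri_b) pairs declared as equivalent
              (assumed input format, not how EFO stores mappings internally)
    """
    # Store each mapping as an unordered pair so direction does not matter.
    linked = {frozenset(pair) for pair in mappings}
    remaining = {}
    for label, iris in groups.items():
        # Every pair of IRIs in the group must be covered by some mapping.
        pairs = {
            frozenset((a, b))
            for i, a in enumerate(iris)
            for b in iris[i + 1:]
        }
        if not pairs <= linked:
            remaining[label] = iris
    return remaining


MONDO = "http://purl.obolibrary.org/obo/"
EFO = "http://www.ebi.ac.uk/efo/"

# Two of the mappings listed above.
mappings = [
    (MONDO + "MONDO_0020642", EFO + "EFO_0008620"),  # polycystic kidney disease
    (MONDO + "MONDO_0700092", EFO + "EFO_0010642"),  # neurodevelopmental disorder
]

groups = {
    "polycystic kidney disease": [EFO + "EFO_0008620", MONDO + "MONDO_0020642"],
    "neurodevelopmental disorder": [MONDO + "MONDO_0700092", EFO + "EFO_0010642"],
    # Hypothetical placeholder IDs standing in for a disease/phenotype pair.
    "arterial occlusion": ["EX:disease_term", "EX:phenotype_term"],
}

for label in unexplained_duplicates(groups, mappings):
    print(label)
```

Using unordered `frozenset` pairs means a mapping counts regardless of which side is listed first, so only the unmapped placeholder group is reported.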
At least in 3.42 and 3.43, there is a large number of duplicated terms in EFO, mostly affecting rare diseases.
Just by lower-casing the names and looking for exact matches, there are 3036 duplicated terms (v3.42). Some of them are explained by the disease vs. phenotype conundrum, but the vast majority correspond to a MONDO vs. Orphanet duplication.
Some examples:
Hemophilia Orphanet:448 - hemophilia MONDO:0018660
Fragile X syndrome Orphanet:908 - fragile X syndrome MONDO:0010383
Apert syndrome Orphanet:87 - apert syndrome MONDO:0007041
...
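The lower-case-and-match check described in this issue can be sketched in a few lines of Python. This is a minimal sketch, assuming the term IDs and labels have already been extracted from the release into (id, label) pairs; the helper names `find_label_duplicates` and `is_mondo_orphanet_clash` are hypothetical, and the sample data is just the three examples reported above.

```python
from collections import defaultdict


def find_label_duplicates(terms):
    """Group term IDs whose labels are identical after lower-casing.

    terms: iterable of (term_id, label) pairs, assumed to be extracted
    from an EFO release beforehand.
    """
    groups = defaultdict(list)
    for term_id, label in terms:
        groups[label.lower()].append(term_id)
    # Keep only labels shared by more than one term.
    return {label: ids for label, ids in groups.items() if len(ids) > 1}


def is_mondo_orphanet_clash(ids):
    """True if a duplicate group contains both a MONDO and an Orphanet ID."""
    prefixes = {term_id.split(":", 1)[0] for term_id in ids}
    return {"MONDO", "Orphanet"} <= prefixes


# The three examples reported above.
terms = [
    ("Orphanet:448", "Hemophilia"),
    ("MONDO:0018660", "hemophilia"),
    ("Orphanet:908", "Fragile X syndrome"),
    ("MONDO:0010383", "fragile X syndrome"),
    ("Orphanet:87", "Apert syndrome"),
    ("MONDO:0007041", "apert syndrome"),
]

duplicates = find_label_duplicates(terms)
for label, ids in duplicates.items():
    print(label, ids, is_mondo_orphanet_clash(ids))
```

Splitting on the CURIE prefix is one simple way to separate the MONDO vs. Orphanet clashes from other duplicates (such as disease vs. phenotype pairs); exact lower-case matching will not catch near-duplicates that differ in punctuation or spacing.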