Closed ericaVoss closed 1 year ago
Right, those shouldn't exist. Thanks for noticing (although some of what you caught are "normal" concepts, like concept.vocabulary_id, which belong to different CDM versions, as indicated in concept_code). For the Drug domain, please feel free to choose RxNorm/RxE only. For duplicates within these vocabularies, take min(concept_id). For other domains, just pick a random one. Once it's fixed, you'll get either a valid concept or a replacement mapping that you can follow.
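In case it helps, here is a rough sketch of that interim rule. It runs against an in-memory SQLite table with made-up rows, so the table contents, the names `PICK_ONE` and `chosen`, and the use of min(concept_id) as the "random" pick outside the Drug domain are all my assumptions for illustration, not the vocabulary team's actual fix.

```python
import sqlite3

# Toy stand-in for the OMOP concept table; every row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        domain_id TEXT,
        vocabulary_id TEXT,
        standard_concept TEXT
    )
""")
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
    [
        (101, "drug a 10 mg oral tablet", "Drug", "RxNorm", "S"),
        (205, "drug a 10 mg oral tablet", "Drug", "RxNorm Extension", "S"),
        (310, "drug a 10 mg oral tablet", "Drug", "GRR", "S"),
        (400, "some measurement", "Measurement", "LOINC", "S"),
        (401, "some measurement", "Measurement", "SNOMED", "S"),
    ],
)

# Drug domain: keep only RxNorm / RxNorm Extension, then take min(concept_id).
# Other domains: any pick is acceptable, so min(concept_id) is used here (assumption)
# simply because it is deterministic.
PICK_ONE = """
SELECT domain_id, concept_name, MIN(concept_id) AS chosen_concept_id
FROM concept
WHERE standard_concept = 'S'
  AND (domain_id <> 'Drug' OR vocabulary_id IN ('RxNorm', 'RxNorm Extension'))
GROUP BY domain_id, concept_name
ORDER BY domain_id, concept_name
"""
chosen = conn.execute(PICK_ONE).fetchall()
for row in chosen:
    print(row)
```

On a real CDM you would run only the SELECT against your own concept table; the exact string comparison on concept_name is a simplification.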
Looks like we have a QA problem. Let's take a look.
There's also a problem with PPI concepts. Their Answer concepts are unique for every question, i.e. "Are_you_smoking_Yes" and "Are_you_drinking_Yes" are different concepts, but they have the same name in the source. And if such answers don't exist in OMOP, we make all of them standard.
Now it's 34658 duplicates. We fixed this GRR thing.
And PPI fix is upcoming.
I notice that we have duplicates in the 'Geography' domain.
For example:
select * from concept where domain_id = 'Geography' and concept_name = 'Centro';
There are 308 concepts named Centro.
@Alexdavv, do they really have 308 cities/towns called Centro?
They are districts/suburbs of different cities, each with a different hierarchy and geographic location. Even the names of cities/towns are regularly repeated. To exclude true duplicates, we implemented logic that considers both geometry and hierarchy, so the SQL mentioned above can't be used to search for Geo duplicates. We thought about modifying the concept names, but the current approach was considered the optimal one.
Can you show examples, @Alexdavv?
https://athena.ohdsi.org/search-terms/terms?standardConcept=Standard&page=1&pageSize=15&query=Centro
Currently, we have a QA check that makes sure we don't introduce concepts with the same name. In the OSM, NAACCR, NCD vocabularies and some other places, same-name concepts are expected. If you find something relevant in the future, please report it.
For Vocabulary v5.0 18-JAN-19
I think I'm seeing duplicate standard codes with the same name.
107826 rows are returned.
Specific example:
Or is there some way for me to know which one to choose?
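For anyone who wants to reproduce this kind of check, a query along the following lines should surface standard concepts that share a name. This is only a sketch: it runs on an in-memory SQLite table with invented rows, grouping by domain as well as name is my assumption, and `FIND_DUPLICATES` and `duplicates` are just illustrative names.

```python
import sqlite3

# Toy stand-in for the OMOP concept table; every row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        domain_id TEXT,
        vocabulary_id TEXT,
        standard_concept TEXT
    )
""")
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Centro", "Geography", "OSM", "S"),
        (2, "Centro", "Geography", "OSM", "S"),
        (3, "Centro", "Geography", "OSM", "S"),
        (4, "Example condition", "Condition", "SNOMED", "S"),
        (5, "Example condition", "Condition", "ICD10", None),  # non-standard, ignored
        (6, "Example drug", "Drug", "RxNorm", "S"),
        (7, "Example drug", "Drug", "RxNorm Extension", "S"),
    ],
)

# Standard concepts sharing the same name within a domain, largest groups first.
FIND_DUPLICATES = """
SELECT domain_id,
       concept_name,
       COUNT(*)        AS n_concepts,
       MIN(concept_id) AS min_concept_id,
       MAX(concept_id) AS max_concept_id
FROM concept
WHERE standard_concept = 'S'
GROUP BY domain_id, concept_name
HAVING COUNT(*) > 1
ORDER BY n_concepts DESC, concept_name
"""
duplicates = conn.execute(FIND_DUPLICATES).fetchall()
for row in duplicates:
    print(row)
```

Keep in mind that a name-only match like this also flags Geography rows such as Centro, which are distinct places that happen to share a name, so hits in OSM and similar vocabularies are not necessarily errors.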