OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications
Apache License 2.0
130 stars 169 forks source link

Vocab source_daimon not used in Atlas cohorts #206

Closed alondhe closed 6 years ago

alondhe commented 7 years ago

Hi,

I was testing a new deployment of a source in which the CDM source daimon and the results source daimon are in the same database and in the same schema. The vocab source daimon sits in a different database but still accessible as this is all taking place in PDW.

However, it looks like in Atlas, when generating cohorts, the creation of a codeset temp table will try to reference the CDM source daimon to grab the vocabulary tables (e.g. concept), rather than using the vocab source daimon.

It should be using the vocab source daimon instead of assuming the vocabulary tables are in the CDM source daimon.

Example: CDM source daimon is CDM_DEATH_SUBSET.truven_ccae; Vocab source daimon is VOCABULARY_20161218.dbo, which yields this error: Invalid object name 'CDM_DEATH_SUBSET.truven_ccae.CONCEPT

Thanks, Ajit

rtmill commented 7 years ago

Can confirm I am also experiencing this issue. During cohort generation it tries to access (CDM_LOCATION).dbo.CONCEPT.

As a temporary fix I loaded the vocabulary into the same database as the CDM. It would be convenient to be able to have the vocabulary in a different database to avoid data duplication if you have multiple CDM instances.

fdefalco commented 7 years ago

Well, technically the vocabulary tables are supposed to be part of any CDM and used in any processing on that specific CDM, like generating cohorts. We do this because the embedded vocabulary is the one that was used when the CDM was created and we want that vocabulary used when interacting with the CDM. The vocabulary daimon is a setting within the WebAPI acknowledging that the database can support vocabulary related requests, but is not used in cohort generation.

rtmill commented 7 years ago

It seems odd that the vocabulary daimon can be used as a source qualifier (when defining the cohort, etc) and then only as a flag during cohort generation.

If differing vocabulary versions are the biggest concern, could we simply do a check on load that the vocabulary version recorded during the ETL matches the value in the referenced vocabulary table? In other words, if cdm_source.vocabulary_version = vocabulary_version from vocabulary where vocabulary_id = 'None'

chrisknoll commented 7 years ago

The reason for a vocabulary daimon as a separate thing from the cdm daimon is that we wanted to support the case where only the vocabulary was installed on a 'source' and you want to be able to perform vocabulary queries/actions via webapi. It wasn't designed to specify a different vocablulary for cohort definitions (and I'm not sure where you can do what you described @rtmill: the vocabulary can be used as a source qualifier when defining a cohort?).

For those cases where the environment has 1 vocabulary installed that is shared across several CDMs, I recommend creating views in the CDM schema that point to all the vocabulary tables in the shared schema. This way, you don't have to copy over the vocabulary tables, and you get a 'fully compliant' cdm schema defined for each of your cdm instances.

-Chris

alondhe commented 7 years ago

@chrisknoll D'oh, I forgot about views! When building the death_subsets proof-of-concept mentioned in the first post, I ended up copying all the vocabs into the cdmDatabaseSchemas.

I think that's a suitable solution, but I don't want to close the issue until @rtmill provides his thoughts on using a set of table views from a shared vocab.

rtmill commented 7 years ago

@chrisknoll I apologize I wasn't being clear. If you assign the vocabulary daimon to be in a separate database from the CDM, the concept search and cohort definition tool will reference the vocabulary db instead of the CDM db. It only breaks (references CDM.concept) when you generate the cohort.

In other words, if your source_daimon looks like:

source_daimon_id source_id daimon_type table_qualifier priority
1 1 0 CDM.dbo 0
2 1 1 VOCABULARY.dbo 1

Then searching for concepts will access VOCABULARY.dbo.concept, while cohort generation will attempt to access CDM.dbo.concept. Was this intentional?

@alondhe Using views seems like a reasonable workaround

chrisknoll commented 7 years ago

@rtmill: when you say "searching for concepts" you mean using the vocabulary search in Atlas, yes, it will use the highest priority daimon_type = 1. This was intentional but I can understand the confusion. You'd have to go into the atlas config to pick the specific source to use, but @fdefalco would have to confirm if the vocabulary services always looks for a daimon_type = 1 when doing the table_qualifier or if a higher priority CDM would take precedence. Personally, in light of this discussion, I think i'd avoid putting both a Vocabulary daimon and a CDM daimon on the same source so that you don't run into this ambiguity with witch vocabulary is being referenced on a given source. But if the vocabulary service is always looking for a daimon_type=1 to determine the table_qualifier, then you'd have to set it up this way. I'd only use a vocabulary daimon in the case where the source only has the vocabulary tables and no CDM tables.

Cohort definitions should always use the cdm-local vocabulary tables precisely for the reasons @fdefalco gave: the ETL used the local CDM vocabulary tables (presumably) so when identifying the target population using vocabulary concepts, it's a best practice to refer to the same vocabulary that was used to ETL the cdm.

-Chris

rtmill commented 7 years ago

@chrisknoll I'll follow suit and keep the vocabulary within the same database as the CDM. For debugging purposes, I believe the cause of the ambiguity is not just the priority value but that the cohort generation code assumes you want to use the CDM daimon while the vocabulary search checks for both

pavgra commented 6 years ago

Resolved by https://github.com/OHDSI/WebAPI/commit/694957d1780284492b7463fd5905f58e9d7ce0c8