OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications
Apache License 2.0
127 stars 166 forks source link

Cohort Definition List query joins to CohortDetails #219

Closed chrisknoll closed 5 years ago

chrisknoll commented 7 years ago

Expected behavior

Querying for cohort definition list should not include columns (especially large columns) that are not required for the cohort definition list view.

Actual behavior

When querying the cohort definition list, the JPQL generated causes a inner join to CohortDefinitionDetails and pulls in the VARCHAR(MAX) column. This has negative performance impacts, and has even causes queries to hang on our EC2 sql server instances.

Steps to reproduce behavior

  1. Open console log in WebAPI
  2. Open a browser and access /WebAPI/cohortdefinition
  3. you will see a Hibernate: debug message with the query. Notice that the cohort_definition_details table is joined, and the expression column is returned.
chrisknoll commented 5 years ago

Reopening issue due to regression. Getting the cohort definition list is joining to the details table, which results in excessive bandwidth against the database. The query reported when accessing /WebAPI/cohortdefinition:

select cohortdefi0_.id as id1_12_0_, cohortdefi1_.id as id1_13_1_, userentity2_.ID as ID1_49_2_, 
    userentity3_.ID as ID1_49_3_, cohortdefi0_.created_by_id as created_7_12_0_, cohortdefi0_.created_date as created_2_12_0_, 
    cohortdefi0_.modified_by_id as modified8_12_0_, cohortdefi0_.modified_date as modified3_12_0_, 
    cohortdefi0_.description as descript4_12_0_, cohortdefi0_.expression_type as expressi5_12_0_, 
    cohortdefi0_.name as name6_12_0_, cohortdefi1_.expression as expressi2_13_1_, 
    cohortdefi1_.hash_code as hash_cod3_13_1_, userentity2_.last_viewed_notifications_time as last_vie2_49_2_, 
    userentity2_.LOGIN as LOGIN3_49_2_, userentity2_.NAME as NAME4_49_2_, 
    userentity3_.last_viewed_notifications_time as last_vie2_49_3_, userentity3_.LOGIN as LOGIN3_49_3_, 
    userentity3_.NAME as NAME4_49_3_ 
from ohdsi.cohort_definition cohortdefi0_ 
inner join ohdsi.cohort_definition_details cohortdefi1_ on cohortdefi0_.id=cohortdefi1_.id 
left outer join ohdsi.SEC_USER userentity2_ on cohortdefi0_.created_by_id=userentity2_.ID 
left outer join ohdsi.SEC_USER userentity3_ on cohortdefi0_.modified_by_id=userentity3_.ID

Note the 'expression' column being returned and the join to ohdsi.cohort_definition_details.

Because this has regressed a few times, let me explain why we do not want to return the expression column:

The expression column contains the cohort expression which could be several megabytes of json text which is streamed out of the database to the WebAPI layer, but ultimately is not even used in the response. This causes large delays over our network (several thousand cohort definitions in the list would result in gigabytes of transfer). And, incase anyone is wondering, server side paging will not address this, the data should not be returned in the first place.