OHDSI / Aphrodite

[in development]
Apache License 2.0
37 stars 15 forks source link

Time for extracting data - APHRODITE #12

Closed SSMK-wq closed 4 years ago

SSMK-wq commented 4 years ago

Hello @jmbanda ,

I was reading the APHRODITE manual and going through the code files. I see that we have two parameters called nCases and nControls.

Though I configure them as 200, I see that getPatientData function is running for more than an hour and no output yet. Moreover, it's only me who is accessing the db and no one has access to it.

In addition our patient's count in db is 5K.

Is this an expected behavior?

jmbanda commented 4 years ago

If your data is not properly indexed, clustered, etc. this will take a very very long time. Also, this depends on what kind of machine you are running this. Do you actually see the query running in your DBMS?

SSMK-wq commented 4 years ago

Hi @jmbanda ,

Yes, my tables are indexed and system has enough memory as shown below. Not sure why it's taking a long time to fetch data for just 200 patients. I understand it depends on system config, but can you let me know how long did it take for you and your system config please?

condition_occurrence | idx_condition_concept_id | CREATE INDEX idx_condition_concept_id ON   cdm.condition_occurrence USING btree (condition_concept_id)
-- | -- | --
condition_occurrence | idx_condition_person_id | CREATE INDEX idx_condition_person_id ON cdm.condition_occurrence USING   btree (person_id)
condition_occurrence | idx_condition_visit_id | CREATE INDEX idx_condition_visit_id ON cdm.condition_occurrence USING   btree (visit_occurrence_id)
condition_occurrence | xpk_condition_occurrence | CREATE UNIQUE INDEX xpk_condition_occurrence ON cdm.condition_occurrence   USING btree (condition_occurrence_id)
drug_exposure | idx_drug_concept_id | CREATE INDEX idx_drug_concept_id ON cdm.drug_exposure USING btree   (drug_concept_id)
drug_exposure | idx_drug_person_id | CREATE INDEX idx_drug_person_id ON cdm.drug_exposure USING btree   (person_id)
drug_exposure | idx_drug_visit_id | CREATE INDEX idx_drug_visit_id ON cdm.drug_exposure USING btree   (visit_occurrence_id)
drug_exposure | xpk_drug_exposure | CREATE UNIQUE INDEX xpk_drug_exposure ON cdm.drug_exposure USING btree   (drug_exposure_id)
ip_lab_visits | ip_lab_visits_person_id_idx | CREATE INDEX ip_lab_visits_person_id_idx ON cdm.ip_lab_visits USING btree   (person_id)
ip_lab_visits | ip_lab_visits_vis_occ_id_idx | CREATE INDEX ip_lab_visits_vis_occ_id_idx ON cdm.ip_lab_visits USING   btree (visit_occurrence_id)
measurement | idx_measurement_concept_id | CREATE INDEX idx_measurement_concept_id ON cdm.measurement USING btree   (measurement_concept_id)
measurement | idx_measurement_person_id | CREATE INDEX idx_measurement_person_id ON cdm.measurement USING btree   (person_id)
measurement | idx_measurement_visit_id | CREATE INDEX idx_measurement_visit_id ON cdm.measurement USING btree   (visit_occurrence_id)
measurement | xpk_measurement | CREATE UNIQUE INDEX xpk_measurement ON cdm.measurement USING btree   (measurement_id)
observation | idx_observation_concept_id | CREATE INDEX idx_observation_concept_id ON cdm.observation USING btree   (observation_concept_id)
observation | idx_observation_person_id | CREATE INDEX idx_observation_person_id ON cdm.observation USING btree   (person_id)
observation | idx_observation_visit_id | CREATE INDEX idx_observation_visit_id ON cdm.observation USING btree   (visit_occurrence_id)
observation | xpk_observation | CREATE UNIQUE INDEX xpk_observation ON cdm.observation USING btree   (observation_id)
observation_period | idx_observation_period_id | CREATE INDEX idx_observation_period_id ON cdm.observation_period USING   btree (person_id)
op_visits | op_visits_person_id_idx | CREATE INDEX op_visits_person_id_idx ON cdm.op_visits USING btree   (person_id)
patient_visits | patient_visits_person_id_idx | CREATE INDEX patient_visits_person_id_idx ON cdm.patient_visits USING   btree (person_id)
person | idx_person_id | CREATE UNIQUE INDEX idx_person_id ON cdm.person USING btree (person_id)
person | xpk_person | CREATE UNIQUE INDEX xpk_person ON cdm.person USING btree (person_id)
person_age | person_age_id_idx | CREATE UNIQUE INDEX person_age_id_idx ON cdm.person_age USING btree   (person_id)
visit_occurrence | idx_visit_concept_id | CREATE INDEX idx_visit_concept_id ON cdm.visit_occurrence USING btree   (visit_concept_id)
visit_occurrence | idx_visit_occurrence_id | CREATE INDEX idx_visit_occurrence_id ON cdm.visit_occurrence USING btree   (visit_occurrence_id)
visit_occurrence | idx_visit_person_id | CREATE INDEX idx_visit_person_id ON cdm.visit_occurrence USING btree   (person_id)

And this is my system memory usage

total          used        free        shared   buff/cache   available

131945672     4989520   104741280     9955008    22214872   116133208

SSMK-wq commented 4 years ago

Hello @jmbanda ,

May I kindly check with you on whether you had a chance to look at this issue and the other issues? Would really be helpful to seek your inputs and experiment with APHRODITE at my end

jmbanda commented 4 years ago

Are all the other constraints also created? Is the function actually running when it is not returning anything?

SSMK-wq commented 4 years ago

Yes, all other constraints are in place. I guess you are asking about FK constraints etc. CDM table definition are followed as is.

Yes, the function is running and no output. My total number of unique subjects is 5K

jmbanda commented 4 years ago

Then it some sort of issue on your system. I would try with 10 patients first to see how long it takes to continue.