OHDSI / PheValuator

An R package for evaluating phenotype algorithms.
https://ohdsi.github.io/PheValuator/

Features not extracted from all domains #28

Closed: SSMK-wq closed this issue 4 years ago

SSMK-wq commented 4 years ago

Hello,

When I ran PheValuator earlier, only demographic features were extracted. Following a discussion with @jswerdel, I have since populated my era tables.

However, I still don't see any features from other domains such as measurements, conditions, or drugs.

Could you help me understand why this happens? When I inspected the code of createDefaultChronicCovariates, I saw that the features from the different domains are set to TRUE. I have tried various parameter values in that function but still can't get covariates from the other domains. Am I missing something? Since there shouldn't be any reason for features from other domains to be skipped, I suspect the issue lies in how I am configuring the parameters. Do I need to set any special parameter values given the data characteristics listed below?

Data characteristics

a) Our dates are de-identified: the chronological order is preserved, but the dates are shifted into the future (e.g., years 2200, 2400, 2600).

b) Not all of our 5.2k patients have visit data; only 4.7k do. The same holds for other domains: not every patient has data in every domain, but more than 80-85% of our population has condition, drug, measurement, and visit data. We have no procedure data at all.

c) The observation period starts on 1900-01-01 and ends on 3900-12-30 for all patients.

d) I don't think the distribution of positive and negative cases matters here; I feel the issue is unrelated to it.
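Date shifting on its own should not stop day-based covariate windows from working, as long as the intervals within a patient are kept. The base-R sketch below assumes (the thread only says the order is preserved) that each patient's dates are moved by one constant number of days; the dates and the offset are hypothetical toy values.

```r
# Hypothetical toy dates for one patient, plus an ASSUMED constant
# per-patient shift (the thread does not say how the shift is applied).
real_dates <- as.Date(c("2015-03-01", "2016-07-21", "2017-01-10"))
shift_days <- 43000  # hypothetical offset that moves the dates into the 2100s

shifted_dates <- real_dates + shift_days  # adding a number to a Date adds days

# Chronological order is unchanged by a constant shift...
print(identical(order(real_dates), order(shifted_dates)))

# ...and so are the day gaps between consecutive events.
real_gaps    <- as.numeric(diff(real_dates))
shifted_gaps <- as.numeric(diff(shifted_dates))
print(shifted_dates)
print(all.equal(real_gaps, shifted_gaps))
```

If the de-identification altered intervals as well as absolute dates, day counts relative to an index date would no longer match the real timeline, which would be worth checking separately.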

Please let me know if you need any more information.

SSMK-wq commented 4 years ago

I think the issue is here, though I might be wrong and am happy to be corrected.

[Screenshot]

The temp table sets the cohort start date to observation_period_start_date and the cohort end date to observation_period_start_date + 1.
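The temp-table behaviour described here can be reproduced on a toy data frame. This is a hypothetical base-R re-creation of the logic as the comment describes it (cohort start = observation_period_start_date, cohort end = that date + 1 day), not PheValuator's actual SQL; the person IDs are made up.

```r
# Hypothetical toy OBSERVATION_PERIOD rows matching the thread's description:
# every patient is observed from 1900-01-01 to 3900-12-30.
observation_period <- data.frame(
  person_id = c(1L, 2L, 3L),
  observation_period_start_date = as.Date("1900-01-01"),
  observation_period_end_date   = as.Date("3900-12-30")
)

# Re-creation of the described temp-table logic:
# cohort start = observation start, cohort end = observation start + 1 day.
temp_cohort <- data.frame(
  subject_id        = observation_period$person_id,
  cohort_start_date = observation_period$observation_period_start_date,
  cohort_end_date   = observation_period$observation_period_start_date + 1
)
print(temp_cohort)
```

Every subject ends up with the same one-day cohort at the very beginning of the observation period, regardless of when their T2DM code occurs.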

But since my XSpec cohort is defined by the occurrence of a T2DM code, why does the cohort start on 1900-01-01? Shouldn't the cohort start date be the date on which the T2DM code occurs (for example, 2134-07-21)?
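The index date the comment expects, each patient's first T2DM code, amounts to a per-person minimum over condition records. In this base-R sketch the concept ID 201826 is assumed to be the T2DM concept, 999999 is a hypothetical placeholder for any other condition, and all rows are toy values.

```r
# Hypothetical toy CONDITION_OCCURRENCE rows with future-shifted dates.
# 201826 is ASSUMED to be the T2DM concept ID; 999999 is a hypothetical
# placeholder for any other condition.
t2dm_concept_id <- 201826L
condition_occurrence <- data.frame(
  person_id            = c(1L, 1L, 2L, 2L, 3L),
  condition_concept_id = c(201826L, 201826L, 999999L, 201826L, 999999L),
  condition_start_date = as.Date(c("2136-02-03", "2134-07-21",
                                   "2409-01-01", "2410-11-05",
                                   "2605-05-05"))
)

# Keep only T2DM records, sort by person and date, then take each
# person's earliest record as the cohort start (index) date.
t2dm <- condition_occurrence[condition_occurrence$condition_concept_id == t2dm_concept_id, ]
t2dm <- t2dm[order(t2dm$person_id, t2dm$condition_start_date), ]
index_dates <- t2dm[!duplicated(t2dm$person_id), c("person_id", "condition_start_date")]
names(index_dates)[2] <- "cohort_start_date"
print(index_dates)
```

Person 3 has no T2DM record and therefore gets no index date, which is what a cohort defined by the T2DM code would be expected to do.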

Because the cohort ends one day later (1900-01-02), while all of our measurements, drugs, and conditions are dated well after that (e.g., 2134-07-21), I suspect those records are never considered.
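The suspicion in this comment can be illustrated with a simple date-range filter. The base-R sketch below assumes, as the comment does, that only records dated inside the cohort window are used; the clinical rows are hypothetical.

```r
# Toy cohort row as described in the thread (start 1900-01-01, end +1 day).
cohort <- data.frame(subject_id = 1L,
                     cohort_start_date = as.Date("1900-01-01"),
                     cohort_end_date   = as.Date("1900-01-02"))

# Hypothetical clinical records for the same patient, all dated in the 2100s.
records <- data.frame(
  person_id   = 1L,
  domain      = c("measurement", "drug", "condition"),
  record_date = as.Date(c("2134-07-21", "2134-08-02", "2135-01-15"))
)

# Flag records dated inside the cohort window (inclusive on both ends).
in_window <- records$record_date >= cohort$cohort_start_date &
             records$record_date <= cohort$cohort_end_date
print(sum(in_window))  # number of records inside 1900-01-01 .. 1900-01-02
```

Under that assumption, every domain is "enabled" yet contributes nothing, because no record falls in the window.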

Update

I modified the startDays and endDays parameters of the createDefaultChronicCovariates function as shown below, and I can now see more covariates:

startDays = 0,
endDays = 8000000

I'm not sure whether this is right, though. How should I define **startDays** and **endDays** for our dataset? If the cohort start date were in the 2100s, I could look back into the past for baseline features, but since our cohort start date is 1900-01-01, there is nothing to look back at.

[Screenshot]
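A less arbitrary endDays can be derived from the dates in the thread by counting the days from the cohort start date to the end of the observation period. This base-R sketch assumes that startDays and endDays are offsets in days relative to the cohort start date; it only gives the smallest endDays that would reach every record in this particular setup, not a recommended setting.

```r
# Dates taken from the thread.
cohort_start <- as.Date("1900-01-01")
obs_end      <- as.Date("3900-12-30")

# Days needed after the cohort start to reach the end of observation.
days_to_obs_end <- as.numeric(difftime(obs_end, cohort_start, units = "days"))
print(days_to_obs_end)

# Compare with the value tried in the thread.
end_days_tried <- 8000000
print(end_days_tried >= days_to_obs_end)
```

For these dates the span is roughly 731,000 days (about 2,000 years), so 8,000,000 covers it many times over.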

But my questions are:

1) Why is the cohort start date the observation_period_start_date instead of the actual index event (the T2DM code occurrence)?
2) We would normally look at baseline features (before the index event), but in my case I can't look back: my startDays will still be 0, because the cohort start date is 1900-01-01 and there is no earlier data to look at.
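The history available before an index date is simply the number of days from the observation period start to that date. The base-R sketch below compares the two index choices discussed in the thread; the one-year baseline (startDays = -365) is a hypothetical setting chosen for illustration, and it assumes that a negative startDays means days before the index date.

```r
# Observation period start given in the thread.
obs_start <- as.Date("1900-01-01")

# The two candidate index dates discussed in the thread.
candidates <- data.frame(
  index_choice = c("observation_period_start_date", "first T2DM code"),
  index_date   = as.Date(c("1900-01-01", "2134-07-21"))
)

# Days of recorded history available before each index date.
candidates$lookback_days <- as.numeric(
  difftime(candidates$index_date, obs_start, units = "days"))

# A hypothetical one-year baseline window (startDays = -365) only fits
# when at least 365 days of history precede the index date.
candidates$fits_one_year_baseline <- candidates$lookback_days >= 365
print(candidates)
```

With the observation-period start as index there is zero lookback, whereas indexing on the T2DM code leaves decades of prior history to draw baseline features from.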

Could you guide me on this?