OHDSI / PheValuator

An R package for evaluating phenotype algorithms.
https://ohdsi.github.io/PheValuator/

Why does the cohort end 1 day after the start date? #13

SSMK-wq closed this issue 4 years ago

SSMK-wq commented 4 years ago

Hello Everyone,

In our dataset, the observation period start date is 1900-01-01 and the observation period end date is 3900-01-01 for all patients.

They do have diagnoses, labs, drugs, etc. recorded with dates in 2100-0x-0x.

I understand that the way we have defined the observation period will create issues for Age (which we will handle internally), but can you help me with the questions below?

For example, cohort id = 103 was designed as shown below:

[Screenshot: definition of cohort 103]

Given this definition, why is a temp table like the one below generated in the scratch schema?

[Screenshot: temp table in the scratch schema]

1) Why does the cohort end one day after its start?

2) Why do I see subjects under cohort_id = 0? (My 3 cohorts only have ids 103, 104, and 105.)

3) Why do I get a different number of subjects in this temp table every time I run the script?

In the first run it was 5382, and in the second run it was 5376.

jswerdel commented 4 years ago

Hello,

1) The code sets the end of the cohort period to one day after the start by design. This does not affect any processing, because the length of time in the cohort is not used in any calculation.

2) Subjects who are selected randomly from outside the xSpec cohort are assigned cohort_id 0. These are the noisy negatives (when creating the model) or the random selection of subjects for the evaluation cohort.

3) The small difference in numbers may be due to rounding differences when developing the cohort from the prevalence. Which part of PheValuator were you running when you saw these differences?
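To make points 1 and 2 concrete, here is a minimal Python sketch of how rows in such a temp table could be produced. It is not PheValuator's actual R implementation: the column names, the helpers make_row and build_temp_cohort, the sample data, and the use of 103 as the xSpec cohort id are all hypothetical. It only shows the two rules stated above: every end date is start + 1 day, and randomly selected subjects from outside xSpec get cohort_id 0.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta

# Per the reply: randomly selected subjects outside xSpec get cohort_id 0.
RANDOM_SUBJECT_COHORT_ID = 0


@dataclass(frozen=True)
class CohortRow:
    # Hypothetical column layout, loosely modeled on a cohort table.
    cohort_id: int
    subject_id: int
    cohort_start_date: date
    cohort_end_date: date


def make_row(cohort_id, subject_id, start):
    # End date is always one day after start; duration is not used downstream.
    return CohortRow(cohort_id, subject_id, start, start + timedelta(days=1))


def build_temp_cohort(xspec, others, xspec_id, n_random, seed=None):
    """Hypothetical helper. xspec/others map subject_id -> index date."""
    rng = random.Random(seed)
    rows = [make_row(xspec_id, sid, d) for sid, d in sorted(xspec.items())]
    # Randomly pick subjects from outside xSpec and label them cohort_id 0.
    picked = rng.sample(sorted(others), n_random)
    rows += [make_row(RANDOM_SUBJECT_COHORT_ID, sid, others[sid]) for sid in picked]
    return rows


xspec = {1: date(2100, 3, 1), 2: date(2100, 5, 9)}
others = {10: date(2100, 1, 2), 11: date(2100, 2, 3), 12: date(2100, 4, 4)}
for row in build_temp_cohort(xspec, others, xspec_id=103, n_random=2, seed=42):
    print(row)
```

Changing or omitting the seed changes which non-xSpec subjects are picked, so the cohort_id 0 rows can differ between runs even when the inputs are identical.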

SSMK-wq commented 4 years ago

Hi @jswerdel,

1) Got it.

2) Noisy positives are the XSpec cohort and noisy negatives are the XSens cohort. Am I right? Let's say I have a population of 500 subjects, with XSpec cohort = 100 subjects and XSens cohort = 100 subjects. Is the evaluation cohort the remaining 300? Do you mean that the evaluation cohort could be either the 300 remaining subjects or the 100 subjects from the XSens cohort? I thought it could only be the 300 remaining subjects, because those are unseen by the model (we built the model using only the XSpec and XSens cohorts).

3) This was during the temp table generation shown in the screenshot above. One run gave 5382 records and another gave 5376 records.

jswerdel commented 4 years ago

2) The xSens cohort is used to remove from the population those subjects who may be positive, leaving only the noisy negatives, i.e., those likely to be negative. All the xSpec subjects should also be in the xSens cohort: if xSpec, the extremely specific cohort, requires >=5X the condition code and xSens requires >=1X, then everyone with >=5X also has >=1X. In your case, in a population of 500 subjects, if 100 are in the xSens cohort, then 400 noisy negatives remain, namely those with 0 condition codes.

3) The temp table is created both when you create the model and when you create the evaluation cohort. Which function were you using?
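The set logic in point 2 can be checked with a small Python sketch. It assumes, hypothetically, that each subject is reduced to a count of condition-code occurrences; the thresholds (>=5X for xSpec, >=1X for xSens) and the 500/100 split come from the reply, while the 30 xSpec subjects and the helper names are made up for illustration.

```python
def cohort_by_min_codes(code_counts, min_codes):
    """Subjects with at least `min_codes` occurrences of the condition code."""
    return {sid for sid, n in code_counts.items() if n >= min_codes}


def noisy_negatives(code_counts, xsens_min=1):
    # Anyone in xSens *may* be positive, so only subjects outside it remain.
    xsens = cohort_by_min_codes(code_counts, xsens_min)
    return set(code_counts) - xsens


# Hypothetical population of 500: subjects 0-99 have the code at least once,
# and subjects 0-29 have it 5 or more times.
code_counts = {sid: (6 if sid < 30 else 2 if sid < 100 else 0) for sid in range(500)}

xspec = cohort_by_min_codes(code_counts, 5)  # extremely specific: >=5X
xsens = cohort_by_min_codes(code_counts, 1)  # extremely sensitive: >=1X

print(len(xspec), len(xsens), len(noisy_negatives(code_counts)))  # 30 100 400
print(xspec <= xsens)  # True: every xSpec subject is also in xSens
```

Because the xSpec threshold is stricter than the xSens threshold, xSpec is a subset of xSens by construction, and the noisy negatives are exactly the subjects with zero codes.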

SSMK-wq commented 4 years ago

2) Oh okay, I understand now. There will be an overlap between the XSpec and XSens cohorts: the XSpec cohort contains sure-shot cases, whereas XSens will have sure-shot cases as well as recently diagnosed cases. So the evaluation cohort will exclude the XSens cohort. Am I right? You are awesome!

3) createPhenotypeModel, the first function in PheValuator.

jswerdel commented 4 years ago

2) In the first step of the process, createPhenotypeModel, the subjects used to create the model are the xSpec subjects (noisy positives) and those not in xSens (noisy negatives). In the second step, createEvaluationCohort, the only subjects excluded are those that were used to build the model in the first step. So some xSens subjects will be excluded, namely those that were also in the xSpec set used in model building, but the rest of the xSens subjects may be included in the evaluation cohort if they happen to be selected in the random selection process.

3) I will do some testing to see if I get different subject counts.
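The two-step selection described in point 2 can be sketched in Python as plain set operations. build_model_set and draw_evaluation_cohort are hypothetical stand-ins for createPhenotypeModel and createEvaluationCohort, not their real signatures, and drawing only a random sample of noisy negatives for the model (n_negatives) is an assumption made so the example has leftover negatives to draw from.

```python
import random


def build_model_set(population, xspec, xsens, n_negatives, rng):
    """Step 1 sketch: xSpec noisy positives plus a random sample of noisy
    negatives (subjects outside xSens). The sampling is an assumption."""
    negatives = sorted(population - xsens)
    return set(xspec) | set(rng.sample(negatives, n_negatives))


def draw_evaluation_cohort(population, model_set, n_eval, rng):
    """Step 2 sketch: random draw from everyone NOT used to build the model.
    xSens subjects outside xSpec stay eligible."""
    eligible = sorted(population - model_set)
    return set(rng.sample(eligible, n_eval))


rng = random.Random(0)
population = set(range(500))
xsens = set(range(100))  # hypothetical: >=1X condition code
xspec = set(range(30))   # hypothetical: >=5X, a subset of xSens

model_set = build_model_set(population, xspec, xsens, n_negatives=200, rng=rng)
evaluation = draw_evaluation_cohort(population, model_set, n_eval=150, rng=rng)

print(len(model_set), len(population - model_set))  # 230 270
print(len(evaluation & (xsens - xspec)), "xSens-only subjects drawn for evaluation")
```

With these made-up numbers, the 70 xSens subjects who are not in xSpec remain eligible for the evaluation cohort alongside the 200 unused noisy negatives, which matches the point that only model-building subjects are excluded.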