OHDSI / PheValuator

An R package for evaluating phenotype algorithms.
https://ohdsi.github.io/PheValuator/

Require clarification on Terminologies - PheValuator #16

Closed SSMK-wq closed 4 years ago

SSMK-wq commented 4 years ago

Hi @jswerdel ,

I know we had a brief discussion on this topic. I was also referring to your other post, which got me a bit confused again because of the varied terminology used across the forum, the documentation, YouTube videos, etc. (I may have misunderstood as well). I have tried to read up online, break things down step by step, and write out my questions in detail. So, I would kindly request you to read the full post once (only because I am not sure whether the questions are ordered in the proper sequence context-wise), so that it is easy for you to understand and respond accordingly. I am picking up the example from a forum user:

- Population size: 10K
- Lung cancer (at least 1 code): 9,900
- No lung cancer: 100

a) XSpec - consists only of subjects who are highly likely to have the HOI (lung cancer); it is also called the noisy positives cohort.

We expect subjects to have at least 10 codes (depending on the condition that we study) to be considered highly likely to have the HOI. Let's say we have 4500 people with at least 10 codes.

Q1) But why isn't it a gold standard? When a subject has more than 10 condition codes for lung cancer in their timeline, can't we know for sure, and be confident, that they experienced the HOI? Can there be any other interpretation of this? Sorry, I am not from a healthcare background, so your input on why this isn't a gold standard would be much appreciated.

b) XSens - This cohort is created by considering subjects who have 1 or more condition codes. Let's say we have 9900 people with at least 1 code (4500 with >=10 codes + 5400 with <10 codes).

Q1) There will always be overlap between the XSpec and XSens cohorts, provided the XSpec cohort fetches records based on our condition. In our example we see that the 4500 people with >=10 codes are also present in XSens (along with the remaining 5400 people). Am I right?

Q2) XSens is different from the Noisy Negatives cohort. They are two different cohorts. Am I right?

Q3) But there will never be any overlap between the XSens and Noisy Negatives cohorts. Am I right?

Q4) I ask because I see in YouTube videos that the XSens cohort is called the "Probably No" cohort, yet its subjects do have evidence of the HOI. Why is it then called the "Probably No" cohort?

c) Noisy Negatives - This cohort consists of the subjects who are not present in the XSens cohort. Am I right?

Q1) In our example of lung cancer, the Noisy Negatives cohort will consist of 100 subjects. Am I right?

d) Prevalence cohort

Q1) Here we use the XSens cohort because that's the cohort which gives a proper estimate of the prevalence of lung cancer in our population, which is 9900. I guess there will almost never be a reason to use any other cohort (like XSpec) as the prevalence cohort, because it would give an inaccurate estimate. Am I right?

e) Target cohort

Q1) I see in the doc that the Target cohort is built as below:

    Target cohort = XSpec + Noisy Negatives (lung cancer example: 4500 + 100 = 4600)

    But may I know why this combination for the Target cohort?

Is it to ensure a proper mix of noisy positive samples (highly likely to have the HOI) and noisy negative samples (highly likely not to have the HOI), which will help us study the characteristics of both classes better?

f) Outcome cohort

Q1) What do we mean by outcome here? What is the outcome that we are looking for? E.g., in the case of the forum example, if it's lung cancer, are we looking for outcomes of lung cancer in the Target cohort?

Q2) So, how is this outcome cohort created? I couldn't find this anywhere in the doc.

g) PLP model

Q1) PheValuator uses a LASSO regression model, which is a supervised learning method requiring labels. Am I right?

Q2) It trains the model based on the target cohort of 4600 subjects (lung cancer) and their variables. But how are the labels generated?

Q3) But again, in the YouTube video I see that the model is built based on XSpec & XSens. Is that how it works? But then why is the Target cohort built using XSpec and Noisy Negatives? Confusing.

Q4) Is the trained model then evaluated on the Evaluation cohort (which has already been used during the training phase; refer below)?

h) Evaluation cohort

Q1) How is this evaluation cohort created? I understand that there is a function called "CreateEvaluationCohort" in the package and that it uses a function parameter for the XSpec cohort. There is no other cohort involved. Should I infer that the Evaluation cohort will also be 4500 subjects (the same as XSpec)?

But in the doc, I see that the "evaluation cohort - a large group of randomly selected subjects to be used to evaluate the phenotype algorithms (PA)."

May I kindly request you to help me understand how this is random? Keeping the imbalanced-dataset issue and PheValuator's limitations aside, can you help me understand how this cohort is built in our example of lung cancer?

Q2) So are we evaluating our PLP model on the subjects in this evaluation cohort to produce the probability outputs? Am I right? But aren't these subjects already used during PLP model creation?

i) Main Population cohort

Q1) May I know what the use of this cohort is? Is it just about defining a cohort which will have all the subjects in the database? In the example of lung cancer, that is 10K. Am I right? Should I just create a cohort in Atlas which will have all the subjects in our database?

jswerdel commented 4 years ago

a) Q1) It could be considered a "gold standard", but we say that it is very likely that these subjects have the HOI - there is still a small possibility that the coding could be incorrect and some of the subjects do not have the HOI.

b) Q1) The xSpec cohort will be a subset of the xSens cohort. Q2) The xSens cohort is not the noisy negatives. It is used to develop a cohort of noisy negatives. We think that any subject that is NOT in the xSens cohort (it is very sensitive for the HOI) will likely NOT have the HOI, and these subjects (those not in the xSens cohort) will be our noisy negatives - subjects with a high probability of NOT having the HOI. Q3) Exactly right - those two cohorts never overlap. Q4) The slide in the YouTube video gives the impression that the xSens cohort is "Probably no", but that is not quite right - it is used to develop a cohort (the noisy negatives) that is "Probably no".

c) Q1) Exactly right.

d) Q1) The xSens cohort gives the best estimate of the prevalence. An estimate using the xSens is good enough to produce a model using an approximate balance of cases and non-cases.

e) Q1) The target cohort is built with both cases and non-cases. The model uses the outcome cohort (in our case, those in the xSpec) to "match up" with the target cohort in order to correctly label those with the outcome.

f) Q1) The outcome cohort in PheValuator is the xSpec cohort - it is used to label those in the Target cohort as cases and non-cases.

g) Q1) Exactly right. Q2) The labeling comes from finding those in the Target cohort that match with those in the Outcome cohort - when they match, these subjects are labeled cases (or "positives" or "those with the outcome"). Q3) The modeling process uses the labels (as described above) to determine the predictors that will discriminate between cases (those with the HOI) and non-cases (those without the HOI). Q4) The model is applied to the evaluation cohort to determine the probability of each subject in the evaluation cohort having the HOI. The extracted covariates for subjects in the evaluation cohort are used with the coefficients from the developed model to determine each subject's probability of having the HOI. The software purposely excludes any subject that was used to develop the model from being in the evaluation cohort. This is done to eliminate any bias in the predicted probability for subjects that were used in model development.

h) Q1) Unfortunately, this is where your use of PheValuator falls apart. Normally researchers don't already have a cohort where they are assured of the outcome, as you do (again, as in a previous post, I am assuming you determined your cases and non-cases through clinical adjudication of patient health records). In most datasets, there are many subjects with the HOI but many, many more without the health outcome. The evaluation cohort is a random selection of these subjects. When the selection is random, the prevalence of the HOI in this random set should be about the same as in the full dataset population. When that is the case, PheValuator can use this large set of subjects to analyze various phenotype algorithms applied to the same dataset and determine their performance characteristics. Q2) Just the reverse - we are evaluating the subjects in the evaluation cohort using the PLP model. As I stated above, the software purposely does not include in the evaluation cohort any subjects that were in the group used to develop the PLP model.
i) Q1) The main population cohort does not have to be used (and rarely is) - you can leave it as the default value. It is included as a possible parameter for cases where the HOI is so rare that you need a population with a larger proportion of those with the HOI. For example, if the HOI is restrictive cardiomyopathy, a very rare form of cardiomyopathy, then you may want to use a main population cohort of those with any heart disease. If you don't do this, even in a large, randomly selected evaluation cohort of 2M subjects, you may only find 5-10 subjects with this HOI, leading to possibly inaccurate performance characteristics. If you use only subjects with heart disease as your main population cohort, you may have 100-200 subjects with this HOI, which would improve the accuracy of the measured performance characteristics.
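
To make the cohort relationships above concrete, here is a minimal R sketch using the numbers from the lung cancer example. This is illustrative only - the subject IDs are hypothetical and this is not PheValuator's internal code:

```r
# Illustrative only: hypothetical person IDs, not PheValuator internals.
population <- 1:10000                   # all 10K subjects in the database

xSens <- 1:9900                         # at least 1 lung cancer code
xSpec <- 1:4500                         # at least 10 codes; a subset of xSens

# Noisy negatives: subjects NOT in xSens (the remaining 100 subjects)
noisyNegatives <- setdiff(population, xSens)

# Target cohort for the PLP model: noisy positives + noisy negatives
target <- union(xSpec, noisyNegatives)  # 4600 subjects

# Labels come from matching the target cohort against the outcome
# cohort (xSpec): matches are cases (1), the rest are non-cases (0)
labels <- as.integer(target %in% xSpec)
table(labels)                           # 0: 100, 1: 4500
```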

SSMK-wq commented 4 years ago

Hi @jswerdel ,

Thanks a ton for patiently answering my questions. Much appreciated. Just a few quick follow-up questions, only to make sure I use this tool properly.

1) Regarding the Target (e) & Outcome (f) cohorts

Based on your response,

Target cohort = XSpec (4500) + XSens (5400)
Outcome cohort = XSpec (4500)

a) Through your response, I understand that the Outcome cohort is XSpec. But can you guide me to the documentation where I can find this info? I think I couldn't find it, but I might be wrong. I ask only so that I can read it and understand better.

b) So for PLP model building, am I right to understand that we generate labels based on "matching up" only, and there is no ML involved anywhere in this label generation? It's just a match based on subject_id - looking for common subject_ids between the two cohorts (T & O)?

c) Will it be a dataset of 9900 records (along with features), in which 4500 records have label 1 (positive, because they match with XSpec) and 5400 records (the remaining subjects from XSens) have label 0 (negative)? Am I right?

d) If c) above is correct, then for the 5400 subjects which are not noisy negatives (but the remaining subjects from XSens), we are labeling them as 0 (negative) just because they weren't in XSpec? Aren't we labeling incorrectly through this approach?

2) Regarding Evaluation Cohort

Let me set my dataset issue aside and try to get the concepts right here.

a) From your response, I understand that the tool purposefully excludes all subjects that were included in XSpec and XSens. That would mean the Evaluation cohort is (a random selection of subjects from) the Noisy Negatives cohort. Am I right?

b) But why then do we see a function parameter called "XSpec" in the CreateEvaluationCohort function?

c) Let's say I have, for example, 1000 subjects in the evaluation cohort. Now the PLP model applied to this evaluation cohort produces, let's say, probabilistic values as labels for these subjects. Where is the threshold defined?

d) E.g., > 0.5 = positive (600 subjects) and < 0.5 = negative (400 subjects)?

PA assessment

a) Let's say I create a cohort definition in Atlas for a rule-based phenotype algorithm and apply it to my data source, and it results in 2000 subjects.

b) Now I check whether the 600 subjects from 2d above are included in the PA and the 400 subjects from 2d above are not included in the PA. Am I right?

And this is what PheValuator does. Am I right?

jswerdel commented 4 years ago

1) a) It is not in the documentation because users of PheValuator do not need to assign xSpec to the Outcome cohort in the PLP function - the tool does it automatically. For documentation about the Outcome cohort, please refer to the PLP package. b) Yes - after the matching, the ML begins. c) No, only the 100 non-cases (those not in xSens) will have an Outcome (label) of 0. d) See c).

2) a) Think of it as: any subject that was used in model creation will not be used in the evaluation cohort. This excludes the xSpec subjects used to create the model and the noisy negatives used. b) xSpec subjects are used to include some subjects with the outcome. If there is no one with the outcome, the model stops - it is a limitation of the PLP package. Those included in the evaluation cohort with the outcome are excluded from the performance characteristic calculations. c) The documentation explains how the performance characteristics are calculated without using a threshold. If the person is included in the phenotype algorithm that you are testing, the calculated probability (p) is added to the true positive bucket, 1 - p is added to the false positive bucket, and so on. d) You can use thresholds if you wish (e.g., > 0.5 = positive) and the performance characteristics will be calculated based on that threshold. The software gives you the option of using a threshold or using the expected value ("EV").
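
To illustrate the expected-value ("EV") calculation described in 2c, here is a minimal R sketch with hypothetical data (the variable names and numbers are made up; this is not the PheValuator implementation):

```r
# Illustrative only: hypothetical predicted probabilities for a
# 1000-subject evaluation cohort, and hypothetical PA inclusions.
set.seed(42)
p    <- runif(1000)        # model-predicted probability of the HOI
inPA <- runif(1000) < p    # TRUE if the phenotype algorithm included the subject

# Expected-value buckets: each subject contributes fractionally.
TP <- sum(p[inPA])         # included by the PA, weight of truly having the HOI
FP <- sum(1 - p[inPA])     # included by the PA, weight of NOT having the HOI
FN <- sum(p[!inPA])        # excluded by the PA, weight of truly having the HOI
TN <- sum(1 - p[!inPA])    # excluded by the PA, weight of NOT having the HOI

sensitivity <- TP / (TP + FN)
ppv         <- TP / (TP + FP)
specificity <- TN / (TN + FP)
```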

PA Assessment: a) That is exactly right - in this case, any of the 600 positives that are included in your PA will be TPs, and those that were not included will be FNs (the algorithm considered them negatives and it was wrong). On the other side, if any of the 400 negatives are included in the PA, those would be considered FPs - the algorithm included these subjects as positives and it was wrong. TNs are those among the 400 not included in the PA.
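
For the threshold-based alternative described above, here is a minimal R sketch of the confusion-matrix logic (again with hypothetical data, not PheValuator code):

```r
# Illustrative only: hypothetical probabilities and PA inclusions.
set.seed(42)
p    <- runif(1000)        # model-predicted probability of the HOI
inPA <- runif(1000) < p    # TRUE if the phenotype algorithm included the subject

positive <- p > 0.5        # 0.5 threshold: e.g., ~600 positives, ~400 negatives

TP <- sum(positive & inPA)     # positives the algorithm included
FN <- sum(positive & !inPA)    # positives the algorithm missed
FP <- sum(!positive & inPA)    # negatives the algorithm wrongly included
TN <- sum(!positive & !inPA)   # negatives the algorithm correctly excluded
```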