OHDSI / Aphrodite

[in development]
Apache License 2.0

Anchor variables - APHRODITE #4

Closed: SSMK-wq closed this issue 3 years ago

SSMK-wq commented 4 years ago

Hi,

I was reading the APHRODITE package manual and its paper.

1) How are "Anchor variables" different from "terms extracted from clinical narratives"?

Aren't these two the same?

And when it comes to the OMOP CDM, am I right to understand that "OMOP concepts" will be called "Anchor Variables"?

2) Or are Anchor variables like questions which were asked to a patient? ex: "Do you think this patient has an infection?"

When we do retrospective analysis, I guess "Anchor variables" can't be questions but just the occurrence of certain terms in patient medical records/EHR?

3) Could you provide a layman's explanation of how the response to the above example question can or cannot qualify as an "Anchor variable"? I guess Anchor variables have 2 properties which have to be met: conditional independence and high PPV.

For conditional independence, this is the definition from the paper "Electronic medical record phenotyping using the anchor and learn framework":

"The second condition is conditional independence. This is a formal condition that requires that the patient’s phenotype is the best predictor of whether or not the anchor is present in the medical records and that no other data in the record would improve the prediction if the patient’s phenotype were already known."

ex: Let's say a patient has the T2DM phenotype and the Anchor variable is "Type 2 diabetes Mellitus". I am not sure how to fit this example to the above definition (conditional independence).

Could you help me with the above 3 questions, please?

jmbanda commented 4 years ago

Hello,

Sure, my responses are inline:

1) How are "Anchor variables" different from "terms extracted from clinical narratives"? Aren't these two the same?

They are not. Anchor variables can be concepts for conditions, procedures, labs, drugs, etc., which are not necessarily found in the NOTE_NLP table. This adds flexibility when building phenotypes and allows people with no annotated clinical narratives to benefit from this package.

And when it comes to the OMOP CDM, am I right to understand that "OMOP concepts" will be called "Anchor Variables"?

Yes, exactly

2) Or are Anchor variables like questions which were asked to a patient? ex: "Do you think this patient has an infection?"

Not necessarily. You can use terms commonly found in patient narratives as anchors (if you have a populated NOTE_NLP table). These terms might be strings of text that are not necessarily captured in condition, procedure, diagnosis, etc. codes and are hence only available in annotated clinical narratives (AND, and this is a big AND, only if they also appear in some form in the OHDSI vocabulary).

When we do retrospective analysis, I guess "Anchor variables" can't be questions but just the occurrence of certain terms in patient medical records/EHR?

Could you provide a layman's explanation of how the response to the above example question can or cannot qualify as an "Anchor variable"? I guess Anchor variables have 2 properties which have to be met: conditional independence and high PPV.

"The second condition is conditional independence. This is a formal condition that requires that the patient’s phenotype is the best predictor of whether or not the anchor is present in the medical records and that no other data in the record would improve the prediction if the patient’s phenotype were already known."
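For reference, the quoted condition can be written symbolically. This is only a sketch of the definition above; the symbols are assumptions introduced here and are not taken from the paper: $Y$ is the (unobserved) phenotype, $A$ is the indicator that the anchor appears in the record, and $X$ stands for everything else in the record.

```latex
% Conditional independence: once the phenotype Y is known,
% the rest of the record X tells us nothing more about the anchor A.
P(A = 1 \mid Y, X) = P(A = 1 \mid Y)

% High PPV, the other property mentioned in this thread:
% seeing the anchor almost guarantees the phenotype.
P(Y = 1 \mid A = 1) \approx 1
```

Read against the T2DM example: if we already knew whether the patient truly has T2DM, then learning anything else from the record (a lab value, a prescription) should not change the probability that the "Type 2 diabetes Mellitus" code was recorded.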

ex: Let's say a patient has the T2DM phenotype and the Anchor variable is "Type 2 diabetes Mellitus". I am not sure how to fit this example to the above definition (conditional independence).

Ideally, as an anchor you are looking for something that is highly indicative of the phenotype you want. For example, for T2DM you want a mention or code of Type 2 diabetes Mellitus, as you indicate. This anchor will allow you to fetch patients who are highly likely to have the phenotype, and will allow you to identify additional features that come up as highly important within that group of patients. However, you could also pick a blood sugar measurement as an anchor; while you might get people with T2DM, you will get plenty of false positives as well, which does not make it a very useful anchor.
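The contrast between the two candidate anchors can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not APHRODITE code: the prevalence and the probabilities of seeing a diagnosis code or a blood sugar measurement are made-up assumptions chosen only to show the effect, and the function names `ppv` and `simulate` are hypothetical.

```python
import random


def ppv(has_anchor, has_phenotype):
    """Share of anchor-positive patients who truly have the phenotype."""
    flagged = [p for a, p in zip(has_anchor, has_phenotype) if a]
    if not flagged:
        raise ValueError("no patient has the anchor")
    return sum(flagged) / len(flagged)


def simulate(n=20000, seed=0):
    """Generate synthetic patients; every probability below is an assumption."""
    rng = random.Random(seed)
    phenotype, dx_code, glucose_test = [], [], []
    for _ in range(n):
        t2dm = rng.random() < 0.10  # assumed 10% prevalence of T2DM
        # Diagnosis code: common among T2DM patients, very rare otherwise.
        code = rng.random() < (0.60 if t2dm else 0.005)
        # Blood sugar measurement: ordered for most T2DM patients,
        # but also for many patients without T2DM.
        test = rng.random() < (0.95 if t2dm else 0.40)
        phenotype.append(t2dm)
        dx_code.append(code)
        glucose_test.append(test)
    return phenotype, dx_code, glucose_test


if __name__ == "__main__":
    phenotype, dx_code, glucose_test = simulate()
    print("PPV of diagnosis code anchor:", round(ppv(dx_code, phenotype), 3))
    print("PPV of blood sugar measurement anchor:", round(ppv(glucose_test, phenotype), 3))
```

Under these assumed rates, the diagnosis code flags mostly true T2DM patients, while the blood sugar measurement flags far more patients without the phenotype than with it, which is the false-positive problem described above.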

Do let me know if this has cleared up some of the questions, or if it brings more to the table :)

SSMK-wq commented 4 years ago

Hi @jmbanda ,

Thanks for the response and your time. A few quick follow-up questions:

This anchor will allow you to fetch patients that are highly likely to have the phenotype and will allow you to identify additional features that from the group of patients come up as of great importance.

I understand that one of the properties of Anchor variables is high PPV. But can you help me understand the "Conditional Independence" property of an Anchor variable with a simple example?

For ex: If I am looking for a code "Type 2 Diabetes Mellitus" in the patient records, I can see that this anchor variable might meet the "High PPV" property. But I read in the paper that when you pick an Anchor variable, you have to make sure that it satisfies the 2 conditions given below

a) High PPV b) Conditional Independence

In my example, I know that "Type 2 Diabetes Mellitus" will result in high PPV but how do I check for conditional independence? Though I read the definition of Conditional Independence in the paper, I couldn't really understand what it is.

So, can I kindly request you to help me with this in layman's terms?
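One way to make the conditional independence question concrete is to simulate it. The sketch below is an illustration under stated assumptions, not a method from the paper or from APHRODITE: it uses synthetic data where the true phenotype is known (in real records it is not, so a check like this would need a manually labeled sample), one extra record feature (say, a hypothetical metformin prescription flag), and made-up probabilities. The names `simulate`, `anchor_rate`, and `max_gap` are hypothetical.

```python
import random


def simulate(n, anchor_depends_on_feature, seed=0):
    """Return (phenotype, feature, anchor) tuples; all rates are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.random() < 0.30                  # true phenotype
        x = rng.random() < (0.70 if y else 0.10)  # other feature in the record
        p_anchor = 0.60 if y else 0.01           # anchor driven by the phenotype
        if anchor_depends_on_feature and x:
            # Violation: the feature changes the anchor rate even when
            # the phenotype is already known.
            p_anchor = min(1.0, p_anchor + 0.30)
        a = rng.random() < p_anchor
        rows.append((y, x, a))
    return rows


def anchor_rate(rows, phenotype, feature):
    """Estimate P(anchor = 1 | phenotype, feature)."""
    subset = [a for y, x, a in rows if y == phenotype and x == feature]
    return sum(subset) / len(subset)


def max_gap(rows):
    """Largest shift in anchor rate due to the feature, within each phenotype group."""
    return max(
        abs(anchor_rate(rows, y, True) - anchor_rate(rows, y, False))
        for y in (True, False)
    )


if __name__ == "__main__":
    ok = simulate(50000, anchor_depends_on_feature=False)
    bad = simulate(50000, anchor_depends_on_feature=True)
    print("gap when conditionally independent:", round(max_gap(ok), 3))
    print("gap when independence is violated:", round(max_gap(bad), 3))
```

In layman's terms: split patients by whether they truly have the phenotype, then ask whether the extra feature still changes how often the anchor shows up inside each group. A gap near zero is what conditional independence looks like; a clear gap means the other data in the record would still improve the prediction of the anchor.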