OHDSI / Aphrodite

[in development]
Apache License 2.0

APHRODITE - FAQ's #6

Closed: SSMK-wq closed this issue 4 years ago

SSMK-wq commented 4 years ago

Hello @jmbanda ,

Can you please confirm whether my understanding of the items below is correct? I watched your YouTube tutorial and went through the code on GitHub.

1) Am I right to understand that APHRODITE uses only anchors (CDM concepts) to label patient records as having the HOI (health outcome of interest)? If an anchor is present, Label = 1; otherwise, Label = 0.

If labeling also uses any other approach, can you please help me understand what that approach is?

2) But by using just "Anchors" to label patients, won't we lose patients who are actual cases? That is, you use a parent concept like "Type 2 Diabetes Mellitus" as the search term to extract concepts/anchors, right? Will this return only the child concepts, or a keyword list containing all related terms from the OHDSI vocabulary? On what basis are these concepts extracted from the vocabulary and stored in the keywordlist?

3) APHRODITE labels patients using "Anchors". Once patients are tagged as cases and controls, are the corresponding features extracted from all domains? Do we have the flexibility to extract features only from the domains we are interested in (instead of all)?

4) Am I right to understand that APHRODITE uses concept occurrence counts as feature values? That is, it checks how many times a specific concept appears in a patient's data? Is this different from PheValuator, which uses the presence or absence of a concept as the feature value (binary values)?

5) Does APHRODITE allow us to modify the feature extraction process? That is, instead of frequency counts as feature values, can I choose another approach, such as summary values, as feature values?

6) I see that the YouTube video ends abruptly. After we train the model, we have to test it. So any and all data/patients excluded from the cases and controls would be used as test data. Am I right?

jmbanda commented 4 years ago

I strongly recommend that you read the documentation and the papers before playing around with APHRODITE; the answers to most of your questions are there. My responses are inline below.

Am I right to understand that APHRODITE uses only anchors (CDM concepts) to label patient records as having the HOI? If an anchor is present, Label = 1; otherwise, Label = 0. If labeling also uses any other approach, can you please help me understand what that approach is?

Just the anchor or keywords, no other approach.
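To make the answer concrete: labeling depends only on whether a patient's record contains any concept from the anchor/keyword set. The sketch below is a language-neutral illustration in Python (APHRODITE itself is an R package); `label_patients` and the toy person/concept IDs are hypothetical and are not APHRODITE's API.

```python
# Hypothetical sketch of anchor/keyword labeling; the names here are
# illustrative and are NOT APHRODITE's actual R functions.
from typing import Dict, Iterable, Set


def label_patients(
    patient_concepts: Dict[int, Iterable[int]],
    keyword_concepts: Set[int],
) -> Dict[int, int]:
    """Label 1 if a patient has any keyword concept, otherwise 0."""
    labels: Dict[int, int] = {}
    for person_id, concepts in patient_concepts.items():
        # One matching concept anywhere in the record is enough for a case label.
        labels[person_id] = int(any(c in keyword_concepts for c in concepts))
    return labels


if __name__ == "__main__":
    # Toy person_id -> concept_id data (made up for illustration).
    records = {101: [5, 7], 102: [9], 103: []}
    print(label_patients(records, {7, 8}))  # {101: 1, 102: 0, 103: 0}
```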

But by using just "Anchors" to label patients, won't we lose patients who are actual cases? That is, you use a parent concept like "Type 2 Diabetes Mellitus" as the search term to extract concepts/anchors, right? Will this return only the child concepts, or a keyword list containing all terms from the OHDSI vocabulary? On what basis are these concepts extracted from the vocabulary and stored in the keywordlist?

As the documentation and a simple trial run of APHRODITE will show you, it lists ALL concepts related (through the vocabulary hierarchies and synonyms) to the one you select. As for your comment about actual cases, what do you mean? At this stage, you don't know which patients are actual cases to begin with.
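A rough picture of "all related concepts through hierarchies and synonyms" is the following sketch. It assumes rows shaped like the OMOP vocabulary tables CONCEPT_ANCESTOR (ancestor_concept_id, descendant_concept_id) and CONCEPT_SYNONYM (concept_id, concept_synonym_name), and it assumes the ancestor table is transitively closed, so a single filter finds descendants at every level. `build_keyword_list` and the toy IDs/names are hypothetical; this is not how APHRODITE is actually implemented.

```python
# Hypothetical sketch: expand a seed concept into a keyword list using rows
# shaped like the OMOP CONCEPT_ANCESTOR and CONCEPT_SYNONYM tables.
# Not APHRODITE's implementation; concept IDs and names below are toy values.
from typing import List, Set, Tuple


def build_keyword_list(
    seed_id: int,
    ancestor_rows: List[Tuple[int, int]],  # (ancestor_concept_id, descendant_concept_id)
    synonym_rows: List[Tuple[int, str]],   # (concept_id, concept_synonym_name)
) -> Tuple[Set[int], Set[str]]:
    # Include the seed itself, then every descendant. The ancestor rows are
    # assumed to be transitively closed, so one pass covers all levels.
    concept_ids = {seed_id}
    concept_ids.update(d for a, d in ancestor_rows if a == seed_id)
    # Collect synonym strings for every concept in the expanded set.
    synonyms = {name for cid, name in synonym_rows if cid in concept_ids}
    return concept_ids, synonyms


if __name__ == "__main__":
    ancestors = [(1, 2), (1, 3), (2, 3), (4, 5)]
    synonyms = [(1, "seed synonym"), (3, "grandchild synonym"), (5, "unrelated")]
    ids, names = build_keyword_list(1, ancestors, synonyms)
    print(sorted(ids), sorted(names))  # [1, 2, 3] ['grandchild synonym', 'seed synonym']
```

Concept 4 and its descendant 5 are excluded because they are not below the seed, which is why the result is broader than "child concepts only" but still bounded by the vocabulary.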

APHRODITE labels patients using "Anchors". Once patients are tagged as cases and controls, are the corresponding features extracted from all domains? Do we have the flexibility to extract features only from the domains we are interested in (instead of all)?

Yes, the documentation states this. You can block any or all domains, based on the data available to you.
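Blocking domains amounts to filtering the clinical records before features are built. The sketch below assumes a simple (person_id, domain, concept_id) record layout and uses OMOP-style domain names; `drop_blocked_domains` is a hypothetical helper, not an APHRODITE setting, so check the documentation for the actual option names.

```python
# Hypothetical sketch of domain blocking before feature extraction.
# The record layout and helper name are illustrative assumptions, not
# APHRODITE's data structures.
from typing import Iterable, List, Set, Tuple

Record = Tuple[int, str, int]  # (person_id, domain, concept_id)


def drop_blocked_domains(records: Iterable[Record], blocked: Set[str]) -> List[Record]:
    """Keep only records whose domain is not in the blocked set."""
    return [r for r in records if r[1] not in blocked]


if __name__ == "__main__":
    rows = [
        (101, "Condition", 7),
        (101, "Drug", 12),
        (101, "Measurement", 30),
        (102, "Observation", 41),
    ]
    # Use only conditions and drugs by blocking the other two domains.
    print(drop_blocked_domains(rows, {"Measurement", "Observation"}))
    # [(101, 'Condition', 7), (101, 'Drug', 12)]
```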

Am I right to understand that APHRODITE uses concept occurrence counts as feature values? That is, it checks how many times a specific concept appears in a patient's data? Is this different from PheValuator, which uses the presence or absence of a concept as the feature value (binary values)?

Yes, Joel already mentioned this.
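The difference between the two feature encodings discussed here can be shown on the same toy events: counts keep how often each concept occurs, while binary features only record that it occurred at least once. Function names and data below are hypothetical and illustrate the idea only; neither tool's code is reproduced.

```python
# Hypothetical sketch contrasting count features (as described for
# APHRODITE) with presence/absence features (as described for PheValuator).
# Function names are illustrative only.
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_features(events: Iterable[Tuple[int, int]]) -> Dict[int, Dict[int, int]]:
    """Map person_id -> {concept_id: number of occurrences}."""
    counts: Dict[int, Counter] = {}
    for person_id, concept_id in events:
        counts.setdefault(person_id, Counter())[concept_id] += 1
    return {pid: dict(c) for pid, c in counts.items()}


def binary_features(counts: Dict[int, Dict[int, int]]) -> Dict[int, Dict[int, int]]:
    """Collapse any positive count to 1 (present)."""
    return {pid: {cid: 1 for cid, n in c.items() if n > 0} for pid, c in counts.items()}


if __name__ == "__main__":
    events = [(101, 7), (101, 7), (101, 9), (102, 9)]  # (person_id, concept_id)
    counts = count_features(events)
    print(counts)                   # {101: {7: 2, 9: 1}, 102: {9: 1}}
    print(binary_features(counts))  # {101: {7: 1, 9: 1}, 102: {9: 1}}
```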

Does APHRODITE allow us to modify the feature extraction process? That is, instead of frequency counts as feature values, can I choose another approach, such as summary values, as feature values?

Yes, the documentation describes ways to aggregate, but you can also write your own functions to do it in whatever way you need.
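One way to picture "write your own functions" is a grouping step that takes the summary function as a parameter, so counts, means, or maxima come from the same code. This is a hedged sketch: the (person_id, concept_id, value) rows and `aggregate_features` are assumptions for illustration, not APHRODITE's aggregation options.

```python
# Hypothetical sketch of a pluggable aggregation step: the same grouping
# code yields counts, means, or maxima depending on the function passed in.
# This illustrates "write your own functions"; it is not APHRODITE code.
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Aggregator = Callable[[Sequence[float]], float]


def aggregate_features(
    rows: List[Tuple[int, int, float]],  # (person_id, concept_id, value)
    agg: Aggregator,
) -> Dict[int, Dict[int, float]]:
    grouped: Dict[int, Dict[int, List[float]]] = {}
    for person_id, concept_id, value in rows:
        grouped.setdefault(person_id, {}).setdefault(concept_id, []).append(value)
    # Apply the chosen summary to each person's list of values per concept.
    return {
        pid: {cid: agg(vals) for cid, vals in by_concept.items()}
        for pid, by_concept in grouped.items()
    }


if __name__ == "__main__":
    rows = [(101, 3, 6.5), (101, 3, 7.5), (102, 3, 5.0)]
    print(aggregate_features(rows, len))   # {101: {3: 2}, 102: {3: 1}}
    print(aggregate_features(rows, mean))  # {101: {3: 7.0}, 102: {3: 5.0}}
    print(aggregate_features(rows, max))   # {101: {3: 7.5}, 102: {3: 5.0}}
```

Passing `len` reproduces plain frequency counts, so a custom summary is a drop-in replacement rather than a separate pipeline.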

I see that the YouTube video ends abruptly. After we train the model, we have to test it. So any and all data/patients excluded from the cases and controls would be used as test data. Am I right?

In some cases yes.
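The short answer leaves the evaluation setup open. As a hedged illustration of one common machine-learning arrangement (not necessarily what APHRODITE does), the sketch below splits the labeled cases and controls into a training set and a labeled hold-out for evaluation, and keeps the patients never selected as cases or controls as a separate pool the trained model can be applied to. `split_patients` and the 20% hold-out fraction are assumptions.

```python
# Hypothetical sketch of one way to split patients after labeling.
# The hold-out fraction and function name are assumptions for illustration.
import random
from typing import Iterable, Set, Tuple


def split_patients(
    all_ids: Iterable[int],
    case_ids: Set[int],
    control_ids: Set[int],
    holdout_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return (train, labeled_holdout, not_selected) patient ID sets."""
    labeled = sorted(case_ids | control_ids)
    random.Random(seed).shuffle(labeled)  # reproducible shuffle
    n_holdout = int(len(labeled) * holdout_fraction)
    holdout = set(labeled[:n_holdout])
    train = set(labeled[n_holdout:])
    # Patients never selected as cases or controls; the model can score them,
    # but measuring accuracy on them needs some reference label.
    not_selected = set(all_ids) - case_ids - control_ids
    return train, holdout, not_selected


if __name__ == "__main__":
    train, holdout, rest = split_patients(range(10), {0, 1, 2}, {3, 4, 5, 6, 7})
    print(len(train), len(holdout), sorted(rest))  # 7 1 [8, 9]
```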