OHDSI / Aphrodite

[in development]
Apache License 2.0

Que - Evaluation using Phevaluator and inclusion of single code #16

Closed by Ak784 3 years ago

Ak784 commented 3 years ago

Hello @jmbanda,

I was reading two related papers and have a few quick questions.

Here is the line from the paper:

We chose not to exclude a single mention of relevant disease-specific codes as potential features used by classifiers since our labeling function was based on multiple mentions.

For the line above, consider the example below, where we label patients as cataract cases if they have 4 or more SNOMED codes.

       Patient 1 - 6 disease-specific codes (e.g., cataract)
       Patient 2 - 4 disease-specific codes (e.g., cataract)
       Patient 3 - 5 disease-specific codes (e.g., cataract)
       Patient 4 - 2 disease-specific codes (e.g., cataract)

So, based on the statement pasted above, am I right to understand that when we build a feature vector using Aphrodite, we will have something like the following?

             Condition:375545
Patient 1           0
Patient 2           0
Patient 3           0
Patient 4           2

Q1) I have given zero for patients 1, 2, and 3 because they had 4 or more mentions of the code (so those mentions will not be used as features), whereas patient 4 keeps his/her frequency count because he/she had only 2 mentions (< 4 mentions), so we use that count as a feature. Is my understanding right?
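The reading described in Q1 can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the question, not Aphrodite code: the function name `code_feature`, the threshold constant, and the dictionary layout are all hypothetical, and the mention counts are the four example patients from above.

```python
# Hypothetical sketch of the Q1 interpretation; none of these names come from Aphrodite.
LABELING_THRESHOLD = 4          # mentions needed for the labeling function to call a case
CONCEPT = "Condition:375545"    # feature column used in the example above


def code_feature(mention_counts, threshold=LABELING_THRESHOLD):
    """Return the per-patient feature value for the disease-specific code.

    Patients whose mention count meets the labeling threshold get 0, because
    those mentions were already used to assign the label. Patients below the
    threshold keep their raw mention count as a feature.
    """
    return {
        patient: (0 if count >= threshold else count)
        for patient, count in mention_counts.items()
    }


# Mention counts from the example in the question.
counts = {"Patient 1": 6, "Patient 2": 4, "Patient 3": 5, "Patient 4": 2}
features = code_feature(counts)

for patient, value in features.items():
    print(f"{patient}  {CONCEPT} = {value}")
```

Note that Patient 2, with exactly 4 mentions, is zeroed out because the stated rule is "4 or more".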

Second, I understand that APHRODITE classifier performance is assessed using ground truth from rule-based algorithms. I also came across your suggestion in the paper that it could be assessed using either clinician-labeled test sets or PheValuator.

While I understand how clinician-labeled test sets would work, how can APHRODITE classifier performance be evaluated using PheValuator?

I understand PheValuator also uses a noisy labeling approach (xSpec, xSens, etc.) to label the dataset for cases and controls. Usually, once we build the PheValuator model, we pass in the phenotype algorithm to test (defined in Atlas), and it outputs the performance characteristics of that phenotype algorithm, such as TP, TN, FP, and FN.

Q2) How do we pass our APHRODITE model as input to PheValuator (which only accepts an Atlas cohort ID as input) to assess the APHRODITE classifier's performance? Did you mean that we should create a dummy table (with dummy observation periods) in our results schema containing the APHRODITE-identified subjects as cases, assign it a dummy Atlas cohort ID, and use that cohort ID in PheValuator? If not, can you let me know how to establish the connection between the two?

Q3) The paper also mentions that test sets (created from rule-based algorithms) must match the population prevalence of the disease. Suppose I have a sample of 100,000 subjects from a hospital in the US. When I create a test set of, say, 10,000 subjects, should I select only 800 positive cases (8% prevalence of T2DM in the US) and fill the remaining 9,200 slots with controls? (We aren't concerned about prevalence among the controls, since those 9,200 subjects may have various other diseases.)
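The arithmetic in Q3 can be written out as a small sampling sketch. It assumes the cases and controls have already been identified (for example by a rule-based algorithm) and are available as lists of subject IDs. The function `sample_test_set`, the ID ranges, and the 12,000/88,000 split of the 100,000 subjects are made up for illustration.

```python
import random


def sample_test_set(case_ids, control_ids, size, prevalence, seed=0):
    """Draw a test set whose case fraction matches the target prevalence.

    Returns (cases, controls), sampled without replacement.
    """
    n_cases = round(size * prevalence)
    n_controls = size - n_cases
    if n_cases > len(case_ids) or n_controls > len(control_ids):
        raise ValueError("not enough cases or controls for the requested test set")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(case_ids, n_cases), rng.sample(control_ids, n_controls)


# Hypothetical pool of 100,000 subjects: 12,000 labeled cases, 88,000 controls.
all_cases = list(range(12_000))
all_controls = list(range(12_000, 100_000))

test_cases, test_controls = sample_test_set(
    all_cases, all_controls, size=10_000, prevalence=0.08
)
print(len(test_cases), len(test_controls))
```

With a test set of 10,000 and an 8% prevalence, this draws 800 cases and 9,200 controls, matching the numbers in the question.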

Thanks a lot for all your help. I have benefitted from your related papers on this topic.

jmbanda commented 3 years ago

As answers to your questions:

Q1) You are correct. However, this is not a 'stock' option in Aphrodite; you need to code this functionality yourself.

Q2) You pass the patient IDs found by the Atlas definition into Aphrodite. They can be passed in starting from the getPatientData functions.

Q3) That is correct. Ideally you want the cases:controls sampling to represent the prevalence, so if you have an 8% prevalence you pick 800 cases and 9,200 controls, as you mention.

Ak784 commented 3 years ago

Hi @jmbanda ,

I think there was a slight misunderstanding about Q2. My question was mainly about how to use PheValuator to assess the performance of the APHRODITE classifier (an ML-based phenotyping algorithm).

Your response, I believe, was about how to pass subject IDs to APHRODITE. Or am I missing something here?

Could you kindly elaborate on this a bit? I appreciate the time and effort you have taken to address these queries.

jmbanda commented 3 years ago

Simply put, you check the probabilities from both methods and compare them. My earlier response described how to pass cases and controls found with an Atlas cohort into APHRODITE, since that is what your question asked. You can also do the converse: take patients found by an Atlas cohort and build APHRODITE models from them.
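One possible way to "compare the probabilities" is sketched below. The thread does not spell out the procedure, so this is an assumption: it treats one method's per-subject probabilities as a probabilistic reference and thresholds the other method's probabilities to get predicted cases, then accumulates expected confusion-matrix counts. The function `expected_confusion`, the 0.5 threshold, and the subject IDs and probabilities are all hypothetical, and nothing here calls PheValuator or APHRODITE.

```python
def expected_confusion(reference_prob, classifier_prob, threshold=0.5):
    """Accumulate expected TP/FP/FN/TN counts over all subjects.

    reference_prob:  subject -> probability of being a true case (reference method)
    classifier_prob: subject -> probability from the classifier under evaluation
    A subject is a predicted case when its classifier probability >= threshold.
    """
    counts = {"TP": 0.0, "FP": 0.0, "FN": 0.0, "TN": 0.0}
    for subject, p_ref in reference_prob.items():
        if classifier_prob[subject] >= threshold:
            counts["TP"] += p_ref          # predicted case, weight of being a true case
            counts["FP"] += 1.0 - p_ref    # predicted case, weight of being a non-case
        else:
            counts["FN"] += p_ref
            counts["TN"] += 1.0 - p_ref
    return counts


# Made-up probabilities for four subjects.
reference = {"a": 1.0, "b": 0.0, "c": 1.0, "d": 0.0}
classifier = {"a": 0.9, "b": 0.2, "c": 0.3, "d": 0.7}

counts = expected_confusion(reference, classifier)
sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
specificity = counts["TN"] / (counts["TN"] + counts["FP"])
print(counts, sensitivity, specificity)
```

Using probabilities as weights rather than hard labels means an uncertain reference subject contributes partially to both the true-case and non-case tallies.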