OHDSI / Aphrodite

[in development]
Apache License 2.0
37 stars 15 forks source link

Configure number of mentions of a keyword #13

Closed Ak784 closed 3 years ago

Ak784 commented 4 years ago

Hello Folks,

Akshay here. I recently came across your paper on APHRODITE Phenotype validation

https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocaa032/5831103

I have a quick question. I see that you have got a number of mentions of keyword in your paper.

But in the example/sample file shown in github, I see that we can only key in "aphordite_concept_name".

But can I know how can you configure the number of mentions of a keyword like 2 times, 4 times etc? Is it shown in the sample files?

Is it the num_keywords??

jmbanda commented 4 years ago

Greetings, you are correct that functionality is not included in standard APHRODITE. One can easily add it by changing the getCohort function. Might be added later to a later version of the software, but no promises.

Ak784 commented 3 years ago

Hi @jmbanda ,

I was trying to write the code for getting the no of times a concept appears in patients' domain and then assign subjects as cases if they have more than 4 mentions

patients_list_df<- list()
casesANDcontrols_df<- list()

    if (missing(searchDomain)) { #Backwards compatibility
        #Get all case patients in the cohort - from observations table - remove patients with ignore keywords
        patients_list_df[[1]] <- executeSQL(conn, cdmSchema, paste("SELECT person_id,count(*) FROM @cdmSchema.observation WHERE observation_concept_id IN (", paste(includeConceptlist,collapse=","), ") AND observation_concept_id NOT IN (", paste(excludeConceptlist,collapse=","), ") ",sep='',paste("group by person_id")),dbms)

        #Get all case patients in the cohort -  from condition occurrence - remove patients with ignore keywords
        patients_list_df[[2]] <- executeSQL(conn, cdmSchema, paste("SELECT person_id,count(*) FROM @cdmSchema.condition_occurrence WHERE condition_concept_id IN (", paste(includeConceptlist,collapse=","), ") AND condition_concept_id NOT IN (", paste(excludeConceptlist,collapse=","), ") ",sep='',paste("group by person_id")),dbms)

    } else { print("hi else")
        intN=1
        if (searchDomain$observation[1]) { print("hi obs")
            #Get all case patients in the cohort - from observations table - remove patients with ignore keywords
            patients_list_df[[intN]] <- executeSQL(conn, cdmSchema, paste("SELECT person_id,count(*) FROM @cdmSchema.observation WHERE observation_concept_id IN (", paste(includeConceptlist,collapse=","), ") AND observation_concept_id NOT IN (", paste(excludeConceptlist,collapse=","), ") ",sep='',paste("group by person_id")),dbms)
            intN = intN+1
        }
        if (searchDomain$condition[1]) { print("hi cond")
            #Get all case patients in the cohort -  from condition occurrence - remove patients with ignore keywords
            patients_list_df[[intN]] <- executeSQL(conn, cdmSchema, paste("SELECT person_id,count(*) FROM @cdmSchema.condition_occurrence WHERE condition_concept_id IN (", paste(includeConceptlist,collapse=","), ") AND condition_concept_id NOT IN (", paste(excludeConceptlist,collapse=","), ") ",sep='',paste("group by person_id")),dbms)
            intN = intN+1
        }
        if (searchDomain$measurement[1]) { print("hi meas")
            #Get all case patients in the cohort -  from measurement - remove patients with ignore keywords
            patients_list_df[[intN]] <- executeSQL(conn, cdmSchema, paste("SELECT person_id,count(*) FROM @cdmSchema.measurement WHERE measurement_concept_id IN (", paste(includeConceptlist,collapse=","), ") AND measurement_concept_id NOT IN (", paste(excludeConceptlist,collapse=","), ") ",sep='',paste("group by person_id")),dbms)
            intN = intN+1
        }
        if (searchDomain$drug_exposure[1]) { print("hi drug")
            #Get all case patients in the cohort -  from drug_exposure - remove patients with ignore keywords
            patients_list_df[[intN]] <- executeSQL(conn, cdmSchema, paste("SELECT person_id,count(*) FROM @cdmSchema.drug_exposure WHERE drug_concept_id IN (", paste(includeConceptlist,collapse=","), ") AND drug_concept_id NOT IN (", paste(excludeConceptlist,collapse=","), ") ",sep='',paste("group by person_id")),dbms)
            intN = intN+1
        }
        if (searchDomain$noteNLP[1]) { print("hi note")
            #Get all case patients in the cohort -  from note_nlp - remove patients with ignore keywords
            patients_list_df[[intN]] <- executeSQL(conn, cdmSchema, paste("SELECT distinct(A.person_id) FROM @cdmSchema.note_nlp as B, @cdmSchema.note as A WHERE B.note_nlp_concept_id IN (",paste(includeConceptlist,collapse=","), ") AND B.note_nlp_concept_id NOT IN (", paste(excludeConceptlist,collapse=","), ") AND B.term_modifiers='negated=false,subject=patient' AND A.note_id = B.note_id;", sep=''),dbms)
            intN = intN+1
        }
        if (searchDomain$procedure[1]) { print("hi proc")
            #Get all case patients in the cohort -  from procedure_occurrence - remove patients with ignore keywords
            patients_list_df[[intN]] <- executeSQL(conn, cdmSchema, paste("SELECT distinct(person_id) FROM @cdmSchema.procedure_occurrence WHERE procedure_concept_id IN (",paste(includeConceptlist,collapse=","), ") AND procedure_concept_id NOT IN (", paste(excludeConceptlist,collapse=","), ") ;", sep=''),dbms)
            intN = intN+1
        }
    }

merged_records <- rbindlist(patients_list_df)[,.(count = sum(count)),by = person_id]
filtered_records = merged_records[count>4]           # consider subjects as cases only when they have more than 4 mentions.
casesANDcontrols_df[[1]] = filtered_records[[1]]      # to get only the person_id column (and ignore the count column)

Though I am able to implement this, if I was to experiment with different n number of mentions. Let's say instead of 4, I would like to try with 3 or 5 or 6 mentions. The only way is to update the filter criteria every time I run?

Is this how its done? Or am I following an inefficient approach?

jmbanda commented 3 years ago

Hello,

This approach would work, but not always, particularly when you are looking for more than 1 keyword. I would shift this to the query side and do a group by person and concept_id. But you can easily just do the grouping in R by both columns as well.