Closed gowthamrao closed 4 years ago
Hi @gowthamrao ,
Just trying to understand what the feature is that you'd like to construct. Do you want to know the unique list of places of service where a person has received care? Or do you want one list where they got their drugs, another where they were diagnosed? Or do you want a list for every drug-place combination, every visit-place combination, etc.?
@schuemie
Do you want to know the unique list of places of service where a person has received care?
we want count of visit records per place of service e.g. number of unique visits per person to doctor's office
Or do you want one list where they got their drugs, another where they were diagnosed?
yes - number of records in drug_occurrence at doctor's office. e.g. number of unique visits to doctors office per person where chemotherapy was administered or number of records in visit_occurrence where primary diagnosis was 'influenza' or number of unique visits to with inpatient place of service per person where primary diagnosis was 'chest pain' and procedure was 'cardiac catheterization'
Or do you want a list for every drug-place combination, every visit-place combination, etc.?
no
@schuemie - i was thinking of something like below, where @place_of_service will be a new optional parameter in R-function call of FeatureExtraction2.
ConceptCount.sql
SELECT subject_id,
cohort_start_date,
{@temporal} ? {
time_id,
}
concept_count
INTO #raw_data
} : {
SELECT 1000 + @analysis_id AS covariate_id,
{@temporal} ? {
time_id,
}
row_id,
concept_count AS covariate_value
INTO @covariate_table
}
FROM (
SELECT
{@temporal} ? {
time_id,
}
{@aggregated} ? {
subject_id,
cohort_start_date,
} : {
cohort.@row_id_field AS row_id,
}
{@sub_type == 'distinct'} ? {
COUNT(DISTINCT @domain_concept_id) AS concept_count
} : {
COUNT(*) AS concept_count
}
FROM @cohort_table cohort
INNER JOIN @cdm_database_schema.@domain_table
ON cohort.subject_id = @domain_table.person_id
{@place_of_service} ? {
{@domain_table == 'visit_occurrence'} ? {
INNER JOIN @cdm_database_schema.care_site care_site
ON @domain_table.care_site_id = care_site.care_site_id
}
{@domain_table %in% ('condition_occurrence', 'device_exposure', 'drug_exposure', 'measurement','observation','procedure_occurrence')} ? {
INNER JOIN @cdm_database_schema.visit_occurrence visit_occurrence
ON @domain_table.visit_occurrence_id = visit_occurrence.visit_occurrence_id
INNER JOIN @cdm_database_schema.care_site care_site
ON visit_occurrence.care_site_id = care_site.care_site_id
}
}
{@temporal} ? {
INNER JOIN #time_period time_period
ON @domain_start_date <= DATEADD(DAY, time_period.end_day, cohort.cohort_start_date)
AND @domain_end_date >= DATEADD(DAY, time_period.start_day, cohort.cohort_start_date)
WHERE @domain_concept_id != 0
} : {
WHERE @domain_start_date <= DATEADD(DAY, @end_day, cohort.cohort_start_date)
AND @domain_end_date >= DATEADD(DAY, @start_day, cohort.cohort_start_date)
AND @domain_concept_id != 0
}
{@excluded_concept_table != ''} ? {
AND @domain_concept_id NOT IN (SELECT id FROM @excluded_concept_table)
{@place_of_service} ? {
AND care_site.place_of_service_concept_id NOT IN (SELECT id from @excluded_concept_table)
}
}
{@included_concept_table != ''} ? {
AND @domain_concept_id IN (SELECT id FROM @included_concept_table)
{@place_of_service} ? {
AND care_site.place_of_service_concept_id IN (SELECT id from @included_concept_table)
}
}
{@cohort_definition_id != -1} ? { AND cohort.cohort_definition_id = @cohort_definition_id}
GROUP BY
{@temporal} ? {
time_id,
}
{@aggregated} ? {
subject_id,
cohort_start_date
} : {
cohort.@row_id_field
}
) raw_data;
Just checking: do you really mean place_of_service_concept_id
, or are you actually thinking of visit_concept_id
? The first can have many values, like:
concept_id | concept_name |
---|---|
8782 | Urgent Care Facility |
8809 | Independent Laboratory |
8827 | Custodial Care Facility |
8844 | Other Place of Service |
8850 | Ambulance - Air or Water |
8851 | Group Home |
8858 | Mass Immunization Center |
… | … |
The second can have only five different values:
concept_id | concept_name |
---|---|
9201 | Inpatient Visit |
42898160 | Long Term Care Visit |
9203 | Emergency Room Visit |
9202 | Outpatient Visit |
262 | Emergency Room and Inpatient Visit |
Definitely place_of_service_concept_id
. visit_concept_id is a derived field that is calculated during the ETL process, most source data have place_of_service_concept_id
. http://www.ohdsi.org/web/atlas/#/conceptset/1655609/details
e.g. visit_concept_id has 'Outpatient Visit', but it includes many place_of_service_concept_id's like
Concept ID | Concept Code | Concept Name | Standard Concept |
---|---|---|---|
8883 | 24 | Ambulatory Surgical Center | S |
8716 | 49 | Independent Clinic | S |
8940 | 11 | Office | S |
8977 | 71 | Public Health Clinic | S |
8761 | 72 | Rural Health Clinic | S |
8782 | 20 | Urgent Care Facility | S |
38003620 | 17 | Walk---in Retail Health Clinic | S |
using just visit_concept_id is lossy. e.g. we want to separately calculate 8883, and 8940, and combination of 8863 (Skilled Nursing Facility) & 8676 (Nursing Facility). We want to calculate at both concept_id level and combination of concept_id's
Use case -- if we are predicting admission to hospital outcome for a cohort of diabetics, we would like to use the feature - number of 8940 place of service visit occurrence in long-term period prior to cohort entry.
I'll second @gowthamrao suggestion. Definitely would be useful to have by specific place_of_service_concept_id
, especially for the outpatient category in visit_concept_id
.
And, if possible, to create an "all other outpatient" category based on those that haven't been broken out on their own. For example if I want to look at Office visits (8940) and Nursing facility stays (8863 & 8676) I'd also want to see all other outpatient services not included in either of those buckets.
@gowthamrao : do you want to take a stab at implementing the proposed changes? I've just brought the develop branch up-to-date, we can land your pull-requests there and work on debugging the code.
Hi @gowthamrao . Sorry, took me a while to make time to take a thorough look at your code. I have some questions and comments:
In general:
I thought the idea was to have counts per place_of_service_concept_id? Currently this code is just counting visits that have some (combination of) type of records associated with them.
Also, there are currently 128 new analyses which each generate exactly 1 covariate ID. Our analysis ID space is limited to 3 digits while our covariate ID space is virtually limitless. Maybe we could group some of these analyses?
More specific:
What is the difference between analysis 120 (VisitOccurrenceRecordCountsLongTerm) and analysis 917 (VisitCountLongTerm)?
I'm having trouble understand what the purpose of analysis 121 (VisitOccurrenceRecordCountsByPosLongTerm) is. What the code does is count the number of visits in the window that are associated with care sites that do not have care_site_id = 0. What is the significance of care_site_id = 0?
Similarly, what is the purpose of analysis 122 (VisitOccurrenceRecordCountsByCoLongTerm)? This counts the number of visits in the window that are associated with a condition_occurrence, where the visit_occurrence_id is not 0. visit_occurrence_id is an auto-increment number. What is the significance of visit_occurrence_id = 0?
Comments and questions specific to RecordCounts.sql
:
line 141: please don't hardcode analysis IDs. What if someone want to use different analysis IDs? This is precisely what the sub_type field is for (which you didn't use).
line 142: @domain_table_id only works because @domain_table is replaced with the table name. This is a bit confusing, because it suggests there is a variable called @domain_table_id. Why not simply count the number of records using COUNT(*)?
line 240: why would @domain_table_id be 0? (It is an auto-increment number). What is the significance of @domain_table_id = 0? Do you mean @domain_concept_id?
line 246: Here's the only difference I see between analysis 120 and 917: analysis 917 includes visits where the end date is after the window start, even though the visit start might be before the window start. In contrast, analysis 120 only considered visits that start in the window. Is the distinction important? If it is, this is something to explicitly mention in the descriptions.
Line 369-382: Having just 2 different covariate names for 128 different covariates might not be very informative.
Can you help us understand the concept of covariates?
Ok, let me give it a shot:
First, note that I tend to use the words 'covariate', 'feature', and 'baseline characteristic' interchangeably.
An analysis is defined as a process that creates one or more similar covariates. For example, prespecified analysis 102 creates a separate covariate for each condition concept encountered in the long-term window (default is past 365 days).
A covariate is a variable that will have one value for each entry in the cohort table, and that value can be different for each cohort entry. One such covariate is whether the concept 'fever' (concept ID 437663) has been encountered in the long term window. In analysis 102, we construct a unique covariate ID by concatenating the concept ID to the analysis ID, so this covariate will have ID 437663102. In this case, this covariate is binary; it can only have the value 0 (concept not found in window) or 1 (concept found in window).
So if I think about your analysis 121 (VisitOccurrenceRecordCountsByPosLongTerm), I would expect it to generate one covariate per unique place_of_service_concept_id.
@schuemie
place_of_service_concept_id is on the care_site table. Visit_occurrence and visit_detail may be joined to care_site using FK care_site_id. Condition_occurrence, procedure_occurrence, drug_occurrence may be joined to care_site thru visit_occurrence table.
Use-case: counts at home, emergency room, in-patient, out-patient, doctor's office. These are either concepts or concept-sets/covariates of place_of_service_concept_id.
I think using DomainConcept.sql would make it complex - e.g. in DomainConcept.sql we currently have
when @domain_table is visit_occurrence it needs to become something like
when @domain_table is condition_occurrence it needs to become something like
What would be the better way of handling place_of_service_concept_id?