NU-QDSC / cancer-informatics

Repository to track cancer informatics best practices and shared code.
3 stars 1 forks source link

Integrate catchment area reference data, nm volume data and accrual data at the county level into the LCC disease site reports. #122

Open mgurley opened 6 months ago

PerifanosPrometheus commented 5 months ago

Reached out to Catherine two times to get an initial meeting scheduled and received no response.

Unsure what's the priority on this.

Danton6 commented 5 months ago

Update on my end. The new data from Cancer InFocus now has Hispanics as one of the race/ethnicity groups. With this, we can now make full comparisons across cancer sites, race/ethnicity, sex, and counties to serve as references for comparisons with accrual and volume data.

PerifanosPrometheus commented 4 months ago

Michael reached out to Cary to get a meeting scheduled.

PerifanosPrometheus commented 4 months ago

No significant update as I have been working through higher priority requests.

PerifanosPrometheus commented 2 months ago

Need to discuss with Fang to figure out where catchment area reference data is loaded in Notis.

fanglinnw commented 2 months ago

Need to discuss with Fang to figure out where catchment area reference data is loaded in Notis.

here's a sample SQL to get 5 rows of county and 5 rows of tract data for each measurement:

WITH cte AS (
    SELECT
        d.file_date,
        d.decimal_value,
        d.integer_value,
        d.string_value,
        d.race_ethnicity,
        d.sex,
        m.name AS measure_name,
        m.category as measure_category,
        ge.entity_type AS geo_entity_type,
        ge.id AS geo_entity_id,
        ge.fips AS geo_entity_fips,
        ge.county AS geo_entity_county,
        ge.state AS geo_entity_state,
        ge.tract AS geo_entity_tract,
        ROW_NUMBER() OVER (
            PARTITION BY m.id, ge.entity_type
            ORDER BY d.id
        ) AS rn
    FROM
        catchment_area_data d
    JOIN
        catchment_area_measures m ON d.catchment_area_measure_id = m.id
    JOIN
        catchment_area_geo_entities ge ON d.catchment_area_geo_entity_id = ge.id
    WHERE
        d.file_date = (SELECT MAX(file_date) FROM catchment_area_data)
        AND ge.entity_type IN ('tract', 'county')
)
SELECT
    file_date,
    measure_category,
    measure_name,
    geo_entity_type,
    geo_entity_fips AS fips,
    geo_entity_county AS county,
    geo_entity_state AS state,
    geo_entity_tract AS tract,

    decimal_value,
    integer_value,
    string_value,
    race_ethnicity,
    sex

FROM
    cte
WHERE
    rn <= 5
ORDER BY
    measure_name, geo_entity_type;
PerifanosPrometheus commented 1 month ago

This appears to be a more accurate mapping for disease sites: https://seer.cancer.gov/siterecode/icdo3_2023/

We will only map to the disease sites in the reference data. So whatever level are available on the reference data.