CancerRegistryOfNorway / NORDCAN

NORDCAN is a database of cancer statistics for the Nordic countries: Denmark, Finland, Iceland, Norway, Sweden, the Faroe Islands, and Greenland.

Age group columns #15

Closed WetRobot closed 1 year ago

WetRobot commented 3 years ago

A number of age group columns are used in the nordcan R framework, with varying definitions. I suggest that in a future update of the nordcan R framework each age group column gets a name that indicates its definition. E.g. a column with 19 age groups would be named "ag_19".

There's also "age", which can remain "age". This column has values 0 and upwards.
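
To illustrate the proposed convention, here is a minimal sketch in Python (not the nordcan R framework itself). The helper names `age_group_column_name` and `rename_age_group_columns` are hypothetical, as is the assumption that the number of age groups in each column is known in advance.

```python
def age_group_column_name(n_groups: int, prefix: str = "ag") -> str:
    """Build a column name that states how many age groups the column encodes."""
    if n_groups < 1:
        raise ValueError("n_groups must be a positive integer")
    return f"{prefix}_{n_groups}"


def rename_age_group_columns(group_counts: dict[str, int]) -> dict[str, str]:
    """Map each current column name to its definition-bearing name.

    `group_counts` maps a current column name to the number of age groups
    it encodes (an assumed input; nothing in the framework provides it).
    """
    return {old: age_group_column_name(n) for old, n in group_counts.items()}


# Example: a column holding 19 age groups becomes "ag_19".
new_names = rename_age_group_columns({"agegroup": 19})
```

The plain "age" column would be left out of such a mapping, since its name already matches its definition.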

CotterpinDoozer commented 3 years ago

Unsure of the priority of this. We will discuss in our meeting on Monday so we can set a milestone.

CotterpinDoozer commented 3 years ago

You can discuss with Bjarte here if necessary.

CotterpinDoozer commented 1 year ago

Anna will look into how big this issue really is.

AnnaSkog commented 1 year ago

I have found the following age-variables. We can discuss whether it is necessary to change any variable names.


| Variable name | Definition | Values | Where (dataset) | Comment |
| -- | -- | -- | -- | -- |
| age | age at diagnosis | 0-121 | unprocessed_cancer_record_dataset | Created by user (call for data) |
| age | age | 0-99 | national_population_life_table | Created by user (call for data) |
| age | age at diagnosis | 0-121 | cancer_record_dataset | Created by user/variable from call for data |
| agegroup | 5-year age group | 1-21 | unprocessed_cancer_death_count_dataset | Created by user (call for data) |
| agegroup | 5-year age group | 1-21 | general_population_size_dataset | Created by user (call for data) |
| agegroup | 5-year age groups for age at diagnosis | 1-21 | cancer_record_dataset | Created by R-program from Age |
| agegroup | 5-year age groups for age at diagnosis | 1-21 | cancer_death_count_dataset | Created by user (call for data) |
| agr_all_ages | Agegroup for survival all ages together | 0-1 | cancer_record_dataset | Created by R-program from Age |
| agr_all_sites | Agegroups to be used for other survival analysis | 1-6 | cancer_record_dataset | Created by R-program from Age |
| agr_bone | Agegroups to be used for survival analysis of bone, Hodgkin and testis | 1-6 | cancer_record_dataset | Created by R-program from Age |
| excl_surv_age | Excluded from survival due to age 90+ | 0-1 | cancer_record_dataset | Created by R-program from Age |
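
As a rough illustration of how a 5-year `agegroup` with values 1-21 could be derived from `age`, here is a minimal Python sketch. It is not the code from the R program. It assumes that group 1 covers ages 0-4, that each following group covers the next 5 years, and that group 21 is open-ended (100 and above). None of these cut-offs are stated in this thread, and the function name `five_year_age_group` is hypothetical.

```python
def five_year_age_group(age: int, n_groups: int = 21) -> int:
    """Return a 1-based 5-year age group index for an age in whole years.

    Assumption (not confirmed by the thread): group 1 = ages 0-4,
    group 2 = ages 5-9, ..., and the last group is open-ended.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    # Integer division gives the 0-based 5-year band; cap at the last group.
    return min(age // 5 + 1, n_groups)


# Example: compute the group for a few ages at diagnosis.
groups = [five_year_age_group(a) for a in (0, 4, 5, 99, 100, 121)]
```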

BjarteAAGNES commented 1 year ago

Only age is used in extract_define_survival_data.ado

L199-231

Thus, the following variables are NOT used

| agr_all_ages | Agegroup for survival all ages together | 0-1 | cancer_record_dataset | Created by R-program from Age |
| -- | -- | -- | -- | -- |
| agr_all_sites | Agegroups to be used for other survival analysis | 1-6 | cancer_record_dataset | Created by R-program from Age |
| agr_bone | Agegroups to be used for survival analysis of bone, Hodgkin and testis | 1-6 | cancer_record_dataset | Created by R-program from Age |
| excl_surv_age | Excluded from survival due to age 90+ | 0-1 | cancer_record_dataset | Created by R-program from Age |

AnnaSkog commented 1 year ago

I see. You haven’t listed "agr_all_ages", but I assume (from the code you sent) that this variable is not used either.

We are left with only two age-variables: age and agegroup. Both variables have the same meaning in all datasets/uses. The variable agegroup could be renamed to agr_21, but is that necessary? What do you think @CotterpinDoozer?

AnnaSkog commented 1 year ago

We will keep the variable names age and agegroup.

@HuidongTian the variables agr_all_ages, agr_all_sites, agr_bone and excl_surv_age could be taken out of the Age program in R, if they are not used in the incidence, mortality or prevalence calculations.

If the 4 age-variables mentioned above are taken out of the Age-program we will need to update the Wiki: https://github.com/CancerRegistryOfNorway/NORDCAN/wiki/nordcan.R-nordcanpreprocessing. @CotterpinDoozer will do this.
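
One way to check whether the four variables are still referenced anywhere before removing them is a plain text search over the source code. The Python sketch below is only an illustration of that idea. The function name `find_references` is hypothetical, and the caller is assumed to supply the source text of whichever files should be checked.

```python
import re

# The four variables proposed for removal in this issue.
CANDIDATES = ("agr_all_ages", "agr_all_sites", "agr_bone", "excl_surv_age")


def find_references(source_text: str, names=CANDIDATES) -> dict[str, list[int]]:
    """Return, for each variable name, the 1-based line numbers mentioning it."""
    hits: dict[str, list[int]] = {name: [] for name in names}
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for name in names:
            # Word boundaries avoid matching longer names that merely
            # contain the variable name as a substring.
            if re.search(rf"\b{re.escape(name)}\b", line):
                hits[name].append(lineno)
    return hits


# Example with made-up source text (not taken from the actual packages).
example = "x <- dt$agr_bone\ny <- dt$age\n"
references = find_references(example)
```

A variable with an empty list of hits in all relevant files would be a candidate for removal. A text search cannot see column names that are built dynamically, so a hit-free result is a hint, not proof.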

HuidongTian commented 1 year ago

The 4 agr_*-related variables mentioned above have been taken out.

https://github.com/CancerRegistryOfNorway/nordcanpreprocessing/blob/master/R/preprocessing_enrichment.R#L251-L259
https://github.com/CancerRegistryOfNorway/nordcanpreprocessing/blob/master/R/preprocessing_enrichment.R#L353

https://github.com/CancerRegistryOfNorway/nordcancore/blob/master/data-raw/nordcan_columns.csv#L58-L60
https://github.com/CancerRegistryOfNorway/nordcancore/blob/master/data-raw/nordcan_columns.csv#L68

CotterpinDoozer commented 1 year ago

I have updated the wiki by removing these variables from this table: https://github.com/CancerRegistryOfNorway/NORDCAN/wiki/nordcan.R-nordcanpreprocessing#enrichment-done-in-the-package