d3b-center / hope-cohort-analysis

Analysis for HOPE cohort

Harmonize HOPE_diagnosis_type in histologies file #78

Closed: komalsrathi closed this issue 9 months ago

komalsrathi commented 11 months ago

The HOPE_diagnosis_type column has some redundant values. Can someone fix it in the histologies file? Currently I am doing the following, but I am not sure if that's the correct way to do it, so I'm open to suggestions:

# load packages (readr for read_tsv, dplyr for %>%) and read histologies
> library(readr)
> library(dplyr)
> annot <- read_tsv(file.path(data_dir, "Hope-GBM-histologies.tsv"))

# current HOPE_diagnosis_type values
> annot$HOPE_diagnosis_type %>% unique()
[1] "Initial CNS Tumor"   "Second Malignancy"   NA                    "Progressive"        
[5] "Recurrence"          "Primary"             "recurrent"           "Recurrent, residual"

Collapse the redundant HOPE_diagnosis_type values:

# map spelling/wording variants onto a single label
> annot$HOPE_diagnosis_type[annot$HOPE_diagnosis_type == "Recurrent"] <- "Recurrence"
> annot$HOPE_diagnosis_type[annot$HOPE_diagnosis_type == "recurrent"] <- "Recurrence"
> annot$HOPE_diagnosis_type[annot$HOPE_diagnosis_type == "Recurrent, residual"] <- "Recurrence"
> annot$HOPE_diagnosis_type[annot$HOPE_diagnosis_type == "Primary"] <- "Initial CNS Tumor"

> annot$HOPE_diagnosis_type %>% unique()
[1] "Initial CNS Tumor" "Second Malignancy" NA                  "Progressive"      
[5] "Recurrence"       
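
One way to keep these replacements in a single place is a named lookup vector. This is a minimal sketch, not what the repository adopted: `diagnosis_type_map` and `harmonize_diagnosis_type` are hypothetical names, and the mapping simply mirrors the four replacements above. NA and values not in the map pass through unchanged.

```r
# Hypothetical lookup: raw HOPE_diagnosis_type value -> harmonized value.
# Mirrors the replacements above; an assumption, not a vocabulary defined
# by the repository.
diagnosis_type_map <- c(
  "Recurrent"           = "Recurrence",
  "recurrent"           = "Recurrence",
  "Recurrent, residual" = "Recurrence",
  "Primary"             = "Initial CNS Tumor"
)

# Replace values found in the map; leave NA and unmapped values as they are.
harmonize_diagnosis_type <- function(x, map = diagnosis_type_map) {
  hit <- !is.na(x) & x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

# The raw values listed in the issue
raw <- c("Initial CNS Tumor", "Second Malignancy", NA, "Progressive",
         "Recurrence", "Primary", "recurrent", "Recurrent, residual")
unique(harmonize_diagnosis_type(raw))
```

On these raw values it yields the same five labels as the output above. Applied to the file it would be `annot$HOPE_diagnosis_type <- harmonize_diagnosis_type(annot$HOPE_diagnosis_type)`, and a new variant only needs one more entry in the map.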

jharenza commented 10 months ago

@komalsrathi can you use tumor_descriptor instead? Not sure who created that HOPE_diagnosis_type field.
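
Before switching to tumor_descriptor, a cross-tabulation of the two columns can show whether each HOPE_diagnosis_type value lines up with a single tumor_descriptor value. This is a hedged sketch: `crosstab_columns` is a hypothetical helper, and the toy data frame's values are invented for illustration, not taken from the histologies file.

```r
# Hypothetical helper: cross-tabulate two columns of a data frame,
# keeping NA as its own level so missing values are not silently dropped.
crosstab_columns <- function(df, a, b) {
  table(df[[a]], df[[b]], useNA = "ifany", dnn = c(a, b))
}

# Toy data; values are invented for illustration only.
toy <- data.frame(
  HOPE_diagnosis_type = c("Primary", "Initial CNS Tumor", "recurrent", NA),
  tumor_descriptor    = c("Initial CNS Tumor", "Initial CNS Tumor",
                          "Recurrence", "Progressive"),
  stringsAsFactors = FALSE
)
crosstab_columns(toy, "HOPE_diagnosis_type", "tumor_descriptor")
```

On the real file the call would be `crosstab_columns(annot, "HOPE_diagnosis_type", "tumor_descriptor")`; rows with counts spread across several columns would flag samples where the two fields disagree.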