BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Do OS and PFS share the same vital status in TCGA data? #347

Open wentgithub opened 5 years ago

wentgithub commented 5 years ago

Here I list all the columns in the clinical data. I want to run OS and PFS analyses, but I can find only one column for the patient status. Do OS and PFS share the same vital status in TCGA data?

bcr_patient_barcode

additional_studies tumor_tissue_site histological_type gender vital_status days_to_birth days_to_death days_to_last_followup race_list tissue_source_site patient_id bcr_patient_uuid informed_consent_verified icd_o_3_site icd_o_3_histology icd_10 tissue_prospective_collection_indicator tissue_retrospective_collection_indicator days_to_initial_pathologic_diagnosis age_at_initial_pathologic_diagnosis year_of_initial_pathologic_diagnosis person_neoplasm_cancer_status day_of_form_completion month_of_form_completion year_of_form_completion ethnicity other_dx history_of_neoadjuvant_treatment karnofsky_performance_score eastern_cancer_oncology_group performance_status_scale_timing neoplasm_histologic_grade residual_tumor tumor_residual_disease jewish_origin anatomic_neoplasm_subdivision initial_pathologic_diagnosis_method init_pathology_dx_method_other venous_invasion lymphatic_invasion radiation_therapy postoperative_rx_tx primary_therapy_outcome_success has_new_tumor_events_information has_drugs_information has_radiations_information has_follow_ups_information project stage_event_system_version stage_event_clinical_stage stage_event_pathologic_stage stage_event_tnm_categories stage_event_psa stage_event_gleason_grading stage_event_ann_arbor stage_event_serum_markers stage_event_igcccg_stage stage_event_masaoka_stage

wentgithub commented 5 years ago

Have you had time to check this question? Thanks a lot.

tiagochst commented 5 years ago

I am not an expert on survival data.

Did you check the latest TCGA paper that used the survival data? https://doi.org/10.1016/j.cell.2018.02.052 Maybe the authors can help.

wentgithub commented 5 years ago

Thanks a lot, @tiagochst. I have seen that paper, but it is still not clear to me. To put it another way: I use TCGAbiolinks and I want to make sure I handle the data correctly. Could you check with a specialist in this field on your team?

tiagochst commented 5 years ago

Here are the definitions from the paper (from Table S1):

Data Columns:

Original Clinical Data
  type: cancer type such as brca, ov, blca, skcm, gbm, and so on.
  10 features from the main files: "age_at_initial_pathologic_diagnosis", "gender", "race", "ajcc_pathologic_tumor_stage", "clinical_stage", "histological_type", "histological_grade", "initial_pathologic_dx_year", "menopause_status", "birth_days_to". The values in "clinical_stage" for "THYM" are its "masaoka_stage" values.

Updated Clinical Data from follow-up files
  vital_status: the latest updated vital status from follow-up data.
  tumor_status: the latest updated tumor status from follow-up data.
  last_contact_days_to: the latest "last_contact_days_to" updated from the follow-up data files.
  death_days_to: the available "death_days_to" from the follow-up data files.
  cause_of_death: the available "cause_of_death" from the follow-up data files.
  new_tumor_event_type, new_tumor_event_site, new_tumor_event_site_other: the values corresponding to new_tumor_event_dx_days_to.
  new_tumor_event_dx_days_to: smallest days from all new_tumor_event_dx_days_to in follow-up files.
  treatment_outcome_first_course (for deriving DFI): the available data in the field of "treatment_outcome_first_course" from the main and follow-up files.
  residual_tumor (for deriving DFI): 5 diseases did not have "treatment_outcome_first_course" but had "residual_tumor" from the main file. These 5 diseases were CHOL, LIHC, MESO, SARC, and THCA.
  margin_status (for deriving DFI): BRCA did not have "treatment_outcome_first_course" nor "residual_tumor" but had "margin_status". SARC also had this field but its field of "residual_tumor" was used.

Derived Clinical Data
  OS: overall survival event, 1 for death from any cause, 0 for alive.
  OS.time: overall survival time in days, last_contact_days_to or death_days_to, whichever is larger.
  DSS: disease-specific survival event, 1 for patient whose vital_status was Dead and tumor_status was WITH TUMOR. If a patient died from the disease shown in field of cause_of_death, the status of DSS would be 1 for the patient. 0 for patient whose vital_status was Alive or whose vital_status was Dead and tumor_status was TUMOR FREE. This is not a 100% accurate definition but is the best we could do with this dataset. Technically a patient could be with tumor but died of a car accident and therefore incorrectly considered as an event.
  DSS.time: disease-specific survival time in days, last_contact_days_to or death_days_to, whichever is larger.
  DFI: disease-free interval event, 1 for patient having new tumor event whether it is a local recurrence, distant metastasis, new primary tumor of the cancer, including cases with a new tumor event whose type is N/A. Disease free was defined by: first, treatment_outcome_first_course is "Complete Remission/Response"; if the tumor type doesn't have "treatment_outcome_first_course" then disease-free was defined by the value "R0" in the field of "residual_tumor"; otherwise, disease-free was defined by the value "negative" in the field of "margin_status". If the tumor type did not have any of these fields, then its DFI was NA. 0 for censored otherwise. New primary tumor in other organ was censored; patients who were Dead with tumor without new tumor event are excluded; patients with stage IV are excluded too.
  DFI.time: disease-free interval time in days, new_tumor_event_dx_days_to for events, or for censored cases, either last_contact_days_to or death_days_to, whichever is applicable.
  PFI: progression-free interval event, 1 for patient having new tumor event whether it was a progression of disease, local recurrence, distant metastasis, new primary tumors all sites, or died with the cancer without new tumor event, including cases with a new tumor event whose type is N/A. 0 for censored otherwise.
  PFI.time: progression-free interval time in days, for events, either new_tumor_event_dx_days_to or death_days_to, whichever is applicable; or for censored cases, either last_contact_days_to or death_days_to, whichever is applicable.

Other
  Redaction: to show if the case is redacted.

Source:

Liu, Jianfang, et al. "An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics." Cell 173.2 (2018): 400-416.
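
To make the OS and PFI definitions above concrete, here is a minimal base-R sketch on a toy table. The patients and all values are invented for illustration, the column names follow Table S1 rather than any particular TCGAbiolinks output, and the PFI rule is simplified (it ignores the new tumor event type and redaction). It is meant only to show that OS depends on vital_status alone, while PFI also uses tumor_status and new_tumor_event_dx_days_to.

```r
# Toy follow-up table; every value here is invented for illustration.
clin <- data.frame(
  patient = c("P1", "P2", "P3", "P4"),
  vital_status = c("Alive", "Dead", "Dead", "Alive"),
  tumor_status = c("TUMOR FREE", "WITH TUMOR", "TUMOR FREE", "WITH TUMOR"),
  last_contact_days_to = c(1200, 300, 650, 900),
  death_days_to = c(NA, 410, 700, NA),
  new_tumor_event_dx_days_to = c(NA, NA, NA, 500),
  stringsAsFactors = FALSE
)

# OS: 1 for death from any cause, 0 for alive.
clin$OS <- as.integer(clin$vital_status == "Dead")
# OS.time: last_contact_days_to or death_days_to, whichever is larger.
clin$OS.time <- pmax(clin$last_contact_days_to, clin$death_days_to, na.rm = TRUE)

# PFI (simplified): a new tumor event, or death with tumor without one.
has_nte <- !is.na(clin$new_tumor_event_dx_days_to)
died_with_tumor <- clin$vital_status == "Dead" & clin$tumor_status == "WITH TUMOR"
clin$PFI <- as.integer(has_nte | died_with_tumor)

# PFI.time: event date for events, otherwise the censoring time used for OS.
clin$PFI.time <- ifelse(
  has_nte, clin$new_tumor_event_dx_days_to,
  ifelse(died_with_tumor, clin$death_days_to, clin$OS.time)
)

print(clin[, c("patient", "OS", "OS.time", "PFI", "PFI.time")])
```

In this toy table, P3 counts as an OS event but is censored for PFI, and P4 is a PFI event while still alive, so under these definitions the two endpoints do not share one status column.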

wentgithub commented 5 years ago

Thanks a lot. The problem is that I cannot find the columns described in Table S1 among the columns of the clinical data.

tiagochst commented 5 years ago

You can check with the code below.

library(TCGAbiolinks)

# Columns named in Table S1 that we want to locate
features <-
    c(  "age_at_initial_pathologic_diagnosis",
        "gender",
        "race",
        "ajcc_pathologic_tumor_stage",
        "clinical_stage",
        "histological_type",
        "histological_grade",
        "initial_pathologic_dx_year",
        "menopause_status",
        "birth_days_to",
        "last_contact_days_to",
        "death_days_to",
        "new_tumor_event_dx_days_to"
    )

query <- GDCquery(project = "TCGA-OV",
                  data.category = "Clinical",
                  data.type = "Clinical Supplement",
                  data.format = "BCR Biotab")
GDCdownload(query)
clinical.BCRtab.all <- GDCprepare(query)
names(clinical.BCRtab.all)

# Track which features were found in at least one table
found <- rep(FALSE, length(features))
for (n in names(clinical.BCRtab.all)) {
    # Which of the features are columns of table n?
    idx <- features %in% colnames(clinical.BCRtab.all[[n]])
    if (any(idx)) {
        found <- found | idx
        message("----------------------------------------")
        message("The following features can be found in table: ", n)
        message(paste0("- ", paste(features[idx], collapse = "\n- ")))
    }
}
# Features that were not found in any table
features[!found]
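
To see what the lookup above reports without downloading anything, the same logic can be tried on a toy list of tables. This is only a sketch: the table names and their columns below are made up for illustration and do not reflect what the real TCGA-OV Biotab files contain.

```r
# Hypothetical feature list and tables; names and values are invented.
features <- c("gender", "death_days_to", "menopause_status")
tables <- list(
  toy_patient_table   = data.frame(gender = "FEMALE", stringsAsFactors = FALSE),
  toy_follow_up_table = data.frame(death_days_to = 410)
)

# For each table, keep the requested features that appear as columns.
hits <- lapply(tables, function(tab) intersect(features, colnames(tab)))

# Features that appear in none of the tables.
missing <- setdiff(features, unlist(hits))

print(hits)
print(missing)
```

Any feature left in `missing` has to be looked for under a different column name or derived from other fields.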
wentgithub commented 5 years ago

Thanks a lot, but this still does not solve my problem.