Open wentgithub opened 5 years ago
Have you had time to check this question? Thanks a lot.
I am not an expert on survival data.
Did you check the most recent TCGA paper that used the survival data? https://doi.org/10.1016/j.cell.2018.02.052 Maybe the authors can help.
Thanks a lot, @tiagochst. I have seen that paper, but it is still not clear to me. To put it another way: I use TCGAbiolinks and I want to make sure I get the data right. Could you check this with a specialist in this field on your team?
Here are the definitions from the paper (Table S1):
Data Columns | Definition |
---|---|
Original Clinical Data | |
type | cancer type such as brca, ov, blca, skcm, gbm, and so on. |
10 features from the main files | "age_at_initial_pathologic_diagnosis", "gender", "race", "ajcc_pathologic_tumor_stage", "clinical_stage", "histological_type", "histological_grade", "initial_pathologic_dx_year", "menopause_status", "birth_days_to"; the values in "clinical_stage" for "THYM" are its "masaoka_stage" values. |
Updated Clinical Data from follow-up files | |
vital_status | the latest updated vital status from follow-up data. |
tumor_status | the latest updated tumor status from follow-up data. |
last_contact_days_to | the latest "last_contact_days_to" updated from the follow-up data files. |
death_days_to | the available "death_days_to" from the follow-up data files. |
cause_of_death | the available "cause_of_death" from the follow-up data files. |
new_tumor_event_type, new_tumor_event_site, new_tumor_event_site_other | the values corresponding to new_tumor_event_dx_days_to. |
new_tumor_event_dx_days_to | smallest days from all new_tumor_event_dx_days_to in follow-up files. |
treatment_outcome_first_course (for deriving DFI) | the available data in the field of "treatment_outcome_first_course" from the main and follow-up files. |
residual_tumor (for deriving DFI) | 5 diseases did not have "treatment_outcome_first_course" but had "residual_tumor" from the main file. These 5 diseases were CHOL, LIHC, MESO, SARC, and THCA. |
margin_status (for deriving DFI) | BRCA had neither "treatment_outcome_first_course" nor "residual_tumor" but had "margin_status". SARC also had this field but its field of "residual_tumor" was used. |
Derived Clinical Data | |
OS | overall survival event, 1 for death from any cause, 0 for alive. |
OS.time | overall survival time in days, last_contact_days_to or death_days_to, whichever is larger. |
DSS | disease-specific survival event, 1 for patient whose vital_status was Dead and tumor_status was WITH TUMOR. If a patient died from the disease shown in field of cause_of_death, the status of DSS would be 1 for the patient. 0 for patient whose vital_status was Alive or whose vital_status was Dead and tumor_status was TUMOR FREE. This is not a 100% accurate definition but is the best we could do with this dataset. Technically a patient could be with tumor but died of a car accident and therefore incorrectly considered as an event. |
DSS.time | disease-specific survival time in days, last_contact_days_to or death_days_to, whichever is larger. |
DFI | disease-free interval event, 1 for patient having new tumor event whether it is a local recurrence, distant metastasis, new primary tumor of the cancer, including cases with a new tumor event whose type is N/A. Disease free was defined by: first, treatment_outcome_first_course is "Complete Remission/Response"; if the tumor type doesn't have "treatment_outcome_first_course" then disease-free was defined by the value "R0" in the field of "residual_tumor"; otherwise, disease-free was defined by the value "negative" in the field of "margin_status". If the tumor type did not have any of these fields, then its DFI was NA. 0 for censored otherwise. New primary tumor in other organ was censored; patients who were Dead with tumor without new tumor event are excluded; patients with stage IV are excluded too. |
DFI.time | disease-free interval time in days, new_tumor_event_dx_days_to for events, or for censored cases, either last_contact_days_to or death_days_to, whichever is applicable. |
PFI | progression-free interval event, 1 for patient having new tumor event whether it was a progression of disease, local recurrence, distant metastasis, new primary tumors all sites, or died with the cancer without new tumor event, including cases with a new tumor event whose type is N/A. 0 for censored otherwise. |
PFI.time | progression-free interval time in days, for events, either new_tumor_event_dx_days_to or death_days_to, whichever is applicable; or for censored cases, either last_contact_days_to or death_days_to, whichever is applicable. |
Other | |
Redaction | to show if the case is redacted. |
Source: Liu, Jianfang, et al. "An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics." Cell 173.2 (2018): 400-416.
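To make the OS and PFI definitions above concrete, here is a minimal R sketch on invented toy data. It assumes the columns carry the Table S1 names, treats any non-missing new_tumor_event_dx_days_to as a new tumor event (ignoring new_tumor_event_type), and approximates "died with the cancer" as vital_status Dead plus tumor_status WITH TUMOR. It is an illustration of the definitions, not the authors' actual pipeline.

```r
# Toy data using the Table S1 column names; all values are invented.
clin <- data.frame(
  patient = c("P1", "P2", "P3", "P4"),
  vital_status = c("Alive", "Dead", "Dead", "Alive"),
  tumor_status = c("TUMOR FREE", "WITH TUMOR", "TUMOR FREE", "WITH TUMOR"),
  last_contact_days_to = c(1200, NA, 300, 900),
  death_days_to = c(NA, 650, 800, NA),
  new_tumor_event_dx_days_to = c(NA, 400, NA, 500),
  stringsAsFactors = FALSE
)

# OS: 1 for death from any cause, 0 for alive.
clin$OS <- as.integer(clin$vital_status == "Dead")

# OS.time: last_contact_days_to or death_days_to, whichever is larger.
clin$OS.time <- pmax(clin$last_contact_days_to, clin$death_days_to, na.rm = TRUE)

# PFI: 1 for a new tumor event, or for death with tumor without a new tumor event.
# Assumption: "died with the cancer" is approximated by Dead + WITH TUMOR.
has_new_event <- !is.na(clin$new_tumor_event_dx_days_to)
died_with_tumor <- clin$vital_status == "Dead" & clin$tumor_status == "WITH TUMOR"
clin$PFI <- as.integer(has_new_event | died_with_tumor)

# PFI.time: time of the new tumor event or of death for events,
# otherwise the censoring time (the same value as OS.time).
clin$PFI.time <- ifelse(
  has_new_event, clin$new_tumor_event_dx_days_to,
  ifelse(died_with_tumor, clin$death_days_to, clin$OS.time)
)

print(clin[, c("patient", "OS", "OS.time", "PFI", "PFI.time")])
```

In this toy data, P4 is alive (OS = 0) but has a new tumor event (PFI = 1), so under these definitions the two endpoints do not use the same event indicator even though both read vital_status.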
Thanks a lot. The problem is that I cannot find the columns listed in Table S1 in the clinical data.
You can check with the code below.
library(TCGAbiolinks)

# Columns from Table S1 that we want to locate in the BCR Biotab files
features <- c(
  "age_at_initial_pathologic_diagnosis",
  "gender",
  "race",
  "ajcc_pathologic_tumor_stage",
  "clinical_stage",
  "histological_type",
  "histological_grade",
  "initial_pathologic_dx_year",
  "menopause_status",
  "birth_days_to",
  "last_contact_days_to",
  "death_days_to",
  "new_tumor_event_dx_days_to"
)

# Download the clinical supplement in BCR Biotab format
query <- GDCquery(
  project = "TCGA-OV",
  data.category = "Clinical",
  data.type = "Clinical Supplement",
  data.format = "BCR Biotab"
)
GDCdownload(query)

# GDCprepare returns a list of tables (one per Biotab file)
clinical.BCRtab.all <- GDCprepare(query)
names(clinical.BCRtab.all)

# Report, for each table, which of the requested features it contains
found <- rep(FALSE, length(features))
for (n in names(clinical.BCRtab.all)) {
  idx <- features %in% colnames(clinical.BCRtab.all[[n]])
  if (any(idx)) {
    found <- found | idx
    message("----------------------------------------")
    message("The following features can be found in table: ", n)
    message(paste0("- ", paste(features[idx], collapse = "\n- ")))
  }
}

# Features that were not found in any table
features[!found]
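The same lookup can be tried without downloading anything. The sketch below wraps the loop in a hypothetical helper, locate_features (not part of TCGAbiolinks), and runs it on a toy list of data frames standing in for the output of GDCprepare(); the table names and values are invented.

```r
# Hypothetical helper (not a TCGAbiolinks function):
# for each feature, list the tables whose columns contain it.
locate_features <- function(tables, features) {
  hits <- lapply(features, function(f) {
    names(tables)[vapply(tables, function(tab) f %in% colnames(tab), logical(1))]
  })
  data.frame(
    feature = features,
    tables = vapply(hits, paste, character(1), collapse = ", "),
    found = lengths(hits) > 0,
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for the list returned by GDCprepare(); names and values are invented.
toy_tables <- list(
  patient = data.frame(gender = "FEMALE", birth_days_to = -20000),
  follow_up = data.frame(last_contact_days_to = 1200, death_days_to = NA)
)

res <- locate_features(toy_tables, c("gender", "death_days_to", "menopause_status"))
print(res)
```

Returning a data frame instead of printing messages makes it easier to see at a glance which table to read each feature from, and which features are missing everywhere.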
Thanks a lot, but this still does not solve my problem.
Below I list all the columns in the clinical data. I want to analyze OS and PFS, but I can find only one column about patient status. Do OS and PFS share the same vital status in the TCGA data?
bcr_patient_barcode
additional_studies tumor_tissue_site histological_type gender vital_status days_to_birth days_to_death days_to_last_followup race_list tissue_source_site patient_id bcr_patient_uuid informed_consent_verified icd_o_3_site icd_o_3_histology icd_10 tissue_prospective_collection_indicator tissue_retrospective_collection_indicator days_to_initial_pathologic_diagnosis age_at_initial_pathologic_diagnosis year_of_initial_pathologic_diagnosis person_neoplasm_cancer_status day_of_form_completion month_of_form_completion year_of_form_completion ethnicity other_dx history_of_neoadjuvant_treatment karnofsky_performance_score eastern_cancer_oncology_group performance_status_scale_timing neoplasm_histologic_grade residual_tumor tumor_residual_disease jewish_origin anatomic_neoplasm_subdivision initial_pathologic_diagnosis_method init_pathology_dx_method_other venous_invasion lymphatic_invasion radiation_therapy postoperative_rx_tx primary_therapy_outcome_success has_new_tumor_events_information has_drugs_information has_radiations_information has_follow_ups_information project stage_event_system_version stage_event_clinical_stage stage_event_pathologic_stage stage_event_tnm_categories stage_event_psa stage_event_gleason_grading stage_event_ann_arbor stage_event_serum_markers stage_event_igcccg_stage stage_event_masaoka_stage
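Several names in this list look like reordered versions of the Table S1 names (for example days_to_death versus death_days_to). The sketch below renames them and then applies the OS definition from Table S1. The correspondence in name_map is an assumption based on name similarity only, not something confirmed by the paper or by TCGAbiolinks, and the rows are invented toy data.

```r
# Assumed correspondence: Table S1 name = name in the column list above.
# Inferred from name similarity only.
name_map <- c(
  birth_days_to = "days_to_birth",
  death_days_to = "days_to_death",
  last_contact_days_to = "days_to_last_followup",
  tumor_status = "person_neoplasm_cancer_status"
)

# Toy rows using column names from the list above; values are invented.
clin <- data.frame(
  bcr_patient_barcode = c("P1", "P2"),
  vital_status = c("Alive", "Dead"),
  days_to_birth = c(-21000, -25000),
  days_to_death = c(NA, 650),
  days_to_last_followup = c(1200, NA),
  person_neoplasm_cancer_status = c("TUMOR FREE", "WITH TUMOR"),
  stringsAsFactors = FALSE
)

# Rename only the mapped columns that are actually present.
idx <- match(name_map, colnames(clin))
colnames(clin)[idx[!is.na(idx)]] <- names(name_map)[!is.na(idx)]

# OS and OS.time as defined in Table S1.
clin$OS <- as.integer(clin$vital_status == "Dead")
clin$OS.time <- pmax(clin$last_contact_days_to, clin$death_days_to, na.rm = TRUE)

print(clin[, c("bcr_patient_barcode", "OS", "OS.time")])
```

Note that nothing in the list above resembles new_tumor_event_dx_days_to, which the progression-based endpoints in Table S1 depend on. The has_new_tumor_events_information column suggests, as a guess, that new tumor events are stored in a separate table.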