Closed fkgruber closed 1 year ago
Hi,
Indexed (Vital_status_indexed) should be the same as the latest follow-up information; if there is no follow-up, the initial vital status (Vital_status_patient) is used.
The only concern there is the 3rd row (dead, alive, alive) with 4 samples.
A patient can have multiple follow-up visits. Does the Alive status match the latest one?
Otherwise, we would need to check with GDC. Can you provide the 4 sample IDs?
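The rule described above (indexed status equals the latest follow-up, falling back to the patient record when there is no follow-up) can be sketched in base R. This is only an illustration on made-up data: the patient IDs, the days_to_followup ordering column, and the expected_indexed column are hypothetical, and the real TCGAbiolinks data frames may use different column names.

```r
# Toy data, made up for illustration; these are not real TCGA records.
patient <- data.frame(
  bcr_patient_barcode = c("P1", "P2", "P3"),
  vital_status = c("Alive", "Alive", "Dead"),
  stringsAsFactors = FALSE
)
followup <- data.frame(
  bcr_patient_barcode = c("P1", "P1", "P2"),
  days_to_followup = c(100, 400, 250),  # hypothetical column used to order visits
  vital_status = c("Alive", "Dead", "Alive"),
  stringsAsFactors = FALSE
)

# Sort visits chronologically within each patient, then keep the last one.
followup <- followup[order(followup$bcr_patient_barcode, followup$days_to_followup), ]
latest <- followup[!duplicated(followup$bcr_patient_barcode, fromLast = TRUE), ]

# Left-join the latest follow-up onto the patient table.
merged <- merge(
  patient,
  latest[, c("bcr_patient_barcode", "vital_status")],
  by = "bcr_patient_barcode", all.x = TRUE,
  suffixes = c("_patient", "_followup")
)

# Use the latest follow-up status, or the initial status if there is none.
merged$expected_indexed <- ifelse(
  is.na(merged$vital_status_followup),
  merged$vital_status_patient,
  merged$vital_status_followup
)
print(merged)
```

Comparing expected_indexed with the actual indexed column would flag patients whose statuses disagree.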
Sure here they are:
vital_status_indexed | vital_status_patient | vital_status_followup | N | Pats |
---|---|---|---|---|
Dead | Alive | Alive | 4 | TCGA-FB-A545, TCGA-IB-7891, TCGA-FB-A5VM, TCGA-F2-7273 |
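A summary like the table above can be built by counting each combination of the three status columns. The sketch below uses made-up data with hypothetical patient IDs; it assumes the three files have already been joined into one row per patient, which is not shown here.

```r
# Toy per-patient statuses, made up for illustration (one row per patient).
status <- data.frame(
  patient = c("P1", "P2", "P3", "P4"),
  vital_status_indexed  = c("Dead", "Alive", "Dead", "Dead"),
  vital_status_patient  = c("Alive", "Alive", "Dead", "Alive"),
  vital_status_followup = c("Alive", "Alive", NA, "Alive"),
  stringsAsFactors = FALSE
)

key_cols <- c("vital_status_indexed", "vital_status_patient", "vital_status_followup")

# Build one text key per combination; a missing follow-up shows up as "NA".
combo <- do.call(paste, c(status[key_cols], sep = " | "))

# Count patients per combination and list their IDs.
counts <- table(combo)
summary_tab <- data.frame(
  combination = names(counts),
  N = as.vector(counts),
  Pats = as.vector(tapply(status$patient, combo, paste, collapse = ", ")),
  stringsAsFactors = FALSE
)
print(summary_tab)
```

Rows where the indexed status differs from the follow-up status are the ones worth checking against the GDC portal.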
The last follow-up for TCGA-FB-A545 is indeed Dead in the file nationwidechildrens.org_clinical_follow_up_v4.4_paad.txt
https://portal.gdc.cancer.gov/cases/b0cb81ad-3c20-4d56-ab7d-f64c0caee1ce
Can you send me the code you used to get the file clinic_followup.csv?
sure
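library(TCGAbiolinks)
# Query the TCGA-PAAD clinical XML files (fileDir is an output directory defined elsewhere)
clinic <- GDCquery(project = "TCGA-PAAD", data.category = "Clinical", file.type = "xml")
GDCdownload(clinic, directory = paste0(fileDir, "raw"))
# Parse the follow-up records from the downloaded XML files
clinic.followup <- GDCprepare_clinic(clinic, "follow_up", directory = paste0(fileDir, "raw"))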
Thanks. The clinic.followup data frame has the death information for those 4 cases
Got it thanks
I downloaded the TCGA-PAAD data with TCGAbiolinks but found some inconsistencies in the definition of vital status. There are 3 different files that contain vital status information, and they don't completely match: clinic_indexed.csv, clinic_followup.csv, and clinic_patient.csv.