BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Different clinical results when using GDCprepare_clinic (clinical.info = "patient" vs "follow_up") #594

Open buzhizhang121 opened 11 months ago

buzhizhang121 commented 11 months ago

Hello, when I use GDCprepare_clinic (TCGAbiolinks_2.25.3) to get follow-up data, the results differ depending on clinical.info. It seems that GDCprepare_clinic(query, clinical.info = "patient") returns an earlier version of the follow-up data.

query <- GDCquery(project = "TCGA-TGCT", data.category = "Clinical", file.type = "xml")
GDCdownload(query)

cols <- c("bcr_patient_barcode", "vital_status", "days_to_last_followup", "days_to_death")
patients <- c("TCGA-2G-AAHL", "TCGA-2G-AAKM")

clinical <- GDCprepare_clinic(query, clinical.info = "patient")
clinical[clinical$bcr_patient_barcode %in% patients, cols]

   bcr_patient_barcode vital_status days_to_last_followup days_to_death
56        TCGA-2G-AAHL        Alive                  1819            NA
64        TCGA-2G-AAKM        Alive                  2651            NA

clinical2 <- GDCprepare_clinic(query, clinical.info = "follow_up")
clinical2[clinical2$bcr_patient_barcode %in% patients, cols]

   bcr_patient_barcode vital_status days_to_last_followup days_to_death
9         TCGA-2G-AAKM         Dead                    NA          6972
12        TCGA-2G-AAHL        Alive                  1819            NA
13        TCGA-2G-AAHL        Alive                  7081            NA

Can you please help figure out this bug?

Thank you, Yang

pfren1998 commented 10 months ago

In the original clinical data from TCGA (which is in XML format), there can be more than one record for a single patient. It seems that TCGA stores the updated information in the follow_up slot, while the patient slot keeps the earlier values.
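If the follow_up records are indeed the newer ones, one possible workaround is to stack the "patient" and "follow_up" tables and keep, per patient, the record with the longest observed time. The sketch below is only an illustration in base R: it uses hand-typed copies of the rows shown above rather than a fresh download, `latest_record` is a hypothetical helper (not part of TCGAbiolinks), and it assumes that the largest of days_to_last_followup / days_to_death marks the most recent record, which may not hold for every project.

```r
# Hand-typed copies of the rows from the issue (not a fresh GDC download).
cols <- c("bcr_patient_barcode", "vital_status",
          "days_to_last_followup", "days_to_death")

patient <- data.frame(
  bcr_patient_barcode   = c("TCGA-2G-AAHL", "TCGA-2G-AAKM"),
  vital_status          = c("Alive", "Alive"),
  days_to_last_followup = c(1819, 2651),
  days_to_death         = c(NA_real_, NA_real_),
  stringsAsFactors      = FALSE
)

follow_up <- data.frame(
  bcr_patient_barcode   = c("TCGA-2G-AAKM", "TCGA-2G-AAHL", "TCGA-2G-AAHL"),
  vital_status          = c("Dead", "Alive", "Alive"),
  days_to_last_followup = c(NA, 1819, 7081),
  days_to_death         = c(6972, NA, NA),
  stringsAsFactors      = FALSE
)

# Stack both sources so every record for a patient is in one table.
all_records <- rbind(patient[, cols], follow_up[, cols])

# Observed time per record: whichever of the two day counts is present
# (assumption: a larger value means a more recent record).
all_records$time <- pmax(all_records$days_to_last_followup,
                         all_records$days_to_death, na.rm = TRUE)

# Hypothetical helper: keep the row with the longest observed time.
latest_record <- function(df) df[which.max(df$time), , drop = FALSE]

latest <- do.call(
  rbind,
  lapply(split(all_records, all_records$bcr_patient_barcode), latest_record)
)
rownames(latest) <- NULL
print(latest)
```

With the rows above, this keeps the "Dead" record for TCGA-2G-AAKM and the later "Alive" record for TCGA-2G-AAHL. Records where both day counts are missing would need separate handling.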