BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Error in GDCprepare for the TARGET-AML dataset #584

Closed: gourab4gd closed this issue 1 year ago

gourab4gd commented 1 year ago

After taking the first 200 tumor samples from the TARGET-AML data, the following error occurs:

```
Starting to add information to samples
Adding description to TARGET samples
 => Add clinical information to samples
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
In addition: Warning message:
Expected 5 pieces. Additional pieces discarded in 5 rows [58, 75, 78, 91, 96].
```

Here is my full code:

```r
library(TCGAbiolinks)
library(SummarizedExperiment)
library(survminer)
library(survival)
library(tidyverse)
library(DESeq2)

clinical_data <- GDCquery_clinic("TARGET-AML")
# Death indicator and overall survival time (days)
clinical_data$deceased <- ifelse(clinical_data$vital_status == "Alive", FALSE, TRUE)
clinical_data$overall_survival <- ifelse(clinical_data$vital_status == "Alive",
                                         clinical_data$days_to_last_follow_up,
                                         clinical_data$days_to_death)

query_data_all <- GDCquery(project = "TARGET-AML",
                           data.category = "Transcriptome Profiling", # parameter enforced by GDCquery
                           experimental.strategy = "RNA-Seq",
                           access = "open")
output_data <- getResults(query_data_all)
tumor <- output_data$cases[1:200]  # first 200 cases returned by the query

query_data <- GDCquery(project = "TARGET-AML",
                       data.category = "Transcriptome Profiling",
                       experimental.strategy = "RNA-Seq",
                       workflow.type = "STAR - Counts",
                       data.type = "Gene Expression Quantification",
                       access = "open",
                       barcode = tumor)
GDCdownload(query_data)

# Keep only the first file for each case
query_data$results[[1]] <- query_data$results[[1]][!duplicated(query_data$results[[1]]$cases), ]

tcga_data_data <- GDCprepare(query_data, summarizedExperiment = TRUE)
```

The error above occurs when running GDCprepare().
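For context on the error message itself: base R raises `invalid 'row.names' length` whenever a data frame is given row names whose count differs from its number of rows. The minimal sketch below reproduces that message with a made-up toy data frame; it does not trace where inside GDCprepare() the mismatch arises.

```r
# Toy data frame with 3 rows (made-up values for illustration only).
samples <- data.frame(group = c("a", "b", "c"))

# Supplying only 2 row names for 3 rows triggers the base R error
# "invalid 'row.names' length"; tryCatch captures its message.
result <- tryCatch(
  {
    rownames(samples) <- c("s1", "s2")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(result)
```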

tiagochst commented 1 year ago

@gourab4gd Which TCGAbiolinks version are you using, please?

gourab4gd commented 1 year ago

Thank you so much for the reply, sir:

TCGAbiolinks version: 2.29.3
TCGAbiolinksGUI.data version: 1.15.1

If I run the same steps with the first 50 samples instead of 200, it executes successfully, but increasing the number of samples to 200 produces the error above.

tiagochst commented 1 year ago

Thanks. It seems these 5 samples do not follow the expected barcode format. I need to understand these samples better before I fix it.
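The warning in the original report (`Expected 5 pieces. Additional pieces discarded in 5 rows [...]`) matches the wording tidyr's `separate()` uses when a string splits into more pieces than requested. A minimal base R sketch for spotting such barcodes before calling GDCprepare() follows. The example barcodes are hypothetical, and the rule that a well-formed barcode has exactly 5 dash-separated pieces is an assumption read off the warning, not taken from GDC documentation.

```r
# Hypothetical barcodes for illustration; real TARGET-AML IDs will differ.
barcodes <- c(
  "TARGET-20-PAAAAA-09A-01R",
  "TARGET-20-PAAAAB-03A-01R",
  "TARGET-20-PAAAAC-09A-01R-EXTRA"
)

# Count the dash-separated pieces in each barcode.
n_pieces <- lengths(strsplit(barcodes, "-", fixed = TRUE))

# Assumption: a well-formed barcode here splits into exactly 5 pieces.
expected_pieces <- 5
irregular <- barcodes[n_pieces != expected_pieces]

# Positions and values of barcodes that would not split cleanly.
print(which(n_pieces != expected_pieces))
print(irregular)
```

The same check could be pointed at the `tumor` vector from the code above to see which selected cases, if any, have an unexpected shape.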

[Screenshot from 2023-06-05]
tiagochst commented 1 year ago

@gourab4gd Could you please update from GitHub and try the latest version (TCGAbiolinks 2.29.6)?

gourab4gd commented 1 year ago

Thank you so much, sir @tiagochst, for solving my issue. After installing TCGAbiolinks 2.29.6, the first 200 samples of the TARGET-AML dataset run successfully and I get the survival plot.