BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Error in `levels<-`(`*tmp*`, value = as.character(levels)) : factor level [593] is duplicated #530

Open Pravithaks opened 2 years ago

Pravithaks commented 2 years ago

Hello Team,

I received this error when I tried to run GDCprepare. Please let me know how to fix it.

tiagochst commented 2 years ago

@Pravithaks Could you please post the query code?

Pravithaks commented 2 years ago
query_samples <- GDCquery(
   project = c("TCGA-KIRC","CPTAC-3"),
   data.category = "Transcriptome Profiling",
   data.type = "Gene Expression Quantification",
   experimental.strategy = "RNA-Seq",
   workflow.type = "STAR - Counts",
   barcode = barcodes_samples
)

GDCdownload(query_samples)
rcc = GDCprepare(query_samples)

I received this error after running the last line above (the GDCprepare call that creates rcc).
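For context, the message in the title is the one base R's `factor()` raises when the `levels` it is given contain the same value twice. The snippet below is only a minimal sketch of that base R behaviour with made-up values; it does not show where inside GDCprepare the duplicated levels come from.

```r
# Made-up level set in which "a" appears twice
dup_levels <- c("a", "b", "a")

# anyDuplicated() returns the index of the first repeated entry (0 if none)
first_dup <- anyDuplicated(dup_levels)

# factor() refuses a level set with duplicates; capture the error message
msg <- tryCatch(
  {
    factor(c("a", "b"), levels = dup_levels)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With the duplicates dropped, the same call succeeds
ok <- factor(c("a", "b"), levels = unique(dup_levels))
```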

tiagochst commented 2 years ago

Hi,

For the moment, you will need to prepare those two projects separately.

query_samples <- GDCquery(
   project = c("TCGA-KIRC"),
   data.category = "Transcriptome Profiling",
   data.type = "Gene Expression Quantification",
   experimental.strategy = "RNA-Seq",
   workflow.type = "STAR - Counts"
)
query <- query_samples
GDCdownload(query, files.per.chunk = 100)
rcc1 <- GDCprepare(query)

query_samples <- GDCquery(
   project = c("CPTAC-3"),
   data.category = "Transcriptome Profiling",
   data.type = "Gene Expression Quantification",
   experimental.strategy = "RNA-Seq",
   workflow.type = "STAR - Counts"
)
query <- query_samples
GDCdownload(query, files.per.chunk = 100)
rcc2 <- GDCprepare(query)

colnames(colData(rcc1))
colnames(colData(rcc2))

shared_metadata <- intersect(colnames(colData(rcc1)), colnames(colData(rcc2)))
colData(rcc1) <- colData(rcc1)[, shared_metadata]
colData(rcc2) <- colData(rcc2)[, shared_metadata]

rcc <- cbind(rcc2,rcc1)
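To see why the intersect step comes before the cbind, here is a minimal base R sketch of the same pattern. It uses plain data frames and matrices as stand-ins for the sample metadata and counts; every barcode, column name, and value in it is made up for illustration and is not taken from TCGA-KIRC or CPTAC-3.

```r
# Toy per-sample metadata for two projects; each has one column the other lacks
meta_a <- data.frame(
  barcode = c("A-01", "A-02"),
  sample_type = c("Primary Tumor", "Solid Tissue Normal"),
  only_in_a = c(1, 2),
  stringsAsFactors = FALSE
)
meta_b <- data.frame(
  barcode = "B-01",
  sample_type = "Primary Tumor",
  only_in_b = "x",
  stringsAsFactors = FALSE
)

# Keep only the metadata columns present in both tables, then stack the samples
shared <- intersect(colnames(meta_a), colnames(meta_b))
meta_all <- rbind(meta_b[, shared], meta_a[, shared])

# Toy count matrices: genes in rows, samples in columns
genes <- c("gene1", "gene2")
counts_a <- matrix(1:4, nrow = 2, dimnames = list(genes, meta_a$barcode))
counts_b <- matrix(5:6, nrow = 2, dimnames = list(genes, meta_b$barcode))

# Combining by column only makes sense if both matrices have the same genes
stopifnot(identical(rownames(counts_a), rownames(counts_b)))
counts_all <- cbind(counts_b, counts_a)

# Sample order in the counts should match the row order of the metadata
stopifnot(identical(colnames(counts_all), meta_all$barcode))
```

The two stopifnot checks (same genes in both objects, sample order matching the metadata) are also reasonable things to verify on the real rcc1 and rcc2 objects before relying on the combined result.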