BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

How to deal with the duplicated samples in TARGET-AML? #556

Open xiaolan552 opened 1 year ago

xiaolan552 commented 1 year ago
Hi @tiagochst, when I use TCGAbiolinks to download the TARGET-AML project, I find that some samples are duplicated, for example:

```
     cases                    experimental_strategy analysis_workflow_type
101  TARGET-20-PADYIR-09A-01R RNA-Seq               STAR - Counts
377  TARGET-20-PADYIR-09A-01R RNA-Seq               STAR - Counts
988  TARGET-20-PAECCE-09A-01R RNA-Seq               STAR - Counts
1012 TARGET-20-PAECCE-09A-01R RNA-Seq               STAR - Counts
1105 TARGET-20-PAEERJ-09A-01R RNA-Seq               STAR - Counts
1107 TARGET-20-PAEERJ-09A-01R RNA-Seq               STAR - Counts
19   TARGET-20-PAKIWK-09A-01R RNA-Seq               STAR - Counts
909  TARGET-20-PAKIWK-09A-01R RNA-Seq               STAR - Counts
323  TARGET-20-PAKVGI-09A-01R RNA-Seq               STAR - Counts
951  TARGET-20-PAKVGI-09A-01R RNA-Seq               STAR - Counts
```
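
For reference, a minimal base-R sketch of how rows like these can be listed. It uses a toy data frame (`res`, a hypothetical name) rebuilt from the ten rows shown above, not a live GDC query; in the real session the table would be `query.exp$results[[1]]`. It only flags and counts repeated barcodes and is not a fix for the error described below.

```r
# Toy data frame holding only the rows shown above (assumption: the real
# table has the same three columns; everything else is omitted here).
barcodes <- c(
  "TARGET-20-PADYIR-09A-01R",
  "TARGET-20-PAECCE-09A-01R",
  "TARGET-20-PAEERJ-09A-01R",
  "TARGET-20-PAKIWK-09A-01R",
  "TARGET-20-PAKVGI-09A-01R"
)
res <- data.frame(
  cases = rep(barcodes, each = 2),
  experimental_strategy = "RNA-Seq",
  analysis_workflow_type = "STAR - Counts",
  row.names = c("101", "377", "988", "1012", "1105",
                "1107", "19", "909", "323", "951"),
  stringsAsFactors = FALSE
)

# Flag every row whose barcode occurs more than once. Scanning from both
# ends marks all copies, not only the second and later ones.
is_dup <- duplicated(res$cases) | duplicated(res$cases, fromLast = TRUE)
dup_rows <- res[is_dup, ]

# Number of files per barcode.
dup_counts <- table(res$cases)

# One possible policy: keep only the first row per barcode.
dedup <- res[!duplicated(res$cases), ]

print(dup_rows)
print(dup_counts)
```

Keeping the first occurrence is only one choice; which of the two files per barcode should be retained is exactly the open question here.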

I tried to remove the duplicated samples, but when I then tried to prepare (merge) the data, I got a new error: ``Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length``

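This is my code:

```r
library(TCGAbiolinks)

# Query the STAR - Counts gene expression files for TARGET-AML
query.exp <- GDCquery(
  project = "TARGET-AML",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts"
)

# Keep only the first row per sample.submitter_id in the query results
query.exp$results[[1]] <- query.exp$results[[1]][
  !duplicated(query.exp$results[[1]]$sample.submitter_id),
]

GDCdownload(query.exp)
gene.data <- GDCprepare(query = query.exp)  # this is where the row.names error appears
```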

The TCGAbiolinks package version is 2.25.3 and the R version is 4.2.0.

Any help would be appreciated. Thank you!