Hi @tiagochst,
When I use TCGAbiolinks to download the TARGET-AML project, I find that some samples are duplicated, for example:
`
     cases                    experimental_strategy analysis_workflow_type
101  TARGET-20-PADYIR-09A-01R RNA-Seq               STAR - Counts
377  TARGET-20-PADYIR-09A-01R RNA-Seq               STAR - Counts
988  TARGET-20-PAECCE-09A-01R RNA-Seq               STAR - Counts
1012 TARGET-20-PAECCE-09A-01R RNA-Seq               STAR - Counts
1105 TARGET-20-PAEERJ-09A-01R RNA-Seq               STAR - Counts
1107 TARGET-20-PAEERJ-09A-01R RNA-Seq               STAR - Counts
19   TARGET-20-PAKIWK-09A-01R RNA-Seq               STAR - Counts
909  TARGET-20-PAKIWK-09A-01R RNA-Seq               STAR - Counts
323  TARGET-20-PAKVGI-09A-01R RNA-Seq               STAR - Counts
951  TARGET-20-PAKVGI-09A-01R RNA-Seq               STAR - Counts
`
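For context, here is a minimal base R sketch of how duplicated barcodes like these can be listed. The `results` data frame is a toy stand-in for the query results table, not the real TARGET-AML results: the column names are copied from the output above, and the `TARGET-20-EXAMPLE-09A-01R` barcode is made up purely to have one non-duplicated row.

```r
# Toy stand-in for the query results table (NOT the real TARGET-AML results).
results <- data.frame(
  cases = c("TARGET-20-PADYIR-09A-01R",
            "TARGET-20-EXAMPLE-09A-01R",  # hypothetical, non-duplicated barcode
            "TARGET-20-PADYIR-09A-01R",
            "TARGET-20-PAECCE-09A-01R",
            "TARGET-20-PAECCE-09A-01R"),
  experimental_strategy = "RNA-Seq",
  analysis_workflow_type = "STAR - Counts",
  stringsAsFactors = FALSE
)

# Flag every row whose barcode occurs more than once
# (both the first copy and any later copies).
is_dup <- duplicated(results$cases) | duplicated(results$cases, fromLast = TRUE)

# Show the duplicated rows grouped by barcode.
dup_rows <- results[is_dup, ]
dup_rows <- dup_rows[order(dup_rows$cases), ]
print(dup_rows)

# Keep only the first occurrence of each barcode.
dedup <- results[!duplicated(results$cases), ]
```

Using `fromLast = TRUE` in the second `duplicated()` call is what makes both copies of a barcode show up, rather than only the second one.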
I tried to remove the duplicated samples, but when I went on to merge the data I got a new error:
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
This is my code:
`
library(TCGAbiolinks)

query.exp <- GDCquery(
  project = "TARGET-AML",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts"
)
# Keep only the first row for each sample.submitter_id
query.exp$results[[1]] <- query.exp$results[[1]][!duplicated(query.exp$results[[1]]$sample.submitter_id), ]
GDCdownload(query.exp)
gene.data <- GDCprepare(query = query.exp)
`
The TCGAbiolinks package version is 2.25.3 and the R version is 4.2.0.
Any help would be appreciated. Thank you!