BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

GDCquery did not return "Blood Derived Normal" RNA-seq data that are available at TCGA #404

Status: Open. netphantom opened this issue 4 years ago

netphantom commented 4 years ago

Hi there, I'd like to know why, for some projects (such as TCGA-DLBC), the query returns only "Primary Tumor" samples, even though "Blood Derived Normal" samples are also listed on the TCGA website. I've noticed the same thing for many other datasets.

I don't know if this is a bug, but here is the query:

query <- GDCquery(project = "TCGA-DLBC",
                  legacy = TRUE,  # search the GDC Legacy Archive (original TCGA data, hg19)
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  platform = "Illumina HiSeq",
                  file.type = "results",
                  experimental.strategy = "RNA-Seq",
                  sample.type = c("Primary Tumor", "Blood Derived Normal"))

I'm using R 4.0 and TCGAbiolinks 2.16.

Thanks!
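One way to check which sample types a query actually matched is to decode the two-digit sample-type field (characters 14-15) of the returned TCGA barcodes. The R sketch below is a hand-rolled helper, not part of TCGAbiolinks: `decode_sample_type` and the example barcodes are hypothetical, and only a subset of the TCGA sample-type codes is listed. The commented-out last line assumes that `getResults()` returns the matched barcodes in a `cases` column.

```r
# Subset of the TCGA sample-type code table (barcode characters 14-15).
# Note: GDC/TCGAbiolinks may label code 01 as "Primary Tumor".
sample_type_codes <- c(
  "01" = "Primary Solid Tumor",
  "02" = "Recurrent Solid Tumor",
  "03" = "Primary Blood Derived Cancer - Peripheral Blood",
  "06" = "Metastatic",
  "10" = "Blood Derived Normal",
  "11" = "Solid Tissue Normal"
)

# Hypothetical helper: map each barcode to a sample-type label;
# codes missing from the table above are reported as "Other (code NN)".
decode_sample_type <- function(barcodes) {
  code <- substr(barcodes, 14, 15)
  type <- unname(sample_type_codes[code])
  type[is.na(type)] <- paste0("Other (code ", code[is.na(type)], ")")
  type
}

# Usage with made-up barcodes: one tumor sample, one blood-derived normal.
barcodes <- c("TCGA-XX-0001-01A-11R-0000-07", "TCGA-XX-0001-10A-01D-0000-01")
table(decode_sample_type(barcodes))

# With a real query (assumption: getResults() exposes barcodes in `cases`):
# table(decode_sample_type(TCGAbiolinks::getResults(query)$cases))
```

If "Blood Derived Normal" never shows up in that table, the query simply found no such files for the chosen data type and platform.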

Puriney commented 4 years ago
query.exp <- GDCquery(
  project = "TCGA-BRCA", 
  legacy = FALSE,
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  experimental.strategy = "RNA-Seq",
  workflow.type = 'HTSeq - FPKM-UQ',
  sample.type = c("Blood Derived Normal"))

This query finds no Blood Derived Normal RNA-seq files, but on the TCGA website I can find 992 blood-derived normal cases, 1,077 primary tumor cases, and 163 solid tissue normal cases.

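Update: the reason is that BRCA simply has no blood-derived normal RNA-seq. I zoomed in on some cases on the TCGA website, and their RNA-seq samples are only primary tumor or solid tissue normal. The sample-type filter on the TCGA website is somewhat confusing, since it counts cases that have a blood-derived normal sample for any data type, not only for RNA-seq.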
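The distinction in the update can be illustrated with a toy file table: a case can have a Blood Derived Normal sample (in TCGA these are typically sequenced as a germline reference for DNA assays such as WXS) without having any Blood Derived Normal RNA-seq file. The data frame below is hypothetical, not real TCGA data, and treating the case-level count as what the portal's sample-type filter reports is an assumption.

```r
# Hypothetical file manifest (toy values, NOT real TCGA counts): one row per file.
files <- data.frame(
  case_id = c("case1", "case1", "case1", "case2", "case2"),
  experimental_strategy = c("RNA-Seq", "WXS", "WXS", "RNA-Seq", "WXS"),
  sample_type = c("Primary Tumor", "Primary Tumor", "Blood Derived Normal",
                  "Solid Tissue Normal", "Blood Derived Normal"),
  stringsAsFactors = FALSE
)

# Cases with ANY Blood Derived Normal sample (case-level, all data types)
cases_any_bdn <- unique(files$case_id[files$sample_type == "Blood Derived Normal"])

# Cases with a Blood Derived Normal RNA-Seq file (what an RNA-seq query can return)
is_rna_bdn <- files$experimental_strategy == "RNA-Seq" &
  files$sample_type == "Blood Derived Normal"
cases_rna_bdn <- unique(files$case_id[is_rna_bdn])

length(cases_any_bdn)  # 2
length(cases_rna_bdn)  # 0

# Cross-tabulation makes it visible which strategies carry which sample types
table(files$experimental_strategy, files$sample_type)
```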

Puriney commented 4 years ago

@netphantom Could you change the issue title to something like 'GDCquery did not return "Blood Derived Normal" RNA-seq data that are available at TCGA', so that the author immediately knows what we are asking?

netphantom commented 4 years ago

Hi @Puriney, you're right: on some datasets GDCquery returns the expected sample types, while on others it doesn't. BRCA is one of the most complete, so I don't have problems with it. I was wondering why it sometimes doesn't work, particularly with blood cancer datasets.

OliverBit commented 4 years ago

I have the same problem with the LAML dataset. Have you found a way to solve this? The same problem seems to affect the TARGET projects too.

netphantom commented 4 years ago

For some projects, such as LAML, you can download part of the data with GDCquery and the rest from Recount2, then join the two on the patient barcodes. At least, that's what I'm doing.
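The join on patient barcodes described above can be sketched in R: take the first 12 characters of each TCGA barcode (the TCGA-TSS-participant part) as the patient ID and do a full outer join. The data frames and barcodes are hypothetical placeholders for the sample annotations of the GDCquery download and the Recount2 download. The two sources are processed by different pipelines, so the expression values themselves may need to be made comparable before they are combined.

```r
# Hypothetical sample annotations from the two sources (made-up barcodes).
gdc <- data.frame(
  barcode = c("TCGA-XX-0001-03A-01T-0000-13", "TCGA-XX-0002-03A-01T-0000-13"),
  stringsAsFactors = FALSE
)
recount <- data.frame(
  barcode = c("TCGA-XX-0001-03A-01T-0000-13", "TCGA-XX-0003-03A-01T-0000-13"),
  stringsAsFactors = FALSE
)

# Patient barcode = first 12 characters, e.g. "TCGA-XX-0001"
patient_id <- function(barcode) substr(barcode, 1, 12)
gdc$patient <- patient_id(gdc$barcode)
recount$patient <- patient_id(recount$barcode)

# Full outer join on patient: keeps patients found in either source;
# the overlapping `barcode` column gets a per-source suffix.
joined <- merge(gdc, recount, by = "patient", all = TRUE,
                suffixes = c(".gdc", ".recount"))
joined
```

Joining at the patient level rather than on full barcodes tolerates differing sample, aliquot, or plate suffixes between sources, but a patient with several samples in one source will yield one row per sample pair.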