BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

How to download the TCGA data based on m0 and prior malignancy? #543

Open Pravithaks opened 2 years ago

Pravithaks commented 2 years ago

query_samples = (GDCquery(project = c("TCGA-PRAD"),
                          data.category = "Transcriptome Profiling",
                          data.type = "Gene Expression Quantification",
                          experimental.strategy = "RNA-Seq",
                          workflow.type = "STAR - Counts",
                          sample.type = "Primary Tumor",
                          prior.malignancy = "no",
                          Ajcc.clinical.m = "m0",
                          access = "open",
                          barcode = barcodes_samples))

I tried the code above, but it fails with the following error:

Error in GDCquery(project = c("TCGA-PRAD"), data.category = "Transcriptome Profiling", : unused arguments (prior.malignancy = "no", Ajcc.clinical.m = "m0")

tiagochst commented 2 years ago

GDCquery has no such filters (prior.malignancy = "no", Ajcc.clinical.m = "m0"). You can either use the GDC interface to retrieve the barcodes of the samples you want, or filter after the clinical information has been added to the data:

library(SummarizedExperiment)
library(dplyr)
library(TCGAbiolinks)
query_samples <- GDCquery(
    project = c("TCGA-PRAD"),
    data.category = "Transcriptome Profiling",
    data.type = "Gene Expression Quantification",
    experimental.strategy = "RNA-Seq",
    workflow.type = "STAR - Counts",
    sample.type = "Primary Tumor"
)
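# Keep only the first 80 files (keeps the example download small)
query_samples$results[[1]] <- query_samples$results[[1]][1:80,]
GDCdownload(query_samples)
se <- GDCprepare(query_samples)
# Tabulate the clinical fields to see which values are present
plyr::count(colData(se) %>% as.data.frame(),c("ajcc_clinical_m","prior_malignancy"))
# Keep samples with no prior malignancy and clinical stage M0 (note the uppercase "M0")
se <- se[,which(se$prior_malignancy == "no" & se$ajcc_clinical_m == "M0")]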
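
The other option mentioned above, selecting barcodes up front, could look roughly like the sketch below. It uses a small hand-made data frame in place of a real clinical table. The column names `submitter_id`, `prior_malignancy` and `ajcc_clinical_m` are assumptions modelled on the fields seen in `colData(se)`, and the rows and barcodes are invented for illustration. With real data the table might come from `GDCquery_clinic()` or from a GDC portal export, so check the columns you actually get.

```r
# Toy stand-in for a clinical table (all values are made up for illustration).
clinical <- data.frame(
    submitter_id     = c("TCGA-XX-0001", "TCGA-XX-0002", "TCGA-XX-0003", "TCGA-XX-0004"),
    prior_malignancy = c("no", "yes", "no", NA),
    ajcc_clinical_m  = c("M0", "M0", "M1", "M0"),
    stringsAsFactors = FALSE
)

# Keep patients with no prior malignancy and clinical stage M0.
# toupper() guards against "m0" vs "M0"; which() drops rows where a field is NA.
keep <- which(
    clinical$prior_malignancy == "no" &
    toupper(clinical$ajcc_clinical_m) == "M0"
)
barcodes_samples <- clinical$submitter_id[keep]
print(barcodes_samples)
```

The resulting vector could then be passed as `barcode = barcodes_samples` in the `GDCquery()` call, without the unsupported `prior.malignancy` and `Ajcc.clinical.m` arguments.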
Pravithaks commented 2 years ago

renaming the first element in assays to 'counts'
Error in DESeqDataSet(pca_processed, design = ~category) : all samples have 0 counts for all genes. check the counting script.

I received the error above when I ran the code below:

# convert our RangedSummarizedExperiment dataset rnaseq_pca_processed to a DESeqDataSet
dds = DESeqDataSet(pca_processed, design = ~ category)
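
One possible cause of this error, not confirmed in this thread, is that the filtering step left no samples at all, for example by comparing against "m0" when the stored value is "M0". An object with zero columns trivially has zero counts for every gene. The sketch below reproduces that situation with a toy count matrix in base R; the matrix, the `stage` vector and the variable names are invented for illustration. On the real object, `dim(pca_processed)` and `assayNames(pca_processed)` are the first things worth inspecting.

```r
# Toy count matrix: 3 genes x 4 samples (values invented for illustration).
counts <- matrix(
    c(10, 0, 5,
      3, 8, 0,
      0, 2, 7,
      4, 4, 1),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)
stage <- c("M0", "M0", "M1", "M0")

# A case mismatch ("m0" vs "M0") silently selects zero samples.
empty_subset <- counts[, stage == "m0", drop = FALSE]
good_subset  <- counts[, stage == "M0", drop = FALSE]

print(dim(empty_subset))                # no columns left after filtering
print(all(rowSums(empty_subset) == 0))  # every gene sums to zero in an empty subset
print(dim(good_subset))                 # the correctly cased filter keeps samples
```

If the subset turns out to be empty, fixing the filter value (or comparing with `toupper()`) before building the DESeqDataSet would be the natural next step.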
