Aparajita-K / CoALa

Multi-view data integration using approximate graph Laplacians
Omics data sets problem #1

Closed abc687 closed 2 years ago

abc687 commented 2 years ago

How do I select omics data sets from TCGA? How should I preprocess them?

Aparajita-K commented 2 years ago

To browse the omics data sets available in TCGA, use the GDC data portal at https://portal.gdc.cancer.gov/ and pick the cancer multi-omics data set you are interested in from the 33 TCGA projects. Within each project you can select one or more of the available omics modalities, such as gene/mRNA expression, DNA methylation, copy number variation and so on. You can download the raw or normalized expression data sets from the GDC portal itself. There are also several other ways to download TCGA data in pre-processed [sample x genomic_feature] format. One option is the UCSC Xena browser https://xenabrowser.net/datapages/, and another is the TCGAbiolinks R package https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html. It has a very well documented vignette https://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/download_prepare.html that you can follow step by step to download and prepare the pre-processed TCGA data of interest. Hope it helps.
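
As a rough illustration of the preprocessing side, here is a minimal Python sketch of getting several downloaded modalities into matching [sample x genomic_feature] matrices. It is not CoALa's own pipeline. It assumes each file is a tab-separated matrix with features in rows and samples in columns (check the orientation of your download), and that sample identifiers are written the same way in every modality. The function names `load_view`, `align_views` and `top_variance_features` are hypothetical helpers, and the variance filter is just one common generic choice, so adjust it to your data.

```python
import io

import pandas as pd


def load_view(source):
    """Read a tab-separated matrix (features in rows, samples in columns)
    and return it transposed, i.e. as a [sample x feature] DataFrame."""
    matrix = pd.read_csv(source, sep="\t", index_col=0)
    return matrix.T


def align_views(views):
    """Keep only the samples present in every view, in the same sorted order."""
    common = sorted(set.intersection(*(set(v.index) for v in views.values())))
    return {name: v.loc[common] for name, v in views.items()}


def top_variance_features(view, k):
    """Drop features with missing values, then keep the k most variable ones."""
    complete = view.dropna(axis=1)
    keep = complete.var(axis=0).nlargest(k).index
    return complete[keep]


# Toy stand-ins for two downloaded files (assumed layout: feature x sample).
mrna_tsv = "gene\tS1\tS2\tS3\nG1\t1.0\t2.0\t3.0\nG2\t5.0\t5.0\t5.0\n"
meth_tsv = "probe\tS2\tS3\tS4\ncg1\t0.1\t0.9\t0.5\ncg2\t0.2\t\t0.4\n"

views = align_views({
    "mrna": load_view(io.StringIO(mrna_tsv)),
    "methylation": load_view(io.StringIO(meth_tsv)),
})

# Every view now has the same samples as rows and genomic features as columns.
filtered = {name: top_variance_features(v, k=1) for name, v in views.items()}
```

With real data you would pass file paths instead of `io.StringIO` objects and choose `k` per modality. Restricting to the common samples matters because multi-view integration needs the same samples in every view.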