McMasterAI / projectx-2021


Methylation data #1

Open dufaultc opened 2 years ago

dufaultc commented 2 years ago

This data comes from TCGA and is our prediction input.

Information Needed

dufaultc commented 2 years ago

TCGA projects use two types of methylation assay: Illumina Human Methylation 27 and Illumina Human Methylation 450. We only consider methylation at CpG sites measured by the 27-type assay. Because nearly all CpG sites on the 27-type assay are also present on the 450-type assay, we can use 450-type data as well by keeping only those shared sites.
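To illustrate the idea of limiting a 450-type sample to the sites the 27-type assay measures, here is a minimal sketch. It assumes each sample is held as a mapping from probe ID to beta value; the function name, probe IDs, and values are hypothetical and not taken from the notebook.

```python
def restrict_to_27_sites(betas_450, sites_27):
    """Keep only the probes of a 450-type sample that the 27-type assay also measures."""
    return {probe: beta for probe, beta in betas_450.items() if probe in sites_27}


# Hypothetical probe IDs and beta values, for illustration only.
sites_27 = {"probe_1", "probe_2"}
betas_450 = {"probe_1": 0.91, "probe_2": 0.07, "probe_3": 0.55}

shared = restrict_to_27_sites(betas_450, sites_27)
print(shared)
```

Holding the 27-type sites in a set keeps each membership check cheap, which matters because a 450-type file has far more probes to scan.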

Processed data is currently stored in the assay27processed S3 bucket, with one file per case. For now the bucket contains only data from 27-type assays, because processing the 450-type files is very time consuming. Each file is a CSV in which the first column contains the gene names and the second column contains a semicolon-separated list of the beta values found at the CpG sites identified as belonging to each gene. The cancer type of the sample is given in the file name, which corresponds to a TCGA project.
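The file layout described above could be read with something like the following sketch. It assumes there is no header row and that any non-numeric token (for example `NA`) marks a missing value; neither assumption is confirmed here, and the function name and sample gene names are hypothetical.

```python
import csv
import io
import math


def parse_case_file(text):
    """Parse one per-case CSV into {gene_name: [beta, ...]}."""
    genes = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene, raw = row[0].strip(), row[1]
        betas = []
        for token in raw.split(";"):
            token = token.strip()
            if not token:
                continue  # ignore empty entries such as a trailing ';'
            try:
                betas.append(float(token))
            except ValueError:
                # Assumption: non-numeric tokens mark missing beta values.
                betas.append(math.nan)
        genes[gene] = betas
    return genes


# Hypothetical file content, for illustration only.
sample = "GENE_A,0.12;0.85;0.40\nGENE_B,0.03\n"
parsed = parse_case_file(sample)
print(parsed)
```

Because genes have different numbers of CpG sites, the lists differ in length, so a downstream model would likely need per-gene aggregation or padding before the values can form a fixed-size input.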

dufaultc commented 2 years ago

The Methylation Pulling notebook in the notebooks folder contains the code for pulling the methylation data.