cwarden45 / COHCAP

R package for DNA Methylation Analysis (and Gene Expression Integration) for Illumina Infinium Array and BS-Seq Data

What is the HCT116_truncated.bed file? #2

Closed YoungYu5 closed 4 years ago

YoungYu5 commented 4 years ago

Hello.

I ran COHCAP with the dataset from the manual. To process that data, I ran the COHCAP.BSSeq.preprocess command. I think the BS-Seq files are the exp files and HCT116 is the control file; is that correct? If the HCT116 file is the control file, what does each column in the HCT116 file mean, and how can I get that file?

Thanks.

cwarden45 commented 4 years ago

Hi,

Those are good questions.

The "truncated" part means that each file contains only a small subset of sites, so that the files can be used as a demo dataset.

Everything within this folder should be BS-Seq files:

https://github.com/cwarden45/COHCAP/tree/master/inst/extdata/BSSeq

You can see 2 example expression files in this folder (the files starting with "expression"):

https://github.com/cwarden45/COHCAP/tree/master/inst/extdata

So, if "exp" means "expression", then that part is not correct. All 3 BS-Seq files should be methylation.

For 2 of those samples, you can see the accession for the public data (SRR096437 and SRR096438, which are MCF7 replicates). However, I need to look into where that 3rd file came from (HCT116_truncated.bed). For example, it could really be a public BS-Seq dataset, or it might have been an attempt to convert the 450k array data to a format more similar to BS-Seq (in order to test using methylKit with real BS-Seq data versus "simulated" BS-Seq data, which was part of the original publication).

I will provide another update when I can give a more precise answer. However, I can say that all 3 of those files are supposed to be for DNA Methylation (not gene expression).

Thank You, Charles

cwarden45 commented 4 years ago

I am still trying to see if I can learn a little more about that sample, but I can already tell you a little more about why I have those 3 samples:

The HCT116_truncated.bed file is the demo file for the earlier COHCAP.BSSeq.preprocess() function.

However, I made additional changes to COHCAP after I returned to City of Hope. For example, SRR096437_truncated.bismark.cov and SRR096438_truncated.bismark.cov are the demo files for the COHCAP.BSSeq_V2.methyl.table() function.

I am guessing you probably want to use the 2nd function, if you are working with Bismark output files.
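For readers unfamiliar with the input format: a Bismark coverage (.cov) file is normally tab-delimited with six columns (chromosome, start, end, percent methylation, methylated read count, unmethylated read count). The sketch below is not part of COHCAP and does not show how COHCAP.BSSeq_V2.methyl.table() works internally; it is only a minimal Python illustration of reading that layout. The function name `read_bismark_cov`, the `min_coverage` parameter, and the example rows are all assumptions made up for this illustration.

```python
import csv
import io


def read_bismark_cov(handle, min_coverage=1):
    """Yield (chrom, position, percent_methylation, coverage) per site.

    Hypothetical helper for illustration only. Assumes the usual
    six-column Bismark coverage layout:
    chrom, start, end, percent methylation, methylated count, unmethylated count.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, start, _end, _pct, n_meth, n_unmeth = row[:6]
        n_meth, n_unmeth = int(n_meth), int(n_unmeth)
        coverage = n_meth + n_unmeth
        # Skip uncovered or low-coverage sites (threshold is an assumption).
        if coverage == 0 or coverage < min_coverage:
            continue
        # Recompute percent methylation from the counts.
        yield chrom, int(start), 100.0 * n_meth / coverage, coverage


# Made-up rows, not taken from the COHCAP demo files.
example = (
    "chr1\t10497\t10497\t75.0\t3\t1\n"
    "chr1\t10525\t10525\t0.0\t0\t2\n"
)
sites = list(read_bismark_cov(io.StringIO(example), min_coverage=2))
```

For a real file, the same function could be given an open file handle in place of the `io.StringIO` object.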

cwarden45 commented 4 years ago

I think the answer above is probably what is most important for COHCAP usage.

However, in terms of the background, I think a lot of files for papers where I was 1st author in 2013 have been deleted (since there wasn't a grant for those projects). Some files are saved within PI folders. However, it is currently hard for me to say for certain whether the HCT116 BS-Seq file is actually from public data or if it is 450k data reformatted to provide an example for COHCAP. I think it is the former, since there is only 1 sample (if I reformatted all of the HCT116 samples, then there should be 3 of them). If I can be more confident, I will add another answer. However, I would currently say that I may or may not be able to say for certain whether that is public targeted BS-Seq data.

Either way, for new samples, I think COHCAP.BSSeq_V2.methyl.table() is probably what you want to use, with the other example input files.

YoungYu5 commented 4 years ago

Thanks for your advice.

I have one more question. I want to find DMRs using 450k array data and BS-Seq data. According to the COHCAP paper, COHCAP can find DMRs in both kinds of dataset, but the manual does not cover this.

How can I find DMRs using 450k array and BS-Seq data?

Thanks.

cwarden45 commented 4 years ago

The methods are the same (using percent methylation for BS-Seq data and beta values for Illumina Infinium methylation arrays), but getting the annotations is more difficult for the BS-Seq data.
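To make the comparison concrete, here is a small Python sketch of the two quantities as they are commonly defined; it is not COHCAP code. The array beta value is usually computed from methylated and unmethylated probe intensities with a small stabilizing offset (100 is the commonly cited default), while BS-Seq percent methylation comes from read counts. The function names and example numbers are made up for illustration, and which scale (0 to 1 or 0 to 100) a particular COHCAP function expects should be checked in its documentation.

```python
def array_beta(meth_intensity, unmeth_intensity, offset=100):
    """Beta value for one array probe: M / (M + U + offset).

    The offset of 100 is the commonly cited default, assumed here.
    """
    return meth_intensity / (meth_intensity + unmeth_intensity + offset)


def bsseq_percent_methylation(meth_reads, unmeth_reads):
    """Percent methylation for one BS-Seq site: 100 * C / (C + T)."""
    total = meth_reads + unmeth_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return 100.0 * meth_reads / total


# Made-up numbers: both describe a site that is roughly 90% methylated.
beta = array_beta(900, 0)                  # fraction on a 0 to 1 scale
percent = bsseq_percent_methylation(9, 1)  # percentage on a 0 to 100 scale
```

Dividing the percentage by 100 puts both measurements on the same 0 to 1 scale, which is why the same site-level and region-level comparisons can be applied to either data type.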

It starts to get into something that I have more difficulty supporting (and therefore may or may not be able to resolve for your project), but I have a template for putting together some annotations for RRBS data:

https://github.com/cwarden45/DNAmethylation_templates/tree/master/RRBS_workflow

In other words, once you have all of the necessary files, the analysis strategy is the same:

1) Annotate CpG sites using COHCAP.annotate()
2) Look for differentially methylated sites using COHCAP.site()
3) Among differentially methylated sites, look for differentially methylated regions (DMR) using one of the following strategies:

I wish you the best for your project!

YoungYu5 commented 4 years ago

Thanks :)