FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
382 stars 101 forks source link

CpG count query #669

Closed MarkOEze closed 3 weeks ago

MarkOEze commented 4 months ago

Hello,

I have some DNA methylation data, which I prepared using EM-Seq and got around 236 million paired-end reads after sequencing on the Illumina NextSeq platform. I have some concerns regarding the number of CpGs I obtained. It is known that there are 28 million CpGs in the human genome; however, the number of CpGs I got, especially at 1X, 3X, and 5x coverage, suggests that the CpG counts may have been overestimated.

I trimmed my reads using Trimgalore, aligned them and extracted the methylation information using Bismark. Please look at the snippets of my codes and the corresponding results below.

Alignment: bismark --bowtie2 --gzip -N 1 --un --ambiguous --phred33-quals -p 4 --multicore 20 --genome /mnt/hcs/dsm-pathology-chatterjee-genomics/Euan_Rodger/Ref_Genomes/GRCh37 -1 AAFJNNMM5-8861-04-00-01_S1_L001_R1_001_val_1.fq.gz -2 AAFJNNMM5-8861-04-00-01_S1_L001_R2_001_val_2.fq.gz

Methylation extraction: bismark_methylation_extractor --gzip --bedGraph --comprehensive --s --multicore 10 --merge_non_CpG --cutoff 1 -o ./bedgraph_10_2/ AAFJNNMM5-8861-06-00-01_S3_L001_R1_001_val_1_bismark_bt2_pe.bam

N/B: I repeated this for all the samples at 3X, 5X, and 10X coverage (codes not shown).

Sample 1x coverage 3x coverage 5x coverage 10x coverage
S1 48,841,280 29,043,575 12,581,307 650,604
S2 50,761,706 34,860,543 18,166,369 1,472,475
S3 44,959,784 21,303,364 7,247,631 265,220
I also counted the number of CpGs with methylkit using the bam files obtained from the Bismark alignment. Result: Sample 1x coverage 3x coverage 5x coverage 10x coverage
S1 49,783,470 31,603,367 16,688,990 1,992,364
S2 52,039,475 37,017,454 22,484,852 3,749,399
S3 45,501,338 26,431,339 13,237,981 1,623,263

Although the second approach seems to give more reasonable results, I still have serious concerns about the accuracy of the CpG count at 1x, 3x, and 5x coverage.

Sorry about the long text; I wanted to provide as much relevant information as possible. Please let me know if you require more information.

I look forward to hearing from you soon.

Thank you.

Kind regards, Mark.

FelixKrueger commented 4 months ago

Hi Mark,

Forgive me if this a naive suggestion, but could the confusion simple arise from the fact that there are indeed ~28 million CpG dyads in the genome, but the calls reported by Bismark are at single bp. In other words, to account for top and bottom strand CpGs you would need to multiply by 2, so ~56M in total?

CG
GC
aniruddha2 commented 4 months ago

Thanks Felix. Hope you are good. Really long time. :-). Mark is doing PhD in my lab. I am wondering what is the best way you will suggest to utilise methylation extractor (or other tools) to obtain CpG dyads only (i.e, 28 million, not 56 million) with coverage info. Thanks, Aniruddha

FelixKrueger commented 4 months ago

I'm good, thanks! You can continue to the bedGraph/coverage step, either by using --bedGraph from within the methylation extractor, or run bismark2bedGraph afterwards. Next, you can run coverage2cytosine with the coverage file as input, and selecting --merge_CpG. This should do exactly what you need.

aniruddha2 commented 4 months ago

Thanks a lot Felix. Much appreciated.

Kind regards, Aniruddha

From: Felix Krueger @.> Date: Tuesday, 7 May 2024 at 9:54 PM To: FelixKrueger/Bismark @.> Cc: Aniruddha Chatterjee @.>, Comment @.> Subject: Re: [FelixKrueger/Bismark] CpG count query (Issue #669)

I'm good, thanks! You can continue to the bedGraph/coverage step, either by using --bedGraph from within the methylation extractor, or run bismark2bedGraph afterwards. Next, you can run coverage2cytosine with the coverage file as input, and selecting --merge_CpG. This should do exactly what you need.

— Reply to this email directly, view it on GitHubhttps://github.com/FelixKrueger/Bismark/issues/669#issuecomment-2097907303, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACZLES24HR6P7VIONDWA4KLZBCQDRAVCNFSM6AAAAABHHRAX5WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJXHEYDOMZQGM. You are receiving this because you commented.Message ID: @.***>

aniruddha2 commented 3 weeks ago

Dear Felix,

Hope you are good. More recently, I we are trying to assess the two modes of alignment in bismark for cell free DNA methylation data from RRBS. Standard vs Local.

In Local, we get more CpGs (with coverage filter of 10). However, we also get increase % of methylation in unknown context. I have attached some analysis we did on few samples (attached)

In general I wanted to know your thoughts in both. Do you think Standard is more accurate in terms of methylation calls/reproducibility ? or it provides same results.

I would be really keen to hear your thoughts.

Kind regards, Aniruddha

From: Aniruddha Chatterjee @.> Date: Friday, 10 May 2024 at 6:06 PM To: FelixKrueger/Bismark @.>, FelixKrueger/Bismark @.> Cc: Comment @.> Subject: Re: [FelixKrueger/Bismark] CpG count query (Issue #669) Thanks a lot Felix. Much appreciated.

Kind regards, Aniruddha

From: Felix Krueger @.> Date: Tuesday, 7 May 2024 at 9:54 PM To: FelixKrueger/Bismark @.> Cc: Aniruddha Chatterjee @.>, Comment @.> Subject: Re: [FelixKrueger/Bismark] CpG count query (Issue #669)

I'm good, thanks! You can continue to the bedGraph/coverage step, either by using --bedGraph from within the methylation extractor, or run bismark2bedGraph afterwards. Next, you can run coverage2cytosine with the coverage file as input, and selecting --merge_CpG. This should do exactly what you need.

— Reply to this email directly, view it on GitHubhttps://github.com/FelixKrueger/Bismark/issues/669#issuecomment-2097907303, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACZLES24HR6P7VIONDWA4KLZBCQDRAVCNFSM6AAAAABHHRAX5WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJXHEYDOMZQGM. You are receiving this because you commented.Message ID: @.***>

FelixKrueger commented 3 weeks ago

I added a comment to this for issue #317.