BSSeeker / BSseeker2

A versatile aligning pipeline for bisulfite sequencing data
http://pellegrini.mcdb.ucla.edu/BS_Seeker2/
MIT License
60 stars 25 forks source link

Only first three bases of aligned reads being used for methylation calling #25

Closed rjcorb closed 1 year ago

rjcorb commented 5 years ago

Hello,

I am using BSseeker2 to analyze single-end RRBS data from chickens. I have aligned the reads using bs_seeker2-align.py with very high unique alignment rates; however, I noticed a smaller-than-expected number of cytosines with good coverage (-r 10) following running bs_seeker2-call_methylation.py. In looking at the cytosines that were included in the CGmap file, I noticed that these were always in the first three positions of aligned reads. No cytosines at later positions in reads were being included in methylation reports despite having sufficient coverage.

I am at a loss for why this is happening and would appreciate any insight. I am using the following software versions:

BSseeker2 1.8 Bowtie2 2.2.3 Python 2.7.15 Pysam 0.15.2

guoweilong commented 5 years ago

hi @rjcorb what is the average coverage of your data? Maybe 10X converage is too high. Usually, for CpG methylaiton , 4X or 5X will be enough. As RRBS reads alway start from c-CGG, then it would be easier for the first three bases to pass the high threshold.

Another, you can also check with our new tool CGmapTools, and the command cgmaptools convert bam2cgmap would be faster for converting the BAM file to CGmap file.

Best, Weilong

rjcorb commented 5 years ago

Hi Weilong,

Thank you for you reply. My problem may have been with the version of samtools I was using; after switching to Samtools 1.9, I used CGmapTools convert and obtained ~10x as many CpG sites with >=10X coverage or greater. I will see if this switch also improves high coverage CpG sites obtained with bs_seeker2-call_methylation.py.

Thanks again,

Ryan

Surajuvm commented 5 years ago

hi @rjcorb what is the average coverage of your data? Maybe 10X converage is too high. Usually, for CpG methylaiton , 4X or 5X will be enough. As RRBS reads alway start from c-CGG, then it would be easier for the first three bases to pass the high threshold.

Another, you can also check with our new tool CGmapTools, and the command cgmaptools convert bam2cgmap would be faster for converting the BAM file to CGmap file.

Best, Weilong

Hi Weilong,

I had an understanding that the -r parameter in bsseeker2 methylation calling affects only the wig files. Does it affect CGmap and ATCGmap files too? I have run bs_seeker2-call_methylation.py with -r values 1 (default) and 10 in the same data. I observed differences only in the wig files. Please let me know if I am missing something here.

Thanks, Suraj

guoweilong commented 5 years ago

Hi Suraj, @Surajuvm

"bs_seeker2-call_methylation.py -r" will affect all ATCGmap, CGmap and wig files. You can try to use low standard to generate CGmap files and ATCGmap files. Then you can use "cgmaptools convert cgmap2wig" to generate a high-standard wig by yourself.

Best, Weilong

Surajuvm commented 5 years ago

Hello Weilong,

I am confused here. In the documentation of bsseeker2, it says that -r READ_NO, --read-no=READ_NO The least number of reads covering one site to be shown in wig file [Default: 1]

Like I said earlier, I ran bs_seeker2-call_methylation.py with -r 10 and -r1 for the same bam file. I did not find any differences in ATCGmap and CGmap files obtained with -r10 and -r1. Their wig files were different.

guoweilong commented 5 years ago

@Surajuvm Thanks. I just check the code. Yes, you are right. The "-r" will only affect wig files not CGmap or ATCGmap files.

Best, Weilong