BSSeeker / BSseeker2

A versatile aligning pipeline for bisulfite sequencing data
http://pellegrini.mcdb.ucla.edu/BS_Seeker2/
MIT License

Excessive coverage in first 7000 positions in CGmap file #26

Closed Surajuvm closed 5 years ago

Surajuvm commented 5 years ago

Hello Weilong,

I have been using BS-Seeker2 to analyze my WGBS reads. My CGmap file (the output of bs_seeker2-call_methylation.py) looks like this:

NC_006853_1 C 94 CG CG 0.0 3 1279
NC_006853_1 G 95 CG CG 0.01 11 1874
NC_006853_1 C 97 CHH CA 0.01 8 1300
NC_006853_1 G 101 CHH CT 0.01 16 1899
NC_006853_1 G 102 CHH CC 0.01 16 1907
NC_006853_1 C 103 CHH CC 0.01 9 1346

The numbers in the last column are very large, and this pattern holds for around 7000 positions. Further down, my file looks something like this:

NC_037328_1 G 32024 CHG CA 0.08 1 13
NC_037328_1 G 32030 CHH CA 0.0 0 13
NC_037328_1 G 32035 CHG CA 0.0 0 13
NC_037328_1 G 32043 CHH CA 0.0 0 12
NC_037328_1 G 32047 CHH CA 0.0 0 13
NC_037328_1 G 32051 CG CG 0.92 12 13
NC_037328_1 G 32058 CHG CA 0.0 0 10
NC_037328_1 C 32136 CHG CT 0.0 0 10
NC_037328_1 C 32141 CHH CA 0.0 0 10
NC_037328_1 C 32145 CHG CA 0.0 0 10

The values in the last column here look more realistic. Is this normal? Could the first 7000 positions reflect adapters that were missed during trimming? I trimmed the reads with Trim Galore and cutadapt, and verified the result with FastQC.

I would be grateful if you could provide some insight into this issue.

Thank you, Suraj

Surajuvm commented 5 years ago

I just realized that the positions with excessive coverage come from mitochondrial reads.
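For anyone hitting the same pattern, a quick way to see which sequence carries the high counts is to average the last column per sequence name. The sketch below is only an illustration, not part of BS-Seeker2: the function name `mean_coverage_by_chrom` is hypothetical, and it assumes each CGmap line has eight whitespace-separated fields with the sequence name first and the total read count last, as in the excerpts above.

```python
from collections import defaultdict


def mean_coverage_by_chrom(lines):
    """Return {sequence name: mean of the last (total count) column}.

    Assumes 8 whitespace-separated fields per CGmap line, with the
    sequence name in field 1 and the total read count in field 8.
    Lines that do not have exactly 8 fields are skipped.
    """
    sums = defaultdict(int)    # sequence name -> sum of total counts
    sites = defaultdict(int)   # sequence name -> number of sites seen
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue
        chrom = fields[0]
        sums[chrom] += int(fields[7])
        sites[chrom] += 1
    return {chrom: sums[chrom] / sites[chrom] for chrom in sums}


# A few lines copied from the excerpts in this thread.
sample = [
    "NC_006853_1 C 94 CG CG 0.0 3 1279",
    "NC_006853_1 G 95 CG CG 0.01 11 1874",
    "NC_037328_1 G 32024 CHG CA 0.08 1 13",
    "NC_037328_1 G 32051 CG CG 0.92 12 13",
]
result = mean_coverage_by_chrom(sample)
for chrom, mean_cov in sorted(result.items()):
    print(chrom, mean_cov)
```

To run it on a real file, pass an open file handle instead of the list, for example `mean_coverage_by_chrom(open("sample.CGmap"))`, where the file name is a placeholder. A sequence whose mean is orders of magnitude above the rest, as NC_006853_1 is here, stands out immediately.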

guoweilong commented 5 years ago

Sure. Mitochondrial sequences can get such high coverage.

Best, Weilong