FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
392 stars 102 forks source link

Issue with bismark_methylation_extractor #619

Closed yingyin2019 closed 1 year ago

yingyin2019 commented 1 year ago

Hi, We aligned paired-end WGBS data (the library was prepared via PBAT method) using bismark with pbat parameter. This gives us one bam file and 2 unmapped fq: unmapped_1 and unmapped_2. then I aligned unmapped_1 with bismark --se and --pbat , unmapped_2 with bismark --se, followed by de-duplication and samtools sort by name. then we are trying to extract the methylation calls from those three sorted bam files using bismark_methylation_extractor . the one from the paired end mapping works fine, but the other two from the unmapped_1 and unmapped_2 both show an error.

The IDs of Read 1 (LH00209:26:22C2MHLT3:6:1101:25854:1128_1:N:0:TAATCG) and Read 2 (LH00209:26:22C2MHLT3:6:1101:24873:1160_1:N:0:TAATCG) are not the same. This might be the result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file using 'samtools sort -n' (by read name). This may also occur using samtools merge as it does not guarantee the read order. To properly merge files please use 'samtools merge -n' or 'samtools cat'.

The 1st two reads from unmapped_1 looks like this :

LH00209:26:22C2MHLT3:6:1101:47166:1048_1:N:0:TAATCG 0 chr5 98332078 42 141M 0 0 TATTTAGTGTTTTTGGTTTTATGTGGAGTTTTTTAATTTATTTAGATTTGATGTTTAATATTTTTAATTATTAGGGAAATGTAAATTAAAATAATTTTGAGATTTTATTTTATATTAGTTAGAATGGTTAAGATTAAAAAT FFFFFFFFFFF-FF-FFF-FFF-FFFF-FFF-FFFFFFFFF-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF-F5FFFFFFF NM:i:32 MD:Z:0C3C5C1C17C0C5C0C1C14C2C2C0C5C2C9C4C4C2C0C0C7C0C1C0C1C1C1C0C3C7C6C6 XM:Z:h...x.....h.x.................hh.....hh.h..............h..h..hh.....h..x.........h....h....h..hhx.......hh.hh.h.h.hx...x.......h......h...... XR:Z:GA XG:Z:CT LH00209:26:22C2MHLT3:6:1101:17356:1112_1:N:0:TAATCG 0 chr8 78748931 42 141M 0 0 TGCATTTTTTTTGGGTTTAATAGTAGTAGTTGTAGGTAAGTCATTTTAGTATACGTTTTTGTTTTATTTTTGATTTTTTTTAGAGTTGTAGAAATATTTGTAGTATTTAAGAGATTTTTTAAAATAAATGAATTATTAAGT FFFFFFFFFFFFFF5FFFFFFFF5FFFFFFFFFFFFFFFFFFFFFFFF-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:39 MD:Z:5C0C0C8C0C2C2C2C2C13C0C0C0C2C1C6C2C1C0C3C0C3C1C3C0C4C2C11C2C2C0C6C2C0C5C7C0C1C0C4 XM:Z:..H..hhh........hh..x..x..x..x...........H.hhhx..h.h.Z....x..h.hh...hx...h.h...hx....x..x...........x..h..hh......h..hh.....h.......hh.hh.... XR:Z:GA XG:Z:CT

Could you please help us to figure out what's wrong with the bam files?

Many thanks in advance!

Ying

yingyin2019 commented 1 year ago

problem solved. wrong parameter was used for bismark_methylation_extractor. single-end should be -s .

FelixKrueger commented 1 year ago

A bit late to the party, sorry. In theory, if you feed the paired-end file to the methylation extractor it should auto-detect the paired-end nature of the file. Similarly, you can feed both single end files at the same time, and it should auto-detect single-end (just don't mix both PE and SE as it will perform the test only once).

All the best, Felix