Closed PeteHaitch closed 8 years ago
BSmooth
These notes are copied from the now-closed https://github.com/PeteHaitch/comethylation/issues/19
SAM/BAM
files created by BSmooth
contain the read sequence in the YO
tag rather than in the SEQ
field. In fact, the SEQ
field contains the C-to-T converted sequence.
More generally, it appears the SAM/BAM
files created by bsmooth
may not comply with SAM specifications.
When using the --fr
flag to indicate that reads should be in FR orientation (the default for Illumina's paired-end sequencing protocol) the FLAG
is set to either 67 or 131. This is not what I expect. I would expect 99 or 147. Need to check my reasoning by eyeballing some reads in IGV.
These notes are copied from the now-closed https://github.com/PeteHaitch/comethylation/issues/30
The XM
-tag should be of the same length as SEQ
(and QUAL
). The SEQ
field includes soft-clipped bases and so the corresponding entry of the XM
-tag should be .
. For inserted bases, the corresponding entry of the XM
-tag should also be .
because we can't infer it from the reference sequence alone. Technically, we could try to infer this from the read sequence, e.g. a C insertion followed by a G reference match could be evidence of a methylated CG, however, Bismark sets the XM
-tag entry to .
(based on setting the corresponding genomic sequence to X
, which ensures no match between the read and the "reference" at an insertion).
These notes are copied from the now-closed https://github.com/PeteHaitch/comethylation/issues/73
Each read will need an XM
-tag, XR
-tag and XG
-tag. The XM
-tag must be as long as the SEQ
field. Therefore, soft-clipped positions, as identified by the CIGAR
string, should have the corresponding entry in the XM
-tag set to .
. Bases that are hard-clipped should never appear in the SEQ
field.
May need to check the orientation of paired-end reads because Bismark only allows FR
orientation (and also permits for a certain amount of overlap), e.g..:
----->
<----
Other aligners might allow FF
, RF
or RR
orientation; will these need to be handled separately?
Bismark remains the dominant aligner in the field. Supporting other aligners is a low priority for me, but pull requests are welcome.
comethylation
only supports reads aligned with Bismark. It should be relatively straightforward to support other aligners but I won't be doing this until I have motivation to do so, i.e. I need to analyse data aligned with another aligner or someone usingcomethylation
requests support. Pull requests adding support for additional aligners are most welcome.Other aligners should be supported via a command line option
--aligner
flag, e.g.--aligner bwa-meth
. I prefer this approach to my previous idea of having a separate script to "Bismark-ify" the SAM/BAM file.Aligners I would like to support:
BSMAP
BSmooth
bwa-meth
gsnap
last
Novoalign