Closed MariosEft97 closed 5 months ago
Re the reference coordinates: the following logic is oversimplified for a couple of reasons,
list(np.array(read.get_tag('ns')) + read.reference_start)
but primarily because a read can align with insertions and deletions, which makes the offset between read and reference positions inconsistent along the read. ft accounts for this.
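To illustrate why adding read.reference_start is not enough, here is a minimal sketch that walks a CIGAR and maps each query position to a reference position. This is not how ft does it internally; the CIGAR, start position, and feature positions are made up, and the helper name query_to_reference is hypothetical. The operation codes follow the numbering pysam uses in read.cigartuples.

```python
# CIGAR operation codes, numbered as in pysam's `cigartuples` (BAM spec).
CONSUMES_QUERY = {0, 1, 4, 7, 8}      # M, I, S, =, X
CONSUMES_REFERENCE = {0, 2, 3, 7, 8}  # M, D, N, =, X


def query_to_reference(cigartuples, reference_start):
    """Map every query position to a reference position.

    Positions inside insertions or soft clips have no reference
    counterpart and are mapped to None. (Hypothetical helper.)
    """
    mapping = []
    ref_pos = reference_start
    for op, length in cigartuples:
        in_query = op in CONSUMES_QUERY
        in_ref = op in CONSUMES_REFERENCE
        if in_query and in_ref:
            # Aligned bases: query and reference advance together.
            mapping.extend(range(ref_pos, ref_pos + length))
        elif in_query:
            # Insertion or soft clip: query advances, reference does not.
            mapping.extend([None] * length)
        if in_ref:
            # Deletions and skips advance the reference only.
            ref_pos += length
    return mapping


# Made-up read: 5 matches, 2-base insertion, 3 matches,
# 4-base deletion, 5 matches.
cigar = [(0, 5), (1, 2), (0, 3), (2, 4), (0, 5)]
reference_start = 100
lifted = query_to_reference(cigar, reference_start)

query_positions = [2, 5, 8, 12]  # made-up feature starts on the read
naive = [p + reference_start for p in query_positions]
aware = [lifted[p] for p in query_positions]

print(naive)
print(aware)
```

The two lists agree only before the first indel; after it the naive offset drifts. With pysam itself, read.get_reference_positions(full_length=True) should give an equivalent per-base list. Keep in mind that for reverse-strand reads the tag positions are in original read orientation, so they would need to be flipped before a lookup like this.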
Re the molecular coordinates: ft extract --all reports the positions of m6A methylation relative to the sequence reported in ft extract --all, and this sequence is reverse complemented when the read aligns to the reverse strand so that everything is in reference orientation. However, the tags contain the positions of the methylations in the original read, with no consideration of the reference (really, the only valid way to store them so that they survive realignment). So I'd expect a difference on reverse-strand-aligned reads and no difference on forward-aligned reads.
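As a rough sketch of what that orientation difference looks like, the snippet below flips feature intervals from original read orientation into aligned orientation for a reverse-strand read. It assumes 0-based, half-open intervals given as parallel lists of starts and lengths; the function name and the example numbers are made up for illustration and are not part of fibertools or pysam.

```python
def to_aligned_orientation(starts, lengths, read_length, is_reverse):
    """Convert (start, length) intervals from original read orientation
    to aligned orientation. (Hypothetical helper.)

    Forward-strand reads are returned unchanged. For reverse-strand
    reads, an interval [s, s + l) becomes [L - (s + l), L - s),
    where L is the read length.
    """
    intervals = list(zip(starts, lengths))
    if not is_reverse:
        return intervals
    flipped = [(read_length - (s + l), l) for s, l in intervals]
    # Flipping reverses the order, so sort by the new start.
    return sorted(flipped)


# Made-up example: a 1000 bp read with two features.
starts = [100, 400]
lengths = [150, 147]

forward = to_aligned_orientation(starts, lengths, 1000, is_reverse=False)
reverse = to_aligned_orientation(starts, lengths, 1000, is_reverse=True)

print(forward)
print(reverse)
```

For single-base positions such as m6A calls, the same formula with a length of 1 gives L - 1 - p.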
If you want a Python interface to the results of fibertools, I'd recommend https://py-ft.readthedocs.io/en/latest/index.html, though it is still very early days.
Please reopen if you find something that is not addressed here.
Hello! Thank you for developing such an amazing tool!
I noticed that, for the same reads, the molecular/reference coordinates for nucleosomes and MSPs in the ft extract bed are different from the coordinates returned when retrieving the tags with pysam.
Environment:
Commands used:
Screenshots:
view.bam can be downloaded with: