Closed mtcicero26 closed 1 week ago
--m6a already reports the reference coordinates which can be changed with the --molecular flag and then back again with --reference. So I don't understand or am missing something about the request. https://fiberseq.github.io/fibertools/help.html#ft-extract
Sorry, I didn't realize that.
Ok, actually, the issue is that the output for the ref-m6a column in --m6a is inconsistent with --all.
For example: An example read which maps from chr2L:1602-9209
The chromosome, start, and end columns look correct for --r, but it seems like --all adds the start coordinate onto the m6a coordinates. Based on inconsistencies between the coordinates provided by --reference and --molecular, I think it is correctly adjusting the position for gaps etc. in the alignment.
This is fine, assuming the -r coordinates are indeed equivalent to --all ref_m6a outside of the addition of the start coordinate, and I can account for it in FiberHMM when someone uses that as the input. I do think it might be better if the outputs were consistent though?
BED12 format specifies that you use the offset from the start specifically in that column, so to maintain compatibility we cannot change this. The reference offsets plus the start should be the same as the --all table since we do account for gaps in the alignment. If you find this isn't the case anywhere you are welcome to reopen.
Ok, great, that makes sense. I'll account for that in FiberHMM.
It would probably be good if FiberHMM took BED12 since that is a standard format in case you ever want to use m6As from another caller.
Alternatively you may consider using our experimental python api for fibertools, py-ft, that can directly extract this info for you.