fiberseq / fibertools-rs

Tools for fiberseq data written in rust.
https://fiberseq.github.io/fibertools/fibertools.html
42 stars 5 forks source link

ft center m6A methylation on non A sites #22

Closed rlegendre closed 1 year ago

rlegendre commented 1 year ago

Dear Mitchell,

After running the fiberseq-smk pipeline, I use ft center on my BAM to generate a metaprofile of m6A methylation rates around CTCF sites. However, I have noticed that there are many m6A positions that do not match with any 'A' in the query sequence.

Thanks for your opinion, Rachel

mrvollger commented 1 year ago

Hi @rlegendre,

m6A calls happen on both A and T bp since CCS sequencing happens on both strands of DNA for even a single molecule, and I was able to confirm that every bp with an m6A call is an A or T. Perhaps you were thinking that T shouldn't be possible?

Here is my test code:

import pandas as pd

df = pd.read_csv("~/Downloads/ccs740_test/test_ccs740_center.txt", sep="\t")

positions = [int(x) for x in df["centered_m6a_positions"][0].split(",") if x != ""]
seq = df["query_sequence"][0]
centered_query_start = df["centered_query_start"][0]

for pos in positions:
    print_pos = pos - centered_query_start
    bp = seq[print_pos : print_pos + 1]
    if bp == "A":
        # this is fine
        continue
    elif bp == "T":
        # CSS reads both strands of DNA so this is fine.
        continue
    else:
        raise ValueError("This is not an A or a T at an m6A position!")
rlegendre commented 1 year ago

Indeed, I had never considered that T sites could also be methylated. With this new information, my metaprofile should be fine! Thank you for your assistance, and I apologize for any inconvenience.