Closed rlegendre closed 1 year ago
Hi @rlegendre,
m6A calls happen on both A and T bp since CCS sequencing happens on both strands of DNA for even a single molecule, and I was able to confirm that every bp with an m6A call is an A or T. Perhaps you were thinking that T shouldn't be possible?
Here is my test code:
import pandas as pd
df = pd.read_csv("~/Downloads/ccs740_test/test_ccs740_center.txt", sep="\t")
positions = [int(x) for x in df["centered_m6a_positions"][0].split(",") if x != ""]
seq = df["query_sequence"][0]
centered_query_start = df["centered_query_start"][0]
for pos in positions:
print_pos = pos - centered_query_start
bp = seq[print_pos : print_pos + 1]
if bp == "A":
# this is fine
continue
elif bp == "T":
# CSS reads both strands of DNA so this is fine.
continue
else:
raise ValueError("This is not an A or a T at an m6A position!")
Indeed, I had never considered that T sites could also be methylated. With this new information, my metaprofile should be fine! Thank you for your assistance, and I apologize for any inconvenience.
Dear Mitchell,
After running the fiberseq-smk pipeline, I use ft center on my BAM to generate a metaprofile of m6A methylation rates around CTCF sites. However, I have noticed that there are many m6A positions that do not match with any 'A' in the query sequence.
ft center test_ccs740.bam CTCF_test.bed -w > test_ccs740_center.txt
Thanks for your opinion, Rachel