cultivarium / MicrobeMod

A toolkit for exploring prokaryotic methylation and base modifications in nanopore sequencing
MIT License
34 stars 1 forks source link

All motif locations #18

Closed patrickwest closed 3 months ago

patrickwest commented 5 months ago

Hi,

Is there a way to obtain all the locations of a given motif, regardless of methylation status, from the MicrobeMod output? Or if not, a suggested way to use the streme output from MicrobeMod with meme Suite to get them? I'm interested in motif locations that may not be methylated as well

alexcritschristoph commented 5 months ago

I'm thinking probably not to add this to MicrobeMod at the moment, but it is not hard to do after running it, here is some (untested) code. This is similar to what I do to get the motifs among the methylated positions.

from Bio.SeqUtils import nt_search
from Bio.Seq import Seq
import pandas as pd
from Bio import SeqIO

REF = {}
for record in SeqIO.parse(fasta_file, "fasta"):
    REF[record.id] = record.seq

microbemod_output = pd.read_csv("example_motifs.tsv", sep="\t")

results = []
for motif in microbemod_output.Motif:
  for contig, contig_seq in REF.items():
      for motif_site in nt_search(str(contig_seq), Seq(motif))[1:]:
          results.append({"Motif":motif,"contig":contig,"position"":motif_site})
      for motif_site in nt_search(str(contig_seq), Seq(motif).reverse_complement())[1:]:
          results.append({"Motif":motif,"contig":contig,"position"":motif_site})

results = pd.DataFrame(results)