cultivarium / MicrobeMod

A toolkit for exploring prokaryotic methylation and base modifications in nanopore sequencing
MIT License

Identification of motifs fails with lower-case fasta reference sequences #29

Open tfwulff opened 2 months ago

tfwulff commented 2 months ago

Thanks for your great tool!

I ran into a problem when using a reference genome fasta file written in lower-case letters: although STREME identified motifs, none were listed in the final LIBRARY_NAME_motifs.tsv output file. It turned out that STREME converts input sequences to upper case by default, so microbemod.py fails to find any motif occurrences when it searches the lower-case reference sequence (lines 335 to 347). It may be worth mentioning this behaviour in the parameter description; it took me some time to figure out what the problem was.
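To illustrate the mismatch, here is a minimal sketch of a case-insensitive motif search. It is not MicrobeMod's actual code: the function name `find_motif_positions`, the IUPAC lookup table, and the example sequence are all hypothetical, and it only assumes that the motif arrives in upper case (as STREME reports it) while the reference may be in lower case.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
# (illustrative table, not taken from MicrobeMod).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif_positions(reference, motif):
    """Return 0-based start positions of motif in reference, ignoring case."""
    # Normalise both sides to upper case so a soft-masked or
    # lower-case reference still matches an upper-case motif.
    ref_upper = reference.upper()
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", ref_upper)]


lower_ref = "ttgatcaagatcgg"

# A plain case-sensitive search finds nothing in the lower-case reference.
print(lower_ref.find("GATC"))                    # -1
# After normalising case, both occurrences are found.
print(find_motif_positions(lower_ref, "GATC"))   # [2, 8]
```

Until this is handled in the tool, converting the sequence lines of the reference fasta to upper case before running should sidestep the problem.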

alexcritschristoph commented 1 month ago

Thanks for catching this case! Will try to update to fix it soon.