biointec / blamm

BLAS Accelerated Motif Matching - Fast software for PWM motif matching
GNU General Public License v3.0

PFMs occurrences output incorrect #2

Open SMZmk opened 1 year ago

SMZmk commented 1 year ago

First, thank you very much for this most helpful tool. I encountered a problem where the reported occurrences of a PFM did not match the sequence and were listed at regular intervals of 250 nt. This issue occurred in multiple experiments.

blamm dict sequences.mf

group1 /home/.../Solyc02g080300_alin2.fas

blamm hist Sl_SSR_motifs.jaspar sequences.mf

blamm scan -pt 0.001 motifs.jaspar sequences.mf

motifs.jaspar

motif_10
A [ 19 7 23 10 0 62 6 2 48 19 0 52 4 8]
C [ 26 216 27 27 638 8 1 331 11 19 740 0 0 260]
G [ 67 14 1 116 0 1 346 15 6 81 0 0 457 15]
T [ 12 23 29 34 4 28 9 16 34 22 0 35 2 23]

Solyc02g080300.fas

Solyc02g080300.BGV006775.3 atagctacttaaattcattggcaaatgcaatcacggatggtgcagatgtgagagggtactttgtttggtcccttctcgacaactttgagtggctagatggatataagttaagatttggacttcactatgtcactatactaatcttcagagaaccccaaaattatcagctaccatgtataaacagctcatgtataacttcaagaatacaactcgaaaaaaatactgcccagaactagtaggagaagatgatggaaagtaaatattattttcactccagtgggactgtaatcatttaaacaagatttgttaattacacagaaacattctgttggaagaagttgtctcatatttgaatgctagtctatatataacttcttggcgtagagacccaagctaactagaatctacaataactccataatatagggaagtccaaagtctagcataataaaaaatgaagttcaagagagagcaaccatttgaattatatttttgtactaatatactttgttaaatggtgcagttatagaattgacagttagtagaatggccctataaaaatcttattatcctgatagattaatggctcagatgtaacaaagtagaaaattataaactgctatacatgcatggacaattgacctagcgattatttttctgtccatttgatatttagtcactaaggctatataattcaaagcaaaagtcttccaacccaaagctggataaacaatacacatcaacaaaaattatataatgaattgaaatggagcaagtattatttttgagattttgatcattttgttctttaatgctaaaagaatactctaatcttgaatttttatattaaagtagtaataatcaaaatggaataaaaagagtaacagtaggaagaagagataatatttattattcctatatatgaaatcatattcaaaatggacgggagaaattaagcatttttttctattaataaagcaaaagcgtgctattattctttatgcctatttctctgagctcgaaagagaaatactagctaccacttctctctatttttcctgcaatgttttgtcgaatagcaatttgcatgttgttttgtagtttcataatatcatcatgtcatcaaacagatatccttaatattatcaaaaaaaatcctaaccaagtttcaacttcaagcaatttcttatttggaacatcctcttcctgttaccaggtatgtacttcctacacttgttttctttttggggg ...

PMWtresholds.txt

group1 motif_10 -66.7345 -0.106445 26.7912

occurences.txt

Solyc02g080300.BGV006775.3 blamm motif_10 1 15 0 + . .
Solyc02g080300.BGV006775.3 blamm motif_10 251 265 0 + . .
Solyc02g080300.BGV006775.3 blamm motif_10 501 515 0 + . .
Solyc02g080300.BGV006775.3 blamm motif_10 751 765 0 + . .
...

I receive this type of output for all motifs, regardless of the input FASTA file.
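The symptom described above (hits for one motif spaced exactly 250 nt apart, all with score 0) can be checked for mechanically. The following is a minimal sketch, not part of blamm: the helper names `start_spacings` and `looks_periodic` are hypothetical, and the column layout (sequence name, source, motif, start, end, ...) is assumed from the output quoted in this report.

```python
def start_spacings(lines):
    """Group hit start positions by (sequence, motif) and return the gaps
    between consecutive starts.

    Assumes whitespace-separated columns as in the quoted output:
    sequence, source, motif, start, end, score, strand, ...
    """
    starts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank, truncated, or "..." lines
        seq, motif, start = fields[0], fields[2], int(fields[3])
        starts.setdefault((seq, motif), []).append(start)

    gaps = {}
    for key, positions in starts.items():
        positions.sort()
        gaps[key] = [b - a for a, b in zip(positions, positions[1:])]
    return gaps


def looks_periodic(gaps, min_gaps=2):
    """True if there are at least `min_gaps` gaps and they are all equal."""
    return len(gaps) >= min_gaps and len(set(gaps)) == 1


sample = [
    "Solyc02g080300.BGV006775.3 blamm motif_10 1 15 0 + . .",
    "Solyc02g080300.BGV006775.3 blamm motif_10 251 265 0 + . .",
    "Solyc02g080300.BGV006775.3 blamm motif_10 501 515 0 + . .",
    "Solyc02g080300.BGV006775.3 blamm motif_10 751 765 0 + . .",
    "...",
]

for key, gap_list in start_spacings(sample).items():
    print(key, gap_list, "periodic" if looks_periodic(gap_list) else "irregular")
```

Perfectly regular spacing is very unlikely for genuine motif matches, so a "periodic" result suggests a problem with the input rather than with the motif.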

jfostier commented 1 year ago

Dear,

Thank you for reporting. I cannot immediately reproduce the reported behavior. Can you please email me your source files (jan.fostier@ugent.be)?

Kind regards,

Jan.

SMZmk commented 1 year ago

Dear Jan, thank you for your reply. I have sent you the material. However, I have since found the reason for this error: the input sequence in FASTA format needs to be in capital letters! There was some untidy file processing on my side. Cheers, Simon
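For anyone hitting the same symptom, a minimal sketch of converting a FASTA file to upper case before running blamm could look like the following. It is not part of blamm; the function names `uppercase_fasta` and `uppercase_fasta_file` are hypothetical, and the sketch assumes that header lines begin with `>` and that everything else is sequence data.

```python
def uppercase_fasta(text):
    """Return FASTA text with sequence lines upper-cased.

    Header lines (assumed to start with '>') are left unchanged so that
    sequence identifiers keep their original spelling.
    """
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(line)          # header: keep as-is
        else:
            out.append(line.upper())  # sequence: acgt -> ACGT
    return "".join(line + "\n" for line in out)


def uppercase_fasta_file(src_path, dst_path):
    """Read a FASTA file, upper-case its sequences, write the result."""
    with open(src_path) as src:
        converted = uppercase_fasta(src.read())
    with open(dst_path, "w") as dst:
        dst.write(converted)


example = ">seq1 example\natagctactt\nAAATtcat\n"
print(uppercase_fasta(example), end="")
```

Calling `uppercase_fasta_file("input.fas", "input_upper.fas")` (placeholder file names) would write a converted copy, which can then be listed in the manifest file instead of the original.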