VIOLINet / Vaxign-ML-docker

GNU General Public License v3.0

Output not generated when filtered sequences #5

Open DaniDelHoyo opened 4 months ago

DaniDelHoyo commented 4 months ago

I have been getting errors in the VaxignML output when some of the input sequences are filtered out, and I have noticed 2 bugs:

It seems that unrecognized residues are mistaken for a new sequence name at some point?

abrozzi commented 3 days ago

SPAAN is causing the issue. In lib/spaan/SPAAN/filter.c, you can see a comment indicating that proteins shorter than 50 amino acids are filtered out. Additionally, non-canonical amino acids are incompatible with PSORT.

My suggestion: write a script to sanitize your input FASTA file by removing sequences shorter than 50 amino acids and any sequences containing non-canonical amino acids.
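
A minimal sketch of such a sanitizing script is below. It is not part of Vaxign-ML; the function names (`read_fasta`, `is_clean`, `sanitize_fasta`) are hypothetical, and it assumes that "canonical" means the 20 standard one-letter amino acid codes and that sequences of exactly 50 residues are kept. I have not verified that this removes every case that trips SPAAN or PSORT, so treat it as a starting point.

```python
import sys

# Assumption: "canonical" means the 20 standard amino acids (one-letter codes).
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")
# Assumption: sequences shorter than 50 residues are dropped, 50 or more are kept.
MIN_LENGTH = 50


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_clean(seq, min_length=MIN_LENGTH):
    """True if seq is long enough and contains only canonical residues."""
    seq = seq.upper()
    return len(seq) >= min_length and set(seq) <= CANONICAL


def sanitize_fasta(lines, min_length=MIN_LENGTH):
    """Split records into kept (header, sequence) pairs and dropped headers."""
    kept, dropped = [], []
    for header, seq in read_fasta(lines):
        if is_clean(seq, min_length):
            kept.append((header, seq.upper()))
        else:
            dropped.append(header)
    return kept, dropped


if __name__ == "__main__" and len(sys.argv) == 3:
    in_path, out_path = sys.argv[1], sys.argv[2]
    with open(in_path) as handle:
        kept, dropped = sanitize_fasta(handle)
    with open(out_path, "w") as out:
        for header, seq in kept:
            out.write(f"{header}\n{seq}\n")
    # Report what was removed so the filtering is not silent.
    for header in dropped:
        print(f"dropped: {header}", file=sys.stderr)
```

Saved under a name of your choice (for example `sanitize_fasta.py`), it would be run as `python sanitize_fasta.py input.fasta clean.fasta`, and `clean.fasta` is then passed to Vaxign-ML instead of the original file. The headers of the removed sequences are printed to stderr so you can see which entries will be missing from the output.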