KatharinaHoff opened 2 hours ago
Additional issue (I ran into this on my first attempt, when I used awk to remove the proteins that start with an X):
If a protein sequence is not wrapped at 80 characters per line (the exact width may differ, but 80 is my guess for the threshold), FANTASIA also reports that the file is not in protein FASTA format.
The same problem occurs with short proteins: if a sequence does not span at least two lines (i.e., it is 80 residues or shorter), the pipeline also dies.
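A FASTA reader that concatenates all sequence lines belonging to a record does not depend on line width, so unwrapped and short single-line sequences parse exactly like wrapped ones. Below is a minimal sketch of that idea; `read_fasta` is a hypothetical helper, not FANTASIA's parser, whose validation code is not shown in this issue.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}, independent of line wrapping.

    The record ID is the first whitespace-separated token after '>'.
    Later records with a duplicate ID overwrite earlier ones.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        else:
            if current_id is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)  # any line length is accepted
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Wrapped, unwrapped and short single-line records all parse the same way.
wrapped = ">p1\n" + "M" * 80 + "\n" + "K" * 20 + "\n"
unwrapped = ">p1\n" + "M" * 80 + "K" * 20 + "\n"
assert read_fasta(wrapped.splitlines()) == read_fasta(unwrapped.splitlines())
```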
Hi!
Great work with FANTASIA! I have been testing it in various scenarios. When applying it to eukaryotic metagenomic data, I fairly often get incomplete gene predictions (tools such as AUGUSTUS may also produce these for single-species genomes). If a gene is incomplete at the 5' end, its protein may start with an X, because the first one or two nucleotides of the sequence do not form a complete codon and cannot be translated. In this case, FANTASIA dies with the error message:
The input is a protein FASTA file, though. I suggest changing the error message, because it took me a while to dig out the sequences that caused the failure. Others may get stuck on this as well.
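A more informative check could accept X and, when a record does fail, name the record and the offending character. The sketch below assumes the records are already parsed into a dict; `find_invalid_records` and the allowed alphabet (the 20 standard amino acids plus the IUPAC extra codes B, Z, J, U, O, X, and an optional terminal `*`) are assumptions for illustration, not FANTASIA's actual rules.

```python
from typing import Dict, List

# Assumption: 20 standard residues plus IUPAC extras (B, Z, J, U, O) and X (unknown).
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY" + "BZJUOX")


def find_invalid_records(records: Dict[str, str]) -> List[str]:
    """Return one human-readable problem description per offending record."""
    problems: List[str] = []
    for rec_id, seq in records.items():
        seq = seq.upper()
        if seq.endswith("*"):
            seq = seq[:-1]  # tolerate a single terminal stop symbol
        if not seq:
            problems.append(f"{rec_id}: empty sequence")
            continue
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            # 1-based position of the first disallowed character
            pos = next(i for i, c in enumerate(seq) if c not in ALLOWED)
            problems.append(
                f"{rec_id}: unexpected character(s) {''.join(bad)!r}, "
                f"first at position {pos + 1}"
            )
    return problems


# A leading X passes; only genuinely malformed records are reported by name.
print(find_invalid_records({"ok": "XMKVLA", "stop": "MKV*", "bad": "MK1V", "empty": ""}))
```

With this kind of message, a user would see which record to inspect instead of a generic "not a protein FASTA file".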
Here is an example input sequence that causes the failure:
If I remove the very first X, then the pipeline runs.
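Until the check is relaxed, one possible workaround is to strip leading X characters before running FANTASIA and to write the sequences back wrapped at a fixed width, so the output does not trip over the unwrapped-line problem described above. This is a minimal sketch with hypothetical helper names; the 80-column width is only the guessed threshold, and it does not help proteins of 80 residues or fewer.

```python
from typing import Dict, List


def strip_leading_x(records: Dict[str, str]) -> Dict[str, str]:
    """Drop leading X residues (untranslated partial codon at an incomplete 5' end)."""
    return {rec_id: seq.lstrip("Xx") for rec_id, seq in records.items()}


def to_fasta(records: Dict[str, str], width: int = 80) -> str:
    """Serialize records as FASTA, wrapping sequence lines at `width` characters."""
    out: List[str] = []
    for rec_id, seq in records.items():
        out.append(f">{rec_id}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# A 5'-incomplete protein: the leading X is removed, the rest is rewrapped.
fasta = to_fasta(strip_leading_x({"partial": "X" + "M" * 85}))
print(fasta)
```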
Best wishes,
Katharina