Hi @unalibun
It's a bit late here so I haven't checked this out yet, but I suspect this might be an easy fix.
It looks like your input protein sequences have `.` characters in them. The first one I could find was in `jg6808.t1`.
The error you're getting from ApoplastP appears to be that it's looking up a value for an amino acid and it just fails when it gets the `.` (because it isn't in the dictionary). Normally I filter out gaps and non-standard AAs at the start, but I hadn't anticipated a dot before.
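To see which records are affected before changing anything, something along these lines might help. It is only a rough sketch: `find_dotted_records` is a made-up helper name, and it assumes a plain FASTA file where the record ID is the first word after `>`.

```shell
# List the IDs of FASTA records whose sequence lines contain a "." character.
# find_dotted_records is a hypothetical helper; it takes a FASTA path as its only argument.
find_dotted_records() {
    awk '
        # Header line: remember the record ID (first word, without the ">").
        /^>/ { id = substr($1, 2); next }
        # Sequence line containing a literal dot: report the ID once per record.
        /\./ { if (!(id in seen)) { print id; seen[id] = 1 } }
    ' "$1"
}
```

For example, `find_dotted_records p_ulei_braker_unmasked_corrected.fasta` would print one ID per affected record.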
As a potential quick fix to get you going, I'd suggest trying to remove any `.` characters or replace them with `X`.
It's not clear to me whether they should be treated as alignment gaps or ambiguous AAs.
Something like this ought to do the trick:
sed '/^[^>]/ s/\.//g' p_ulei_braker_unmasked_corrected.fasta
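Note that `sed` without an in-place option prints the result to standard output, so the cleaned sequences need to be redirected into a new file. If the dots should be treated as ambiguous residues rather than removed, the same pattern can substitute `X` instead. A minimal sketch, where `mask_dots` and the output file name are placeholders rather than anything from the pipeline:

```shell
# Replace "." with "X" on sequence lines only; header lines (starting with ">") are left untouched.
# mask_dots is a hypothetical helper; it writes the cleaned FASTA to standard output.
mask_dots() {
    sed '/^[^>]/ s/\./X/g' "$1"
}

# Example: write to a new file rather than overwriting the input.
# The output name below is a placeholder.
# mask_dots p_ulei_braker_unmasked_corrected.fasta > cleaned.fasta
```

Writing to a separate file keeps the original around in case the dots turn out to mean something else.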
I'll test it out tomorrow and add a step to remove these characters. Let me know how you go (if you have a chance to try it out).
All the best, Darcy
Dear Darcy,
The pipeline worked once I applied the sed command to my protein file. Thanks for your help. The program is very complete, congrats!
Best, Unalibun
Hi Predictor Team!
I have been running predictor on several fungal proteomes, and most of them have worked very well. However, a couple of them are failing in the apoplast module. I checked the protein FASTA files because one of them initially contained dots, so I removed them and tried to run it again, but the CSV files are still not created. Sorry, I am something of a beginner; I hope you can help me.
I have attached the error log and the FASTA file:
aslurm-4435.txt p_ulei.fasta.zip