ccdmb / predector

Effector prediction pipeline based on protein properties.
Apache License 2.0

Apoplast conflict #91

Closed Unalibun closed 2 months ago

Unalibun commented 2 years ago

Hi Predector Team!

I have been running Predector on several fungal proteomes, and it has worked very well. However, a couple of them are failing in the ApoplastP module. I checked the protein FASTA files because one of them initially contained dots, so I removed them and tried to run it again, but the CSV files are still not created. Sorry, I am kind of a beginner; I hope you can help me.

I have attached the error log and the FASTA file.

This is the error:
Process apoplastp (2) terminated with an error exit status (1) Command error:

aslurm-4435.txt p_ulei.fasta.zip

darcyabjones commented 2 years ago

Hi @unalibun

It's a bit late here so I haven't checked this out yet, but I suspect this might be an easy fix.

It looks like your input protein sequences have . characters in them. The first one I could find was in jg6808.t1.
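To see which records are affected before editing anything, a minimal sketch along these lines might help. It assumes a POSIX awk and a FASTA file in which every header line starts with >. The function name list_dot_records and the file name proteins.fasta are made up for illustration.

```shell
# Hypothetical helper: print the ID of every FASTA record whose
# sequence contains a "." character. The name is illustrative only.
list_dot_records() {
    awk '
        # Header line: remember the record ID (first field without ">")
        /^>/ { id = substr($1, 2); next }
        # Sequence line containing a dot: report each ID only once
        index($0, ".") > 0 && !(id in seen) {
            seen[id] = 1
            print id
        }
    ' "$1"
}

# Usage (proteins.fasta is a placeholder file name):
# list_dot_records proteins.fasta
```

An empty result would suggest that dots are not the cause and that the log is worth checking for other characters.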

The error you're getting from ApoplastP appears to be that it's looking up a value for an amino acid and it just fails when it gets the . (because it isn't in the dictionary). Normally I filter out gaps and non-standard AAs at the start, but I hadn't anticipated a dot before.

As a potential quick fix to get you going, I'd suggest trying to remove any . characters or replace them with X. It's not clear to me whether they should be treated as alignment gaps or ambiguous AAs. Something like this ought to do the trick:

sed '/^[^>]/ s/\.//g' p_ulei_braker_unmasked_corrected.fasta
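If the dots are better treated as ambiguous residues, the same approach can substitute X instead of deleting them. Note that sed prints to standard output here, so the result has to be redirected to a new file before it can be used as pipeline input. A sketch, where the function name dots_to_x and the output file name are hypothetical:

```shell
# Hypothetical helper: replace "." with "X" on sequence lines only,
# leaving header lines (those starting with ">") untouched.
dots_to_x() {
    sed '/^[^>]/ s/\./X/g' "$1"
}

# Usage: write the cleaned sequences to a new file
# (the output name is illustrative)
# dots_to_x p_ulei_braker_unmasked_corrected.fasta > p_ulei_cleaned.fasta
```

Headers are skipped deliberately, since IDs such as jg6808.t1 contain dots that must be preserved.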

I'll test it out tomorrow and add a step to remove these characters. Let me know how you go (if you have a chance to try it out).

All the best, Darcy

Unalibun commented 2 years ago

Dear Darcy,

The pipeline worked after simply applying the sed command to my protein file. Thanks for your help. The program is very complete, congrats!

Best, Unalibun