gatech-genemark / GeneMark-EP-plus

GeneMark-EP and -EP+: automatic eukaryotic gene prediction supported by spliced aligned proteins

Error in make_nt_freq_mat.pl, caused by empty stop_*.seq #6

Open hirnc opened 2 years ago

hirnc commented 2 years ago

Hello, I am trying to run prothint.py and gmes_petap.pl on a fungal genome.

The commands I ran were:

prothint.py genome.fa protein.faa --workdir prothint
gmes_petap.pl --EP prothint/prothint.gff --evidence prothint/evidence.gff --seq genome.fa --soft_mask 1000 --verbose

prothint.py finished successfully. Then gmes_petap.pl terminated with the following message:

error, no valid sequences were found
error on call: /path/gmes_linux_64/make_nt_freq_mat.pl --cfg /workdir/run.cfg --section stop_TAA   --format TERM_TAA

The last part of gmes.log is:

/path/gmes_linux_64/gmes_petap.pl : [Fri Mar 18 14:46:11 2022] /path/gmes_linux_64/parse_ET.pl --section EP_C --cfg  /workdir/run.cfg  --v
/path/gmes_linux_64/gmes_petap.pl : [Fri Mar 18 14:46:11 2022] /path/gmes_linux_64/make_nt_freq_mat.pl --cfg /workdir/run.cfg --section start_ATG  --format INI
/path/gmes_linux_64/gmes_petap.pl : [Fri Mar 18 14:46:11 2022] /path/gmes_linux_64/make_nt_freq_mat.pl --cfg /workdir/run.cfg --section stop_TAA   --format TERM_TAA
/path/gmes_linux_64/gmes_petap.pl : [Fri Mar 18 14:46:11 2022] error

It seems the error happens in Training_E_anchored_C() in make_nt_freq_mat.pl, when running CountFromFile() with run/EP_C_1/stop_taa.seq as the input.

run/EP_C_1/stop_taa.seq exists but is empty. stop_tag.seq and stop_tga.seq are also empty.
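For anyone hitting the same error, a quick way to confirm which training files are empty is to scan the working directory. The following is only a minimal sketch: the helper name `find_empty_seq_files` is hypothetical and not part of GeneMark, and the `stop_*.seq` pattern and `run` directory are taken from the paths reported above.

```python
from pathlib import Path


def find_empty_seq_files(run_dir, pattern="stop_*.seq"):
    """Return sorted paths under run_dir that match pattern and are 0 bytes.

    Hypothetical helper, not part of GeneMark. The default pattern mirrors
    the stop_taa.seq / stop_tag.seq / stop_tga.seq files named in this issue.
    """
    return sorted(
        path
        for path in Path(run_dir).rglob(pattern)
        if path.is_file() and path.stat().st_size == 0
    )


# Example usage (assumes the GeneMark working directory contains "run"):
#   for path in find_empty_seq_files("run"):
#       print("empty:", path)
```

Passing a broader pattern such as `"*.seq"` would show whether other training sets in the same directory are empty as well.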

What does an empty stop_*.seq file mean, and how can I avoid this problem? Any suggestions are greatly appreciated!

tomasbruna commented 2 years ago

Hi @hirnc,

Did you run GeneMark in the --ES mode (without proteins), and if so, did that work fine? The error you are observing is usually caused by poor coverage of the supporting proteins (this can happen when there are too few input proteins or when they are too distantly related to your species).
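One rough way to gauge how much protein support is available is to count the hints in prothint/prothint.gff and prothint/evidence.gff by feature type. The sketch below is only an illustration: `count_gff_features` is a hypothetical helper, it assumes standard tab-separated GFF with the feature type in the third column, and it does not imply any particular threshold for "enough" hints.

```python
from collections import Counter


def count_gff_features(gff_path):
    """Count records per feature type (column 3) in a tab-separated GFF file.

    Hypothetical helper for inspection only. Blank lines and lines starting
    with '#' are skipped; lines with fewer than three columns are ignored.
    """
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts


# Example usage (paths taken from the commands in this issue):
#   for name in ("prothint/prothint.gff", "prothint/evidence.gff"):
#       print(name, dict(count_gff_features(name)))
```

If very few or no stop-related hints show up in the output, that would be consistent with the poor-coverage explanation, although the counts alone do not prove it.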

I noticed that you are not using GeneMark's --fungus flag. Please try a run with this flag; it could also resolve the problem.

Sorry for the late reply,
Tomas