gatech-genemark / ProtHint

Protein hint generation pipeline for gene finding in eukaryotic genomes

Incomplete output #25

Closed dianamosa closed 3 years ago

dianamosa commented 3 years ago

Hello,

I am trying to run ProtHint on a mammalian genome with ~2000 contigs, but the evidence.gff file has only one line and the prothint.gff file is much shorter than expected. It seems that the ProtHint run was incomplete, but I did not find any error messages in the log file.

My command line is:

prothint.py aPal.masked.fa vertebrates_proteins.fa --workdir aPal.masked_prothint --threads 40

The evidence.gff file contains only this line:

contig_1575     ProtHint        start_codon     22383853        22383855        5       -       0       al_score=0.476033; topProt=TRUE; CDS_overlap=0;
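One way to quantify how sparse such an output is: count the features per type and the number of contigs that have any hint at all. The following is a minimal sketch, not part of ProtHint; the function names `parse_gff_line` and `summarize_gff` are hypothetical, and it assumes the nine-column layout seen in the line above, with `key=value;` attributes in the last column.

```python
from collections import Counter


def parse_gff_line(line):
    """Parse one GFF line into a dict; return None for blank or comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Split on any whitespace at most 8 times, so the attribute column
    # (which itself contains spaces) stays in one piece.
    cols = line.split(None, 8)
    if len(cols) < 8:
        return None
    attributes = {}
    if len(cols) == 9:
        for item in cols[8].split(";"):
            item = item.strip()
            if "=" in item:
                key, value = item.split("=", 1)
                attributes[key] = value
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "attributes": attributes,
    }


def summarize_gff(lines):
    """Return (feature counts by type, set of contigs with at least one feature)."""
    features = Counter()
    contigs = set()
    for line in lines:
        record = parse_gff_line(line)
        if record is None:
            continue
        features[record["type"]] += 1
        contigs.add(record["seqid"])
    return features, contigs


if __name__ == "__main__":
    # The single line reported in evidence.gff, used here as sample input.
    example = (
        "contig_1575\tProtHint\tstart_codon\t22383853\t22383855\t5\t-\t0\t"
        "al_score=0.476033; topProt=TRUE; CDS_overlap=0;"
    )
    features, contigs = summarize_gff([example])
    print(dict(features), len(contigs))
```

To check a real file, the same function can be given an open file handle, for example `summarize_gff(open("prothint.gff"))`. Comparing the number of contigs that carry hints with the ~2000 contigs in the assembly gives a rough idea of how much of the genome was covered.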

I do see this message many times during the run:

(in cleanup) (in cleanup) at /home/dmorenos/perl5/lib/perl5/Object/InsideOut.pm line 1953 during global destruction.

However, the same message appeared when I ran the test files, and both evidence.gff and prothint.gff were generated successfully, so I am not sure whether this message is an actual error.

I appreciate your help in this matter.

Diana

tomasbruna commented 3 years ago

Hello Diana,

My best guess is that GeneMark-ES (the initial step of ProtHint) fails to predict good genes, so the rest of ProtHint cannot proceed properly. GeneMark-ES has trouble with GC-inhomogeneous genomes (such as those of mammals), and we are currently working on an improvement. I am also working on a version of ProtHint that does not rely on GeneMark-ES for initialization.

Unfortunately, neither of these fixes is ready right now; I am sorry for that. One of them could be ready within a month or two.

The best workaround I can think of is to use a different gene predictor (such as AUGUSTUS with parameters for the closest available mammalian species) for initialization. You can initialize ProtHint with gene structures from another tool by passing them in GTF format with the --geneSeeds parameter.
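As a rough illustration of this workaround, the sketch below assembles the original command line with the --geneSeeds option added. The helper `build_prothint_command` and the file name `augustus_predictions.gtf` are hypothetical; the latter stands for whatever GTF file the other gene predictor produced. Only the options already mentioned in this thread (--workdir, --threads, --geneSeeds) are used.

```python
import shlex


def build_prothint_command(genome, proteins, workdir, threads, gene_seeds=None):
    """Return the prothint.py argument list, optionally with gene seeds."""
    cmd = [
        "prothint.py",
        genome,
        proteins,
        "--workdir", workdir,
        "--threads", str(threads),
    ]
    if gene_seeds is not None:
        # Gene structures in GTF format from another predictor (e.g. AUGUSTUS),
        # used in place of the GeneMark-ES initialization.
        cmd += ["--geneSeeds", gene_seeds]
    return cmd


if __name__ == "__main__":
    cmd = build_prothint_command(
        "aPal.masked.fa",
        "vertebrates_proteins.fa",
        "aPal.masked_prothint",
        40,
        gene_seeds="augustus_predictions.gtf",  # hypothetical file name
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the seed file exists.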

Let me know if you have any other questions; I'll try to help more.

(in cleanup) (in cleanup) at /home/dmorenos/perl5/lib/perl5/Object/InsideOut.pm line 1953 during global destruction.

This is an unrelated issue with one of Perl's modules. It occurs on some systems but has no effect on the results.

Best, Tomas