gatech-genemark / ProtHint

Protein hint generation pipeline for gene finding in eukaryotic genomes

Error in braker (Prothint step) #38

Open minhasbushra opened 2 years ago

minhasbushra commented 2 years ago

Hi,

I am running BRAKER and getting an error in the ProtHint step, shown below.

error: Gene-protein pair "4145_g-ENSDARP000001129171-pep-chromosome:GRCz11:8:4838114:4850816:1-gene:ENSDARG00000094516.3-transcript:ENSDART00000146667.3-gene_biotype:protein_coding-transcript_biotype:protein_coding-gene_symbol:es1-description:es1-protein-[Source:NCBI-gene" present in the Spaln output was not found in the file with DIAMOND gene-protein pairs. This issue can be caused by the presence of special characters in the fasta headers of input files. Please remove any special characters and re-run ProtHint. See https://github.com/gatech-genemark/ProtHint#input for more details about the input format

error: ProtHint exited due to an error in command: //ProtHint-2.6.0/bin/flag_top_proteins.py Spaln/spaln.gff //braker/braker_etp/diamond/diamond.out > tmp
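Before re-running the pipeline, it can help to list which protein headers are likely to trip this check. The sketch below is illustrative and not part of ProtHint. It assumes that "special characters" means whitespace or anything other than letters, digits, `_` and `.`. That is a deliberately conservative choice, not ProtHint's exact rule. The function name `find_suspicious_headers` is hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumption: anything other than letters, digits, "_" and "." counts as a
# "special" character. This is a conservative illustration, not the exact
# rule ProtHint applies.
SAFE_ID = re.compile(r"^[A-Za-z0-9_.]+$")


def find_suspicious_headers(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, header) for FASTA headers containing whitespace
    or characters outside the SAFE_ID set."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].rstrip("\n")
        if not SAFE_ID.match(header):
            flagged.append((number, header))
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_headers.py proteins.fa
    with open(sys.argv[1]) as fasta:
        for number, header in find_suspicious_headers(fasta):
            print(f"line {number}: {header}")
```

A header like the one in the error above (with spaces, `:`, `[` and similar characters) would be flagged, while a bare identifier such as `ENSDART00000146667.3` would pass.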

I was able to run BRAKER successfully with an older version of ProtHint on the same file, so I am not sure why the headers are causing a problem now.

Thanks

tomasbruna commented 2 years ago

Hi,

What were the ProtHint and BRAKER versions that worked?

The current ProtHint version has some new features related to this error, so they might be the cause. It is also possible that the problem was present in older versions too but was never caught (and might have been causing undetected issues).

In any case, cleaning up the protein headers (for example, keeping just the species and transcript_id) should fix this.
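One way to do that cleanup is sketched below. It is a hypothetical helper, not a ProtHint tool. It assumes Ensembl-style headers that carry a `transcript:` field, as in the error message above, and it takes the species label as a parameter you supply yourself (for example `Danio_rerio`). If no `transcript:` field is present, it falls back to the first whitespace-delimited token. Any remaining character outside letters, digits, `_` and `.` is replaced with `_`.

```python
import re
from typing import Iterable, Iterator

# Assumption: Ensembl-style headers such as
# ">ID pep chromosome:... gene:... transcript:ENSDART... ..."
TRANSCRIPT_FIELD = re.compile(r"transcript:(\S+)")
UNSAFE = re.compile(r"[^A-Za-z0-9_.]")


def clean_header(header: str, species: str) -> str:
    """Reduce a FASTA header (without '>') to '<species>_<transcript id>'."""
    match = TRANSCRIPT_FIELD.search(header)
    if match:
        identifier = match.group(1)
    else:
        # Fall back to the first token; use a placeholder for empty headers.
        tokens = header.split()
        identifier = tokens[0] if tokens else "unnamed"
    return UNSAFE.sub("_", f"{species}_{identifier}")


def clean_fasta(lines: Iterable[str], species: str) -> Iterator[str]:
    """Yield FASTA lines with cleaned headers; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + clean_header(line[1:].strip(), species) + "\n"
        else:
            yield line
```

Usage: `open("clean.fa", "w").writelines(clean_fasta(open("proteins.fa"), "Danio_rerio"))`. After cleaning, it is worth confirming that the new identifiers are still unique, since collapsing headers can merge entries that differed only in their descriptions.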

Tomas