Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

prothint seems to hang on some records #711

Open splaisan opened 11 months ago

splaisan commented 11 months ago

Hello,

I successfully completed a very similar first run with Docker (ONT assembly, everything else the same). That run finished after a few hours and gave results.

The second run (PacBio assembly) hangs on a particular ProtHint record. I stopped the first attempt after it had hung for 2 days and restarted from scratch in a new folder, and it hangs again at the same point.

Using teambraker/braker3:latest (v3.0.6):

image="teambraker/braker3:latest"

# get database for this sample type from https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11/
# wget 'https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11/Viridiplantae.fa.gz' && gunzip Viridiplantae.fa.gz
orthodb="Viridiplantae.fa"

# using orthodb proteins only
docker run \
  --rm \
  -it \
  -u "$(id -u):$(id -g)" \
  -v "$PWD":/data \
  -v "$AUGUSTUS_CONFIG_PATH":"$AUGUSTUS_CONFIG_PATH" \
  -e AUGUSTUS_CONFIG_PATH="$AUGUSTUS_CONFIG_PATH" \
  "${image}" \
  braker.pl \
  --species="${species}" \
  --useexisting \
  --genome="/data/${outfolder}/${asm}" \
  --prot_seq="/data/${orthodb}" \
  --workingdir="/data/${outfolder}" \
  --threads="${nthr}"

Some of my terminal output (full logs attached: braker_firstrun.log, braker.log):

#**********************************************************************************
#                               BRAKER CONFIGURATION                               
#**********************************************************************************
# BRAKER CALL: /opt/BRAKER/scripts/braker.pl --species=Chlamydomonas reinhardtii --genome=/data/pacbio_results/pacbio_draft_assembly_softmask.fasta --prot_seq=/data/Viridiplantae.fa --workingdir=/data/pacbio_results --threads=48
# Tue Nov 28 08:56:41 2023: braker.pl version 3.0.6
# Tue Nov 28 08:56:41 2023: Only Protein input detected, BRAKER will be executed in EP mode (BRAKER2).
# Tue Nov 28 08:56:41 2023: Configuring of BRAKER for using external tools...
# Tue Nov 28 08:56:41 2023: Trying to set $AUGUSTUS_CONFIG_PATH...
# Tue Nov 28 08:56:41 2023: Found environment variable $AUGUSTUS_CONFIG_PATH.
# Tue Nov 28 08:56:41 2023: Checking /opt/biotools/Augustus/config as potential path for $AUGUSTUS_CONFIG_PATH.

Attached log: [braker.log](https://github.com/Gaius-Augustus/BRAKER/files/13489415/braker.log)

the current command is:

# Tue Nov 28 09:21:13 2023: starting prothint.py
/opt/ETP/bin/gmes/ProtHint/bin/prothint.py --threads=48 --geneMarkGtf /data/pacbio_results/GeneMark-ES/genemark.gtf /data/pacbio_results/genome.fa /data/pacbio_results/proteins.fa

...
[Tue Nov 28 09:43:09 2023] Enqueueing pair 246182/248920 (98.9%). Est. time left: 00:00:13 (hh:mm:ss)
[Tue Nov 28 09:43:11 2023] Enqueueing pair 246431/248920 (99.0%). Est. time left: 00:00:12 (hh:mm:ss)
[Tue Nov 28 09:43:12 2023] Enqueueing pair 246680/248920 (99.1%). Est. time left: 00:00:11 (hh:mm:ss)
[Tue Nov 28 09:43:16 2023] Enqueueing pair 246929/248920 (99.2%). Est. time left: 00:00:10 (hh:mm:ss)
[Tue Nov 28 09:43:16 2023] Enqueueing pair 247178/248920 (99.3%). Est. time left: 00:00:09 (hh:mm:ss)
[Tue Nov 28 09:43:18 2023] Enqueueing pair 247427/248920 (99.4%). Est. time left: 00:00:08 (hh:mm:ss)
[Tue Nov 28 09:43:19 2023] Enqueueing pair 247676/248920 (99.5%). Est. time left: 00:00:06 (hh:mm:ss)
[Tue Nov 28 09:43:20 2023] Enqueueing pair 247925/248920 (99.6%). Est. time left: 00:00:05 (hh:mm:ss)
[Tue Nov 28 09:43:21 2023] Enqueueing pair 248174/248920 (99.7%). Est. time left: 00:00:04 (hh:mm:ss)
[Tue Nov 28 09:43:22 2023] Enqueueing pair 248423/248920 (99.8%). Est. time left: 00:00:03 (hh:mm:ss)
[Tue Nov 28 09:43:23 2023] Enqueueing pair 248672/248920 (99.9%). Est. time left: 00:00:02 (hh:mm:ss) # hangs here

Any idea what this could be and how to circumvent it? Here are the running processes (a single spaln process sits at 100% CPU on one thread):

u0002316   55913  0.0  0.0  17504 12368 pts/0    S+   10:21   0:00 python3 /opt/ETP/bin/gmes/ProtHint/bin/prothint.py --threads=48 --geneMarkGtf /data/pacbio_results/GeneMark-ES/genemark.gtf /data/pacbio_results/genome.fa /data/pacbio_results/proteins.fa
u0002316   64582  0.5  0.0 3777148 481632 pts/0  Sl+  10:23   1:38 perl /opt/ETP/bin/gmes/ProtHint/bin/run_spliced_alignment.pl --cores 48 --nuc ../nuc.fasta --list /data/pacbio_results/diamond/diamond.out --prot /data/pacbio_results/prot0cu2zsy0 --v --aligner spaln --min_exon_score 25 --longGene 30000 --longProtein 15000
u0002316 1420444  0.0  0.0   7492  3976 pts/0    S+   10:42   0:00 bash /opt/ETP/bin/gmes/ProtHint/bin/spalnBatch.sh batch_2233 batch_2233_out 25 0 30000 15000
u0002316 1434514 99.9  0.1 1242452 763540 pts/0  R+   10:42 295:23 /opt/ETP/bin/gmes/ProtHint/bin/../dependencies/spaln -Q3 -LS -pw -S1 -O1 -l 23802 nuc_223256 prot_223256
u0002316 1434515  0.0  0.0   1992     4 pts/0    S+   10:42   0:00 /opt/ETP/bin/gmes/ProtHint/bin/../dependencies/spaln_boundary_scorer -o nuc_223256_prot_223256 -w 10 -s /opt/ETP/bin/gmes/ProtHint/bin/../dependencies/blosum62.csv -e 25 -x 25
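
For anyone trying to locate the stuck pair in a listing like the one above, here is a minimal Python sketch. It assumes the text comes from `ps aux` (ten fixed columns followed by the command) and that the TIME column is shown as MM:SS, as it is here. The function name and the 60-minute threshold are my own choices for illustration, not part of ProtHint.

```python
def find_long_running_spaln(ps_lines, min_cpu_minutes=60):
    """Return (cpu_minutes, nuc_file, prot_file) for spaln processes over the threshold.

    Assumes `ps aux` layout: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    hits = []
    for line in ps_lines:
        fields = line.split(None, 10)  # 10 fixed columns, the rest is the command
        if len(fields) < 11:
            continue
        cpu_time, command = fields[9], fields[10]
        args = command.split()
        # Match the spaln binary itself, not spaln_boundary_scorer or spalnBatch.sh
        if not args or not args[0].endswith("/spaln"):
            continue
        minutes = int(cpu_time.split(":")[0])  # TIME assumed to be MM:SS
        if minutes >= min_cpu_minutes:
            # In the command shown above, the last two arguments are the
            # nucleotide and protein files of the aligned pair
            hits.append((minutes, args[-2], args[-1]))
    return hits


if __name__ == "__main__":
    import subprocess

    listing = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    for minutes, nuc, prot in find_long_running_spaln(listing.splitlines()):
        print(f"{nuc} / {prot}: {minutes} CPU minutes")
```

The reported file names (here `nuc_223256` and `prot_223256`) can then be inspected in the spaln working folder to find the header of the offending protein.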

Thanks in advance

tomasbruna commented 11 months ago

Hi @splaisan,

It looks like you ran into this known ProtHint issue: https://github.com/gatech-genemark/ProtHint/issues/14.

A quick fix is to remove the protein in prot_223256 from the input protein set.

I will look into patching ProtHint to fix this, but that may take a while.

Tomas

splaisan commented 11 months ago

Hi @tomasbruna ,

I found the suspect file in the local spaln subfolder of my output folder and can easily remove it, but how do I restart the docker run without recreating it?

cat nuc_223256
>6434_g
CGGCAGGTCCCAAGGAATCGGCAGCCCTGGCAGCTGACATCCTAGCAGCGGGCGGCTCTTACGTGGAGGCCCCGGTGCTGGGCAGCCAGCCTGAGGCGGAGAAGGGCACCCTGCTGGTGATGGTGGGCGCGGAGGCCGACCCCCGGGAGCCCGGCAGCCCGCACCACGACACCGTGTGGCCGCTGCTGCGCGCGCTGGGCCAGGAGTCCAACATCCACTTCATCGGGCCGGTGGGCACGGGCGCGGCGGTCAAGCTGGCGCTCAACCAGCTCATTGCATCGCTCACGGTGAGAGGAAGGGTAGGAGGGGGGAAGATAAGGAGGAGCCGAACAGTTGGGCGCTTGGAGGTTTGGGGATTATGGCCAGGCTGAAACGGGGTTGCTGTTTGTGTCGATGCCCCGGTTGCCCGACTCCTGCCTCTGCGCCCCCTGCCGCCTTGTCCCACCTACCCTTGCAGGTGGGCTTCTCCACCAGCCTGGGCCTGGTGCAGCGCAGTGGCGCTGACGTGGACAAGTTCATGAGCATCCTGCGCGCCTCCGCACTGTACGCACCCACCTACGACAAAAAGCTGCAAAAGATGCTGGACCGGGACTACGGCGCCGCAAACTTCCCCACAAAGGTGTGTGGACAACGAGACGCGCAGAGGCATGCAACCTGTCGAGCTCTTATGTCCACGCATTGTAAACTTAGCCAGCGACATCAACTGCGGATAAGCTCAACCGTGCCCGCACCTTGGTCCCCTCGTCTCCCCACCCGCAGCACCTGCTGAAGGACGTGCGTCTGTTTGAGATGGAGGCTGCGGCTGCGGGTCTGGACACGCGCCTGCTGGCGGCGCTGAAGGGCGTGGTGCAGGACACTGTGGACCGCGGCCTGGCCAACACCGACTACTCGGCGGTGTTCGACGCAGTGGCGCACCCGGGAGAGCAGCAGGCGACCAAGCCGCAGCAGTAAAGGAGAAGGCTAGGTGTGCGGTATCTGAGCTGGCGCCTGGGCGCATTGCGGCATGCGCCCAGGCTACCGGAGCACAGCAGAAGTCGCGCGGGAGGTGCACGAGCAGAGCACAGGGCGCATGACACGAGTAGATTGAGCAACGCAGCGTCGAATGATATATGTGCGCGTGGGCGGCGGCGACGGTGGCGGTAGGGTTGGCTCCGTGCCTGCTGCTATGTGCTTACTTTGCCCGCATCAGGCACAAGGGTAGATACGATACACCACCTCGTTGCGATGCAATATGCGCATGGGGCTGCCGCCCTGCAATACGGGTGGGAGATCAAGGGTCATCACTCATCAGAGTGCGTAAGCACGCTGAGAAGGCCAGAAGGGGCATGCACTCCAGAAACGCTTTGCTGGGGCGCGTCAGGCAGCATGTCACATGCCTCGCCGGAACGTGGGCTGCTGGGGCTCTGAGTCGAGGCGAGTTTTGCGACAGCAGTAACCTAGGCTAACCTCTATCAAAGGTGCCCGCAGTGCAGTGGCGGCGTAGTAACGTGGGTAGCGTGCGTGGGTAGTAAGCACACGGTTCTTCACTCCTGGGCGGTTTTGTAGCATTGAACTGTCCAGTGCAGGCTTTGAGTCCGCAGCAAATTTCCAAATGATAAGCTGCGTTTTCCGCGACGAGCTTCTGCAACTTGTGAGTGGTCCTTTGAGTGGTACTTCCTCGATCAAGGCATAACCTGGATAGCAAAACGCTACCAAGGTGCTTTCAAATATAAAGAACCACCTCATACGAAAACGCCGGCAGTCCCAAGAATTCATGGCCGCGAAACCACAATGCGTGCATGACACATACATCCCCCGCTCTTCGTTCATTTTTTCGACCTAAGAACACTGGAGTTCCAGTCAATAACGATTCAGAGTTCAACGCATGCACAAGATTTCGCCATGCACAAAGCCAGCTAACGCTGTTCTGCGCTTCTAATTACAATGTGTCGTCACTGACTGCTATCGAGGGACTCGCTGTTGCGTTTTTTACAAGGAATAATCTGCTTGAGTCGGACGCA
ATGAAGGAGGTGCGTTGGGGAGCAAAGGTGGGGGCGTTTTCGACAGAAGGGTCCAGGGCCAAGGCGTCGACCTCCCCGTCAGGTTTCATTCTAGATTTGCCAAATATTGCCAAATATGCCAAATATTGCCAAATATTATGATAATGATATTTGCATCGCACCTGTGCTAAGCGCGTATTGACGAGCGTGGGCGAAGCGTTTGTCAGCGGCACCTATGCACAGACCCGGCGTGCATACATTTGCAATAGGACTGCTTATCATCTAGATAAACATTTCCACCCACGGGTGTCACCGAGGATGGCCCTCGTCGCTTTTGCTTGTCGCCAGCCCTGCCGGGCTGCTTGCGCTGCTGTGGCCTTTCCCTGGCCACACTCTATATTGTGTTCCAAGCATATCTTGCATTGCGAACACCAGTTGAGAAACTTGCCGAGCCCGCTGTCAACACCCGCTCAACTGCCCCAAACTTTACGCGACCGCGACAAGCACTGAAGATTAAAGACCACGCTGTGGAGAAGTCCTCCGGGGCTTCAAAATATGGCGCCGTGACATGCTGGTTTCTGTTCGGGCGTCCGCCCAATTCAGCCCTTGAGGTACACTCACGGCTCTGCCTGCGAATACCCATCCACACAGAGGTACGCGGTCATCGCTCCGCTGGGTTCCGGCGCCTATGGCTGCGTGTATAAGGTATGTGGAACAGGCTGGCAGCAGTGTGAAGCGGGGACCTTGGGAACAGCGGTGCCCCGTGGGAATTGGCGGTGTGACCTTGGGTGCGACGGGACAGGGTAAGAGGCAGGTTGGCGTAGCCCCGTGGCTGATATAGCTGGGTCCCGAGAAACAAGTTACGCCCAACCCAGGCGCCAATGAGACATGGAAATACGTCGCCTCCGTGAGAATCGTGGGGTGAGAGACACACTCATGAACACGCCTCCCCTCCTCTATCCCTGTAGTGCCTGGATCGCGACACGGGCAGCCTGTGTGCGCTCAAGGTCATCAACCTCGCACATCAGGAGCCCGCGGTGAGTTCCAAGCACCACAGCTGCACCAGTCAGTTCTGAGTGCGGGGCCACGCGGCTGGCCCAGCTGCCCAGCATCGCAGAGGCGTGTATGGCATCAGTATCCTTGGTCACCGGCATTCCTGCGATACAGCGTTAAACTCCCCATCACGTGTTTACGCTGGTGTCGAAACTGCTGACGTGCCTGTGTGGGTGGGCGCCGGTGCACAGGTCATGCGGCTTACCATGCGCGAGGTACGCACGCTGCAAAAGCTGCCAAAGCACCCGCACATTGTGGAGTTGAAGGATGCGTTCAAGAGCTCGGGCAGCGGCCGCGTGTTCCTGGTCTTCAGCTGCGAGGGGCGCAGCATGCATGAGGTGCGCGATCGCGGGGCAACGCGTGAAGGGCGGACGGGAACCTTCAGCTGTTGACTTCCCAAGGCCCCTCCAGGCTGCCCCTGTCAGCTTGACTTACTGACTGAGCTGTATGGTATGCCGCACACTCGCGCTCCGGCAGGAGGCGGAGAACTACGCCAAGTATATCCTGCCGGGGCCCATGCTGCGCCAGGTGGCGTGGCAGTTGCTGCAGGCGCTGGCGCACATACACGAACACCAGGTGCGTGTGTCCAACCGAGTATGTGCAAGGCGCGTTCGTGTGACTGGCGGTCTGTCGGGGCGCGTTGTCTCCAAGCCCGGGCGTATTTCAGAGTCCTGCTGACCGCGCGCCCACCACCGCCCACCACAACCCTCGCGCGTGCATCCGCACGCACCCAGATTATCCACCGTGACGTCAAGCCCGGCAACATCTTGCTGGTGGGCGACGGCACCGGCGGCGCGGCGGGCGTGGGCCTCAACGGCGCCGACGTGCACATCCGGCTGGCGGACTTTGGCTTTGCCCGCAGCTGGCAGCCGCACGAGGCGTTGTCCTCCTACGTGGCCACGCGGTGGTTCCGTGCGCCAGAGGTGGGTGCCGATTTCGGTTTTGATGGTTCTTGTCAAGGTGTGGCT
TGCTGGGGCGCATGAGGTGGTTCGGTGGCAGCTGATCCAGCGCGGTGGGCCGTGTCTCGGGTGGGTGAGCGTTGCACCTGCGGACACCGCACGCTAACCTCCGCACGCGGGGCCTGCGGTCGCAGATCCTGGTGCGTGGCAAGTACAGCTTCAACAGCGACTGCTGGAGCGTGGGCTGCACCATTGCCGAGTGAGTCGCGCGTTGGGGCTTGGGGCTTGGGGCTCGGGCCATGCACTGCCTTGTCGGCTGAGAGGGTACGGTATCCAATCGCGTAGCTGCGCGAGGGCGTGGCGGCACGGTCTCGAACCCACGCACGGCCACCGCACCACTGACGCCCGGACGCTCCCTTCCACTGCCTTCGTTACGAGCAGGCTGGCGGTGGGTTCGGCCCTGTTCCCTGGCACGTCCACCATCGACCAGCTGGCCCGGATCATGCGCGCCACGGGACCGCTGCCGCCCTCGTTAGCGGCGCAGATGATGTCGGACCGAACTCTGAGCCCGCTGGCGGCGCAGCAGCGGCGGCCGCCGAACCGCACCCTGCGCGAGCGCCTGCCGGTCGAGGCCCGACTGTTTGAGTTCCTGGCCGCCTGTCTTCAGGTGGACCCGGCCCGCCGGCCCAGCGCCAAGGAGCTGATGCAGATGCCGTACTTTTGGGACATCGTGCCGCGCAGCCGTGCCCTGCCCAAGGCCTCCATGGAGGCAATGGCGGCCGCACGTGACGCCGCCGCCGTGCAGATAGCGGCGGCTGAGGCTACCATCGCAAAGCCGGCGGCGCAGCCGGCGGCCGTGGCTGTGGCCGCGCCCGCGGCGGCGGCTCGCAAGGACGTCGTGCAGGTGGAGGCCAAGGGTGCGGCGGCCGCGCCGGCGGCATGCGGCGCGGTAGCGGGCGCAGCTGCCAAGTCCAGCGGCACGGACAAGGCGGCGGCCGGCGGTGCTGGGGGCCAGACGGCCTCGAGCAGCGTGGCGGCACCCATGACTACCACCCGTACTGCAAGTGAGGCCCAGGCCATGAGCCTCTCGGCCGTCGCTTGCTGCCCGGGGACTGACCGCGCGTCGACAGCGGTGCCCCCTACGGCGCCGGCGCAGCTGGCCGCTGCACCTGCTCAGGGCACAGCAGCAGGGCTCAAGCCTGCAACCAGCGTGGTGATCTCGGTGAAGGCAACTGCTGCGTGCGGCCGGGACCAGCCAAGCGCGCCGATGACTGGCTCGAGCCTCAGCACCCGCGACCTCGCGAGCATGAATCCCGCCGCTATGCCAGCGCCTGCCAACTCACAGGGCAGTGGTGTGACATCGGTGCCAGCATCTCAGGCAGCGGAGCAGGCGGCCGCCGCTCCCTCAGCGGAGCCGCCCCCACGCGTTGTGTGTATGCCCGACCTCACCAGCGTGAGCACCTTGGCATCAGGTGCGGCGGGGCCGCAGCCCGCGCAGCCGGCGCGGGCGCGCGCGCCGGCACCGGTGGCGGACGCGTCGCCGGAGGACGCGTCGCCCCGGCAATCCAGGACTGAGCGCGAGCTGCAGCGGCCACAGGCGGCCGTCACGCTTGTCACTAGCTCTAGCCTGTTCCCATCACCGCTGCCAGCACCGCTGCCTCCGCCGCAACCAGTTGCAGTGGAGGCGTCGTCGCCGTTCACGCTCGTGGTTGCTGACACTCTGGGTGGCGCTGCGGCGGGTGCCGCAGGCGCCGCCGCCCCAGGCGTCGCAGGCGCAGTCGGCGGTGACAGCACGCCGCGCAGCCACACCACAGCGCGCATGCTGGACCTGCCCTCCAATACCGTGGAAATGTTCATATCGCCCACCACGTCGGTGGCAATGCATCGGCTGCTGCCAGCTGTGATGACGCCAGTAGGCGCACCGCCGCCGGCCACGCCCAGTGCCGCCGTGCGCTTGCGGCAGCTGATGCCGCACTGCCGTGCGCCGGCGGGCGCGGTGCCGCCGGTCCTGACCTACGGGATGCTGTCACGCAGCAGCACTCTGGAGCTGGACATGACGGGCAGTGCGG
CTGCGGCTGCAGCGGTGGCGGCCGCTGGCGTTTGGGGAGGAGATGGAGGAGCGAGCGGGGATGGTTATGGCGTGTCGTTGGCGAATGGGGCCTCAGCGGGGCAGCTGCAGGCCCACATACAGATGCAGCAGCAGGCGGCGCAGCGGCATGCCCCGGCGGCGGCCGCGAACAGGGCGTGGCGGCGGGCGGGTCGCGCGTCGGTGGAGTTTGCAGACCAGCTGTCATGGCCGGCAAATACCAACCAGCCCGACCAAACGGTCAGCGGCGCCAGCACAAGCAGCAACATTTGGGCCAGGGCTGTCACTCCTGGAGCCGGCGCCGCGCGCGTTGGCGGCAGCGGCGGCGCAGCCGCCACTGGCACCCGAAATGTCACTTCCGCCGCCATTATGCGTCGCAGCTGGCGGCTGCTGCCGTACCGCACAACGGGCGGCAGCCCCGGCTTCATGCCCGTGCCCACGCTGGGCGACGAGCCAGCTGCGGATACGCCGTCCCTGCACACGTCAGGCGCGGGCGCCGTCGCGTCGTTGGTCAATGCTGCCGCGGGCCTGGGCCGCCACAACAGCCGCTCGCAGGCGTCCTTTGTCCGCAGCATGTCGCGGATGTCGCAGTGCCACGCGATGCCCTCGGGCGCCCTGGACGTGTCATCTGCGGGCCATGACAGCTCAGTGGACGGCGCCGGCGGCTTTTGCTCCGCGTACGCAATGGCGAACGCATCGGCTGGAGCGACATCCTCGCCACTTGTGGGCCTGGTGACCACGCCGCAGCAGCCGGCTAAGGCGCAGCAGCTGCAGGCACAGCTGCAGAGAAACGGGTCCACAGTCGGCGGCGCCGTCGCACAGTCGCCGCCCATGCTTTACGGCCTGGTGCTGGCCGCAAGCAGCGATTCGCCGTCCCGCACGCGCCGCGCCGCGAGCGCCGTGCTGCCCAGCTTTCCTGCAGCCAGCGTACCGGGAGCCCACGCCACCCCTGTCACGTACACTGGTGCCAGCGCCGCTGATGCCAGCAGCAAAGGCCCCGCCAGCGTGGCTGCAGCAGCAATGGCCCTTCTCCTGCGGTCGTCGTCCCAGCAGCAGCGTGCTGCCACCGCCGCAGGGCATGTGCCGCACGGCACAAGCCGCCTCGCAAACGCCGTCAGCTCCAACCTGTGCGATTACCCCTCTGGGGACGCGGACATCGCGCCCACCGCCGGCACCCCTCAGGCGGGAGCCTCCGCCTCGGCCTTTCCCAGCGGCACGCCGATGGGCACAGCGACCGACTCGGGCGCCGTGCGTCGTGCACTCGGCTTGTCCTGGCAGGTGCTCCAAGCCGTGGGCTGCAGCAGCAACGCCGCGGCAGCTGCGTCCACGGCCTGCTTCGACAGCGCCGCCTCCGCCACCGTCGCAATGGCACAGGCCGGCGCCGTGTCGCTTGACGCAATGCTGGCTACTGGAGGCGGCGATGGCGGCGGCGCCCCTGCAGATTGCGGCCTTACCGCTTCGGCGTCGGCAGTGGCACGCTTCCCCAGCGCTAGCCTGCTCACGGCGGGCGGCGGGGCCGCCAATGGTGCCTACGTGCCCCACGCGATTACCGAGGAAGAGAACGAGCTAGCATACGCGGCAGCAGCGGATGCGTCAGCTGCCGGTGAAGCTATGGGCGCGGGGTGCAGAGCCAAACATGTGCTGGACAACTCGGATGGATGCGTGCGTCTGGCTGGCTCAAAGGACACGGCAGCGGGCATGGCGCACCTGCAGCAGTCTGCCACCACGCAGCATCCCTTGCCTGCGCGCACGGCATCCCCGGGTGGACGCCGCCAGGGCGCACATGACAGCAAGCAGCGGCCAGGGCTGCTTGCCCGTCTCTTCGGCTGTGGCCGCTTTCGCAATGACCAAATTTGAAGCAACATGTGAGACAGGCCGCCGCTTGTCGGATGGTACGTGTTGGCAGATTTGACACGGCGTCGGGCCTCGGGCCCCGGTTGGGGATAGCAGTGTGTTTTTGGGTGTGGCGGGCCGGACCCACTT
GACCGTACGGTAATGCTTAGGTACGGAGCTCAGGGTTCAGGCTGTGCACTTGTTTCTTCTTCTGATATGAATGCGACATTGCATATGCAATGAGGTACAATGATATACTGGGTATTTGCTTTGCCTTGGACGTGAATGCAGTAGCCGGACATGGAACATGGGTTTCGACATGACCGTGTGTGTTCGCGGTAGAGTTGTGCACACACACCAGGCTTGCCTAAGGGTGGGCATGGGGATACTTCAATTAACGAAGGTCACGTTTTAGGAGTGTTTTTGGGCGGAGCGGGAGATGAGGTAGACGCTTGCGGCCCCAGACGGGAGGCGTCAACTATCAAGTTGATCCCATTTATTCCATATGAACATGGCTGTAATGATGCGGCCCGTGGAAGTGTGAATGGGGGGCTGTTCCATGGATGGGTGAGTTTAAATGTTCCCGGTCGCAGTGGGCTCTCGTGCAACCAGGTCCGGATTTTGCGCGGTATGGCTAATTGGTCGTGCCGACGTGAACAGGGGCAGCAGTACGTACTGTCCGTTTGTTGCATTAGCATTCATGATTAGGGGAGACCGCAGCATTTTAGCCCTGGGGCTAAGGTTGTTGAGAAAGAGCACCAGAGCATATGGAGATGTCGCTGTACTTCGGACGAGTACGCCTGGAGGCTGAAAGGAACCTTGCTGCGGTTTGTACGACGCAGACAGATGCTCGCACGGTCTTGCAATGCAAGATGACGGTCGAGTCGTATACGTGCCATGATGATGTTGTTTAATGCTTCACCAGTTGACCGATTATCGCTGATGGGCGCTACAGACAGGGAATGTCCTAACATGGACAGCTGCGAGCAGCTCATTGCGCTGGAGTGTGAATGGAGCCAGAGAAGTCTGAGCAGCCTTGCAAATGGAGATGCGCAGTATGCTTGGTGAAGGAGCTAAGCCCTGCATCAAAGGCCGGAGATATTTGGGGTACATGACGCAGGTCACGAGCTTGCGTGCAACCACAAGTGTGGTCGTCCAGCTTTAGATCTGGGGGGCGTGCCAACAGTGACCCCCACGCACGTTGGCCGGAACGTGTGTGTGGGGGGGCTCGGGTTCTAGTTGGCAATGGGTGCAGGCGGTGCGGTCTGTGCGAGGCGGGAATCTTTTACAGTTTGCCCAGGGGCGGCAGCCGCTGCAGTGTTGGCTTGAGAAGCAATGTCTTAGGCATGAGATGGGAAGGGAACATGGGCAGGGAGCTTCGTGACGTGGGGCCGAGTGAAGGACGTACTCTGTGGAGTCTGCGCCTTGGGCTGGTATGCGCTGCTCCCATGAAAGCGCCACAATATGCCATGGGATTTTTGTCTGATGCCTACCAGTAATCATCTATCAAGTTGGGACCTGTACGTCATCTTCTTCCGTCGCTTGGTTGCCTCCATCTGCAGGTGAGCGGCCAAGCACAGCCAGTCACAGCTAGTTGCTAGCGTACACGTTCCAAACACTATCCTACCAGCTGTTGTCCATGCAGCCGTCCGCTGCTGTTGCCTGGCGGAGTGCTGGTGCACCCTGAGGTGTGACCTCCACCTGACCCCTCATCGTACAGCTACCAGTCTCAACCCGCGTGCCGGTGCCACTTTCCCGATGGCACCACAGCGCACCACGCTCTCCTCCTCGTGTCCCTGCCACGGCTGCCAGCAGGGATGCCAGCCCATGGCTGCCAGCCTGCGCAATTTGATGACTGACTGCTCCCTCCGCCCCGCAAATTCGGGTACCTTCACTGGAAAGTGCGTTGCAGTCCCCAGTCGCAGCAGTGCGAGGGTGCCCAAGGTTGCCCATATTGCAGTGCCATGCTTTGGGCTTACTTGAGCCATCATTTCCGGACCATGCTGCACCTGGCTGCATGCATC

cat prot_223256
>3055_0:000816
MENYEYLGDLGSGSYGFVWKCVQRSTGRVVAVKGFKLAHTDKKFLDAAIREVRMLRNATDHPNIIQLLEAFRSSTGRVYMVFEFADKCLSAELHKRFTCGLPAGQTRVVLWQVLAAVAHLHSKKIIHRDIKPGNILMTSDGVVKLCDFGFARLTRGDPYQPDRFSSYVVTRWYRSPEMLVSDLYGAPSDIWSLGCTFAELATGRPLFPGASSLDQLWRIMRCMGPLPPTQAERFAAAATAAGLPEAPPPPPRGKSLWQRLPELDSRLLDLVQACVRLDPAQRPTAVQLMQMPYFHEIPKAIAGSRLEQLYLAIGSGTGYPGSALGRTASARFRQMQQLAAQQKAAGAAAGGAGSGTQPNVASVPAGGSAGVRGLGGSVTVSVMSPEELLASPRGGHATSGSVKRPASVLLSSVAEAVLGEKPSAGDGSGDCSIFPLAPPLPHIPMVDIAMLLSAQQQQQVQPQHQLQQAPLQGSQRYAAASAAAVVPLAATAAAGPSSSRLHSVSSPFKTVPMLPPLQPAPTSGDVVMPAAAAPIIAAAAASAAMSQSPRSSASMSSPSPHPPGTRRQLSGTSPRGAAPAGTASGRNLLAAATAGAGAAAGRQASGRGLPMGGLVGGVAAPESTGGGSSPTAAGVAVAVPPSVRLAHLSSLSPRQRQHLPQLSPLQRQQQSQALPAAATSVAMPPSAFLDAEARGDSLGSGSGGDGEETDDEILAARQGCRRNRQGYERDGSASRLGRNAGGAVPAGAAAMATATGGAAAAAALPPASASIMPVEAHAMPGLGLLEGYDAQDTSDDDEAQVSDDDELMAFYVARKSGGRGRRGGAAGTRATGSRRKVASAAAASTGALTTPAPAAASAAAMSASGAMHGAAAPAAAATKAAAAATRDELIGVALQAAAAVDMATQEMHMAGSTGGGMQPMQMEADAGMSLHVAAATTAAPLRGAHHNGVAAVDAAAPSPALASWPTAAAPAAGIIAASGLGPRAVAAQPPQRPLPHAGIHQQHHGLYGTQGSHHRQTMPRTTGGGGSSRGSTGTGATPVAAGLNRRVTAMVLGTGLEDAVSHASAAANPNTAATGTPASAAVAAASAPAQPRPLAPAALASCSPTPAVTITSAAATPVVAPLPPPPRFPTGAVAKRATVASYLAISQPNGSMAVTSASVLASGTSATVATADAAVAASSGTTVSQPLPVPRSVARGGQGAGMSGGIIVGTDTGGTGPVAGAVRGAATATGLTHMGTGSLPTVGSIGPGLRHHNHATTMGLTLMAPHESGPRGLGGGAAVTPSAAGVHLQGHGPASLPYGRASLPVQGGSYVGFSTGSANRRMLSRQGSTVFNQLMYDALPEIGTPGGAPDVPAGTPPPQRRRAVMSGFTPCRTAAARAAAEGLPAAAAMAAAMGSNTTDLSVAFSPIAVARHEDPLSIGDGHGLERSSVGAAPGFRSVQFGLACGAGGAYPGASAAGGAHRRQASMQMQTAYTASVGIMGAAGSDLGPSAATAIPGGGAAGGRGSGSYHSHASDTGMLMGSSAPVSHAMHPGYGSGSGSMGGSYRWPGQRILVPDQAHGLATATVTAASGPAGGPPVRGGRLPQAVGLAASGSSQQTSGSAASGAGPLGSGTTVGAAAGAHAAAATPGRSRLGSGILGRMSDDPAGGSMLGAGVGAGAGGGGSHGQHPVLVCTADDVHCSSALNIELDGSCSVGNNTGGGNSAGMWGFGPMAGYPAGAGAASGAVIAARAGGGGRSRWLGSGVIDSLPEDREVLHVAGVDDWRLGNSPGIAGGAGSGVGMAELVLGASDHYSSGLPPAPTSGPTLAEVSAAVAGAILAPSSSSAMGFGYKLSPRGQPATIPGQAGLMGLRPKSPAGSLELLRGRTNGHAGQASYGHGPSGLHQAGGALGSPSSPRSPGSGDAPGRPGSAQLPLAGDGSGMRFAANGSPSRAWVTEGCAAGGGTIGAAAVADVGAAAGAAGKLASADKAEKSKWPRAKALLGGKLISSLVKKFKDGVQVSD
RK

Thanks in advance,
Stephane

splaisan commented 11 months ago

Never mind, I figured out that this protein does indeed come from the Viridiplantae.fa input after all.

Here is a bioawk command to remove it, for others having the same issue:

mybad="3055_0:000816"
bioawk -c fastx -v header="${mybad}" '{if ($name != header) print ">"$name"\n"$seq}' ../Viridiplantae.fa > ../Viridiplantae_edited.fa
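
For readers without bioawk, the same filtering step can be sketched in plain Python. This is only an illustration under my own assumptions: the function name `drop_fasta_records` is hypothetical, and the record ID is taken to be the first whitespace-delimited word of the header line, which matches how the ID appears in `prot_223256` above. Unlike the bioawk one-liner, it keeps the original line wrapping and any header description.

```python
def drop_fasta_records(in_path, out_path, bad_ids):
    """Copy a FASTA file, skipping records whose ID is in bad_ids.

    The ID is the first word after '>' on the header line.
    Returns the number of records dropped.
    """
    bad_ids = set(bad_ids)
    dropped = 0
    skipping = False
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                words = line[1:].split()
                record_id = words[0] if words else ""
                skipping = record_id in bad_ids
                if skipping:
                    dropped += 1
            # Sequence lines inherit the decision made at their header
            if not skipping:
                dst.write(line)
    return dropped


if __name__ == "__main__":
    n = drop_fasta_records(
        "Viridiplantae.fa", "Viridiplantae_edited.fa", ["3055_0:000816"]
    )
    print(f"dropped {n} record(s)")
```

The edited file is then passed to `--prot_seq` in place of the original. Passing several IDs at once is possible if more than one pair turns out to hang.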

Changwanseo commented 9 months ago

The same problem occurred with the sequence pair below (it has been stuck for 14 h 30 min on a 5975WX system). I hope this helps with solving the problem.

Command

braker.pl \
--genome=../202_repeat_mask/${ID}_masked_nuc.fasta \
--species=${SPECIES} \
--prot_seq=/data/genome/db/orthodb/odb11v0_all_fasta_no_asterisks.tab \
--GENEMARK_PATH=${PATH_BIN}/GeneMark-ETP/bin \
--PROTHINT_PATH=${PATH_BIN}/ProtHint/bin \
--AUGUSTUS_CONFIG_PATH=${PATH_ENV}/braker3/config \
--AUGUSTUS_BIN_PATH=${PATH_ENV}/braker3/bin \
--AUGUSTUS_SCRIPTS_PATH=${PATH_ENV}/braker3/bin \
--fungus \
--threads=${THREAD} \
--softmasking \
--useexisting

Sequences (nucleotide, then protein):

>2090_g
CCTTTGGTCGATCCGGATTTGCAATTAAAAGTTCTGACTCGAATATTACGCAGCGAGTAGCAGCAGGCTTGCGTGGACACTTCTTGCGCAGCAGGAGAGTGAAAGAAATGATTACATCCCAGAACCGTCGAGTTCGGACTCTGACGCGTGCTTGTAATCAACGCATACACACATCCTTCTTAATAATACACAGCGTACACTACACATACTTACAAAGAATTAGTCGAGAGGTGCATTACCAGGGCCTCTGGATGTCCGCCTTCCGAACTGACTCCGTGCAAAGGCCGAGCTGGAGTTTCGTTTCATCTTGAGTGAATATGCTTTCGCTTTCGAAGGGCAAAAATAAAGCGTCACGTACAATCGAGAACCCGTGCCACGGATGTTTGATAACAACGCGCTCTCCAACGCGGAAGCAGACAAGCTTGTAGACTCTGCTGGTGTTCTTCAGAATACGACACGTCAGCTAGACTTGTAAGCACGTACCACTGATGTCACTGAAGCTCGATGCACTGGATCCATGCGTGCTGCTCCTGTCGCTAGTATGCGCCATCCCAACCGCTATCGGTCTATACTGCATGTGTGGTAGACCCCTCGCCAGACCCGAGGACAGAGCAGGCGGGCTAGATGACTTAGTTTGTGTGAAATGTCGTGCTACAGGGGACTGCGTCTGGGATTCAGGAGGTGGAGATGGGATAGACGGTATGGAGCCCTGAGGAGAGCTCGTGTTCGAGTTGAAGTCGCTAGAGCTAAGGTCTCTCCTGGCATTCCCTGTATACGTCCGGCGCAATGGTGACGGAGAAGTGGGTGACGTGATGTCGCGTCCGCGCTGAGGCTGCTTGCCTTTCGACTTTGGGCGGCCATCTCGCACAAGCCCAAGAGCGTCACTTGTCAGTACAGACTGAAGATGCTGCAGTTTCTTATCGAGTGCCTCTTGCTCCTCCAAGCGGCGTTCCTCCTCCTCCGCTTTCTCGGCTTCTTCATCTATTTCAGAGTCCGAGTCTGAATGCGCCGGTGACGATGTCGACGTCTTTATTGGCGGTGGTACATGTCGAGTACCTTGCAGAGTGAGGACGGAGGATGAAGACGTCTTCTGCAGCTTGGATCCTGAATCATAGCCTAACGAGTTCAGTCGCGCACGAATGCCAAGGGGCGTATTCAATCGGCCAGATGAAAGCAACCTGGCACTTCCCGCCGTCCGACCAGACGTCCTTCTGACCAACTCTGGACGATCCTGGCCCTTGACGTCTACCTCAACGGGGCGTGCTGAAGGAGGACCCGTGGCAGGTGCGAAAGGAACTTGAAGATGTTGTATGCCCTTCAAGTCCTCTTCATAGCGTGTTTGTGCTCTGAATAGAAGATAGGGAAGAGGAACTGCTAAGTGTGCTGCGAGACCTTGCCCTTGAATGGTTCACATTGGCCTGTCAGTATCGAGAAATACCAATATAATCAATAATCCATCTCACAATCCGTTCCTCCGCTATCTGAAGCTCTTGAACGCGCAATGACCTCCCATAGGATGCTTTCCTTCTCAGCATTCCATTCTATCTAGTTTTCAATGAGTTGGGTGAACGCTATCCAAACATGCTTGTGAATGGACCTACTCGTGGTGGGTTCTCGTAGCCTTCTTGTGGTCGATTGTACGGTAGACGGATGATAATGCGCACGGATGGTATCGCCGATGAAGAAGGCATTGCTAGCATTTACATGCAGGCCTTTCGAATGCCAGTGCTAACAGTGAAAGTTGCGTAATGTTACAAGCCCCGTGTCTCCGTGACTCCGGATTGCCTCACCCCTCCACACCACTCACAAATTGTCTTATTCTGCCTATGCCTCGGGCCTCAGCGCCGATTTTGGTCAAAGTAATCCGGCTTGCCTCAGCCTTTCGACCAAAATTTCGTCCCAAATTATACGGTTTCCTCCCAGACACGTACTGGTGGAATCGCCGGCGGGGATCCTTCCACATCCCACGTCGATCGTTCAACATCCCAACACGAACCACC
ATGACCGTGAGCTCGACTACTGCAGATGAGGGCGAGGAAACCAAAAACGATGCTCAAGAGCTCGACGAACTACTGGGGAATATGGCTCTCGATCCTGAAAATGAACAGGTCGTTTCAGAAATTGGGGGTGGGCGGAGCTTTCTCTCAAGCGACTATCCCGTGCCAATACAAGTTCTTGTGATCTCCCAGTGGTGCGATCTATCTATGAAGGCAGCAATGATGCAATAAGAGTCTGGAATCCGAACAATTCTGAGAGTACGAGTGCGAGCAACAACTCCGATGGCAAGGTAACGCGCTTCGAGGTTCACTTAAGGTACGCTTAATATTGTTTTCGTTGTGAACGATTGACCAACGTGTATACAGCTTATCGTCCTCACAAGACGGTCCCGAGAGCCGCTTCAAGGTCCTTGTATCGCTGCCACGAACATATCCATCTTCATCTCCGCCACAAATTCAGCTCCTGTCGCGTTACATCGGCGCGTTCAGTGTTGACGCAGACCTCTTCGGAGCAGTCATTCGTACATTCATCTCATCTAGAGATGGCGTTGAATGGCTTCCAGGTACAGAATGCATCTTCGATGGATTAGAGAACATCCGGGAACGCGTTGCTAAGTGGTACGACGAACGCCTTAGTGAAGAAAAGGCTCTGGAACTCGTAAGAGACGACGGAAAGGAAGGGACGCACGAAGACAAGCATCCGACCGACGAGATTGAATCTGTGGACAAATCCTCCAAGGGTCGCAGACCACAGGCTTCTCTGCCCGAGGGCATTGTTCTCCATGTATCGGAGCCTATCGTTGATCGGAAAAGTGTTTTTATAGGACGGGCGTGCCGAATATCTCATCCGTCTGAAGTATGCTTCAAGCTCCTTTTCTCTTTCGCGTTAAATTTTAACTCATCTGGTGTTTGTCCAGGTTGACTCCGTTTTATCGTATCTCGTCGCGGACCGAAAAATAGCTCGCGCTACCCATCCGGTTATAAATGCCTGGAGATGCAAAGTGAACGGAACACTCCACCAAGGTAAGGCTTGTTGCGTTTGTATCTGTCTAGTCCAATGTTGCTTAGATTAATTCTTCCACTCAGACAACGATGACAATGGAGAAAACGCCGCGGGGAGCCGCTTGGCTCACTTACTACGAATTTTGGTAAATGTTGCTCCTATGGACGCAGTCATCAATTCGCTCACTACGCATTTTGAAGGACGTTGATAATGTCCTTGTGATCGTCACTAGATCCTTCGGTGGCATCCGTTTGGGCCCCGACCGTTTCAAGCATATTAACCAAGCTGCTCGCAATGCTTTGGAGATAGGAGGATTCTTAGACGCACCAGATGATAAGAAGAATACCTCAAGGCCGAAAAAAAGACACTAAGATCATAGAATGGCTTCAAATACAGTGAAATGGTCAATAAGTAGTACAATGTTCGCGATCGAGCTAGCTCTGAAGTACTGTTTGTCACGTGGCCTGCTTGGAGAAAGATCACTGCTTGGGGCTGTTACACTGACACTGCATTCCCAACTCATGACTCCGTTCTGTGCTGTGTATTGCCGTCCACGTCGAAGCCACAGCTGAGATCGCTGACTCAGTATGATATGTAGGCTGTGACAAGTGTATGCATCTGGTTTGCTTCTTGTACGCATAAAACATCTAACGTGGCCAGGCAAGACAAACAGGTCATGTTTAACAACTCTTACCGCGTTCATGAGCTAGCTGGCAAACTCGACAATACATATCAAGATTCTCGCGAACCTCTGGTCCAGAAAAGCCTGGGTTCTGAGCCATGCCTTTTGGCGAGGCTGCCTTCAACCAGTGAGTATTTTCTCAGAGACATTCAGACCTGCAGAACGACGTCTCACGTTGCGTATAATTACCAGGTTTCTTGGTGCGAGAAGGAGAAGCTCTTGCTCCGAGGGCAGTTCTCAAGCAGCTAAGACGTAGTCCGTCACCACCGACGTCCCCGACGGATACTGATGAAGCCGAGAGACAAGAGCCAAGTAAGT
CGAAGACACAGGAGACGGAAAGGAGATCATGGTTTTCTCCAAGAACTGTTAAACGTCCTCAGACCACGTGGAAAGAGCCTCAGGTTTGTACTTATGTGACTTCCGCAGAAGCAAGTGCCTAACAAACAGGGTGTTAATATAGTTATATGAGGTCTTTCGTGCAATTGAGCGGAAGGACATCATGTTTCTCATGGAGGTACGGGATCGAGCATTTCATGTAAGCACTTCTCAGGGCATTTTTATTTGCTATCTCTGACTTTATCGCAAGCTTTTACTCAAAAAGAGTGGGGATGCGACGCCACTCGTACACGCTATGCGGATAGGCGATTCACACCGTGACGTCGCAATTATCATTCTCGGTGCCTTGTCGCGATGGGTTAACCATTTGGAAGACAGCGACATGGCCGACAAGCGAACGAAACCGTTACTCAAAGCTTTGCGTGAGCCATCTCCACTTTACTTTTTACTGGTGAACATGTATATTCACAGACGAAGCAAGGCACCAATCTGAAACTCGCCGTTGACTATGGCCTGCAGCGCTCGCAATCGGACCTCATCCCTTCTTTCATGCAGACCCTGGTCATGAGTGAGGGTGAAAGATGGATTATCGATCAGACGCATAACGTGGCACTTGCACTTCGTGCTGGTACAGAAGGGAAACCTGTTCATACCGCTGAAACTGTTGTTAGGAAGTTCGCGACAAGGGAGCTTGGCAAGGCCGAGCTCATAGCGTCGTTAGAAGATTAGTAAGTGTTTAGCGCATTTATTCACGTTACCGACTCCTCAATGTTTGTCTGCAGCATAGCCAATGCCACTGCAGATCTGTTAGTCCTAGCCGCCTGCTCATGTGTCCTTGATTCTGTTCAAGCGGAACCTATTCCGGTGCGAGTCATTTATGACAGACTTCGGGTTTGATACTTACCATCGGCCAGACGTATTACTTTGCACGAGACACAAGAGTTTTCAATGCTTTCCAGGAACGTCTACAACATCACAAGGGGGCTCTGATGGGTCTTAGCAAACGCCTAAGGTGGCAGATTAGGGTCTTGGAGCACGTACTAGAAGGGCGGTTTAACTCATTCAGGGTATGCATTGCCTGTATTACAGATTATATGGGCTGATCCTTTCTTTTTTTTCTTTCCAGAAAAAGGTCGAGTTGTTGGCCTATGAGCTAGACGAGGGTCCAGGAGTATGATAATTCTGTGCTTTCAAGATCATAATTTTTTGGCCATTCTATGAGCAAGCTGACCGAGTTATTCTCCGTGCTGATTGATGTACACAGCACTAGTAAGAGTCTGAGAGCTCCCCAAAGAATTTTGCAATATCACGCTCGTATAAGGTGCCGCACGAGTAAAAGTTCTCAACTCGAAG
>93625_1:000408
MRRSPSPSTADADNTDYALELQDFLAELSQDPEREAVASEIQVLQSIYGDDAIRLWRPPLKNGKRSASTSRRDGTIRYEVLLSLSSPHDDVSLKVLVSLPETYPKSSPPQLQLLSKYIGSFGADANLFGSILRTYISVSGVEWLEDTVCVFDGLQNVLDRCVSWYEDRLSAEKAGELVRDDGKEAVAVSTRPVSPTGQTNAEISGIADSAPAPVPNALPIGIHIYVAEPITDRKSAFVGRACRIHHPSETRFMCAELFAFKVPLILSHLMSDRRISRAAHPIINAWRCQVDSVLHQGSSHNDDDGETAAGGRLAHLLQILEVNDVLVIVTRYFGGIHLGPDRFKHINQAARNALDLGGFLDAPENKKNTGRVKKH