Gaius-Augustus / GALBA

GALBA is a pipeline for fully automated prediction of protein coding gene structures with AUGUSTUS in novel eukaryotic genomes for the scenario where high quality proteins from one or several closely related species are available.

A gene that has a CDS with a different orientation in GALBA results #40

Closed: suyeonwy closed this issue 8 months ago

suyeonwy commented 11 months ago

Hi, I'm using GALBA to annotate a pig genome assembly. I ran GALBA with the following command line, and the whole process finished without any critical errors:

    galba.pl --genome=assembly.fasta --prot_seq=protein.fa --threads=20 --workingdir=./ &> GALBA.log

While examining the GALBA results in detail, I found a gene (g97) whose CDS features lie on different strands: one CDS is on the minus strand and the other on the plus strand. Here is the case:

chr10   AUGUSTUS        gene    2508725 2509003 0.14    +       .       g97
chr10   AUGUSTUS        transcript      2508725 2509003 0.14    +       .       g97.t1
chr10   AUGUSTUS        stop_codon      2486670 2486672 .       -       0       transcript_id "g97.t1"; gene_id "g97";
chr10   AUGUSTUS        CDS     2486673 2486894 0.64    -       0       transcript_id "g97.t1"; gene_id "g97";
chr10   AUGUSTUS        exon    2486673 2486894 .       -       .       transcript_id "g97.t1"; gene_id "g97";
chr10   AUGUSTUS        start_codon     2486892 2486894 .       -       0       transcript_id "g97.t1"; gene_id "g97";
chr10   AUGUSTUS        start_codon     2508725 2508727 .       +       0       transcript_id "g97.t1"; gene_id "g97";
chr10   AUGUSTUS        CDS     2508725 2509000 0.14    +       0       transcript_id "g97.t1"; gene_id "g97";
chr10   AUGUSTUS        exon    2508725 2509000 .       +       .       transcript_id "g97.t1"; gene_id "g97";
chr10   AUGUSTUS        stop_codon      2509001 2509003 .       +       0       transcript_id "g97.t1"; gene_id "g97";

I would like to know if this is a common result in gene annotation or a bug.

Thanks, Sue
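Because the final GALBA output (galba.gtf) is a GTF file, one way to look for other transcripts like g97.t1 is to group features by their transcript_id attribute and flag any transcript whose features use more than one strand. The sketch below is a hypothetical helper, not part of GALBA or AUGUSTUS; it assumes tab-separated GTF with AUGUSTUS-style `transcript_id "..."` attributes, and its example rows are the two CDS lines of g97.t1 from the report.

```python
import re
from collections import defaultdict

# Matches AUGUSTUS-style attributes such as: transcript_id "g97.t1"; gene_id "g97";
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def mixed_strand_transcripts(gtf_lines):
    """Return sorted transcript IDs whose features lie on more than one strand."""
    strands = defaultdict(set)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue  # skip comments and malformed lines
        match = TX_RE.search(fields[8])
        if match is None:
            continue  # gene/transcript rows written as bare IDs carry no transcript_id
        strands[match.group(1)].add(fields[6])  # column 7 holds the strand
    return sorted(tx for tx, seen in strands.items() if len(seen) > 1)


# The two CDS rows of g97.t1 from the report, rebuilt as tab-separated GTF.
example = [
    "\t".join(["chr10", "AUGUSTUS", "CDS", "2486673", "2486894", "0.64", "-", "0",
               'transcript_id "g97.t1"; gene_id "g97";']),
    "\t".join(["chr10", "AUGUSTUS", "CDS", "2508725", "2509000", "0.14", "+", "0",
               'transcript_id "g97.t1"; gene_id "g97";']),
]
print(mixed_strand_transcripts(example))  # ['g97.t1']
```

To scan a whole annotation, pass an open file handle instead of the list, e.g. `mixed_strand_transcripts(open("galba.gtf"))`.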

KatharinaHoff commented 10 months ago

Can you tell me which version of GALBA you used? I am trying to figure out whether it is a TSEBRA or an AUGUSTUS issue.

suyeonwy commented 10 months ago

Hi, I used GALBA version 1.0.7.

Additionally, I found a case (g98) where the gene region covers only part of the transcript, so that some elements within the gene (stop_codon, CDS, etc.) appear to lie outside the gene region.

chr10   AUGUSTUS        gene    2524476 2524766 0.47    -       .       g98
chr10   AUGUSTUS        transcript      2524476 2524766 0.47    -       .       g98.t1
chr10   AUGUSTUS        stop_codon      2486956 2486958 .       -       0       transcript_id "g98.t1"; gene_id "g98";
chr10   AUGUSTUS        CDS     2486959 2487504 0.46    -       0       transcript_id "g98.t1"; gene_id "g98";
chr10   AUGUSTUS        exon    2486959 2487504 .       -       .       transcript_id "g98.t1"; gene_id "g98";
chr10   AUGUSTUS        start_codon     2487502 2487504 .       -       0       transcript_id "g98.t1"; gene_id "g98";
chr10   AUGUSTUS        stop_codon      2524476 2524478 .       -       0       transcript_id "g98.t1"; gene_id "g98";
chr10   AUGUSTUS        CDS     2524479 2524766 0.47    -       0       transcript_id "g98.t1"; gene_id "g98";
chr10   AUGUSTUS        exon    2524479 2524766 .       -       .       transcript_id "g98.t1"; gene_id "g98";
chr10   AUGUSTUS        start_codon     2524764 2524766 .       -       0       transcript_id "g98.t1"; gene_id "g98";

Could you also check this case, please? Thank you.

Sue
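A case like g98.t1 can be detected by comparing each feature's coordinates with the span on the transcript's own `transcript` row. The sketch below is a hypothetical helper (not a GALBA or AUGUSTUS tool); it assumes, as in the rows above, that AUGUSTUS writes the bare transcript ID (e.g. `g98.t1`) in column 9 of `transcript` rows and a `transcript_id "..."` attribute on all other rows. The `row` builder fills score, strand, and frame with placeholder values for illustration only.

```python
import re

TX_RE = re.compile(r'transcript_id "([^"]+)"')


def features_outside_transcript(gtf_lines):
    """Return (transcript_id, feature, start, end) for features outside their transcript's span."""
    spans = {}      # transcript_id -> (start, end) from the 'transcript' row
    features = []   # (transcript_id, feature, start, end) from all other rows
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        feature, start, end, attrs = fields[2], int(fields[3]), int(fields[4]), fields[8]
        match = TX_RE.search(attrs)
        if feature == "transcript":
            # Use transcript_id if present, otherwise the bare ID AUGUSTUS writes here.
            spans[match.group(1) if match else attrs.strip()] = (start, end)
        elif feature != "gene" and match:
            features.append((match.group(1), feature, start, end))
    return [
        (tx, feature, start, end)
        for tx, feature, start, end in features
        if tx in spans and (start < spans[tx][0] or end > spans[tx][1])
    ]


def row(feature, start, end, attrs):
    """Build a minimal tab-separated GTF row on chr10, minus strand (illustration only)."""
    return "\t".join(["chr10", "AUGUSTUS", feature, str(start), str(end), ".", "-", ".", attrs])


example = [
    row("transcript", 2524476, 2524766, "g98.t1"),
    row("CDS", 2486959, 2487504, 'transcript_id "g98.t1"; gene_id "g98";'),
    row("CDS", 2524479, 2524766, 'transcript_id "g98.t1"; gene_id "g98";'),
]
print(features_outside_transcript(example))  # [('g98.t1', 'CDS', 2486959, 2487504)]
```

Only the first CDS is reported: it ends at 2487504, well before the transcript row's start at 2524476, whereas the second CDS lies inside 2524476-2524766.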

KatharinaHoff commented 10 months ago

Can you please post the full GALBA log (from the workingdir)?

I see three possible causes: (a) a bug in AUGUSTUS (unlikely), (b) an artifact of TSEBRA (possible in 1.0.7 in particular; this could be ruled out in 1.0.9), or (c) a bug in Pygustus. It's probably (c), but I want to make sure that TSEBRA was not called.
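Both reported transcripts share a pattern that may help find further affected genes: g97.t1 and g98.t1 each carry two complete start_codon/stop_codon pairs, as though two separate single-exon predictions had ended up under one ID. The thread does not establish where this comes from, so the sketch below is only a hypothetical screening helper. Because AUGUSTUS can report a codon interrupted by an intron as several rows, it sums codon lengths per transcript and flags totals above 3 bp rather than counting rows.

```python
import re
from collections import Counter

TX_RE = re.compile(r'transcript_id "([^"]+)"')


def transcripts_with_extra_codons(gtf_lines):
    """Return transcripts whose start_codon or stop_codon rows total more than 3 bp."""
    codon_bp = Counter()  # (transcript_id, feature) -> summed length in bp
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] not in ("start_codon", "stop_codon"):
            continue
        match = TX_RE.search(fields[8])
        if match:
            # A codon split by an intron appears as several rows whose lengths sum to 3.
            codon_bp[(match.group(1), fields[2])] += int(fields[4]) - int(fields[3]) + 1
    return sorted({tx for (tx, _), bp in codon_bp.items() if bp > 3})


def codon(feature, start, end, strand, tx):
    """Build a minimal tab-separated codon row (illustration only)."""
    gene = tx.split(".")[0]
    attrs = 'transcript_id "%s"; gene_id "%s";' % (tx, gene)
    return "\t".join(["chr10", "AUGUSTUS", feature, str(start), str(end), ".", strand, "0", attrs])


example = [
    # g97.t1 from the report: two complete start codons on opposite strands.
    codon("start_codon", 2486892, 2486894, "-", "g97.t1"),
    codon("start_codon", 2508725, 2508727, "+", "g97.t1"),
    # Hypothetical g1.t1: one start codon split 2 + 1 bp across an intron (not flagged).
    codon("start_codon", 100, 101, "+", "g1.t1"),
    codon("start_codon", 200, 200, "+", "g1.t1"),
]
print(transcripts_with_extra_codons(example))  # ['g97.t1']
```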

suyeonwy commented 10 months ago

Here is the full log!

#**********************************************************************************
#                               G#*********
# WARNING: Detected whitespace in fasta header of file /mss5/Minipig/genome_annotation/coding/Sus_scrofa.Sscrofa11.1.pep.all.fa. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
warning: Coverage appears to be high, --ignoreCoverage flag will be ignored 
Intergenic size: 18586.8424203625
# IMPORTANT INFORMATION: the final output files 
of this GALBA run are galba.gtf, galba.codingseq, and galba.aa
These files are exact copies auf augustus.hints predictions.
For genomes with small intergenic region size, we found that 
in the majority of cases, this gene set is better than the TSEBRA gene set.
However, in rare cases, the tsebra gene set may be better.
You can generate a TSEBRA gene set yourself with the following command:
    tsebra.py -g augustus.hints.gtf -e hintsfile.gff -o tsebra
The accompanying fasta files can be generated with:
    getAnnofastaFromJoingenes.py -g genome.fa -f tsebra.gtf -o tsebra
g/genome_annotation/programs/Augustus/config/../bin
# Tue Sep 12 10:52:12 2023: Did not find environment variable $AUGUSTUS_SCRIPTS_PATH (either variable does not exist, or the path given in variable does not exist). Will try to set this variable in a different way, later.
# Tue Sep 12 10:52:12 2023: Trying to guess $AUGUSTUS_SCRIPTS_PATH from default location of augustus scripts in debian.
# Tue Sep 12 10:52:12 2023: Trying to guess $AUGUSTUS_SCRIPTS_PATH from $AUGUSTUS_CONFIG_PATH.
# Tue Sep 12 10:52:12 2023: Setting $AUGUSTUS_SCRIPTS_PATH to /mss5/Minipig/genome_annotation/programs/Augustus/config/../scripts
# Tue Sep 12 10:52:12 2023: Did not find environment variable $PYTHON3_PATH
# Tue Sep 12 10:52:12 2023: Trying to guess $PYTHON3_PATH from location of python3 executable that is available in your $PATH
# Tue Sep 12 10:52:12 2023: Setting $PYTHON3_PATH to /usr/local/bin
# Tue Sep 12 10:52:12 2023: Did not find environment variable $DIAMOND_PATH
# Tue Sep 12 10:52:12 2023: Trying to guess $DIAMOND_PATH from location of diamond executable that is available in your $PATH
# Tue Sep 12 10:52:12 2023: Setting $DIAMOND_PATH to /mss5/Minipig/genome_annotation/programs/DIAMOND
# Tue Sep 12 10:52:12 2023: Did not find environment variable $MINIPROT_PATH (either variable does not exist, or the path given in variable does not exist). Will try to set this variable in a different way, later.
# Tue Sep 12 10:52:12 2023: Trying to guess $MINIPROT_PATH from location of Miniprot executable in your $PATH
# Tue Sep 12 10:52:12 2023: Setting $MINIPROT_PATH to /mss5/Minipig/genome_annotation/programs/miniprot
# Tue Sep 12 10:52:12 2023: Did not find environment variable $SCORER_PATH (either variable does not exist, or the path given in variable does not exist). Will try to set this variable in a different way, later.
# Tue Sep 12 10:52:12 2023: Trying to guess $SCORER_PATH from location of miniprot_boundary_scorer executable that is available in your $PATH
# Tue Sep 12 10:52:12 2023: Setting $SCORER_PATH to /mss5/Minipig/genome_annotation/programs/miniprot-boundary-scorer
# Tue Sep 12 10:52:12 2023: Did not find environment variable $MINIPROTHINT_PATH (either variable does not exist, or the path given in variable does not exist). Will try to set this variable in a different way, later.
# Tue Sep 12 10:52:12 2023: Trying to guess $MINIPROTHINT_PATH from location of miniprothint.py executable that is available in your $PATH
# Tue Sep 12 10:52:12 2023: Setting $MINIPROTHINT_PATH to /mss5/Minipig/genome_annotation/programs/miniprothint
# Tue Sep 12 10:52:12 2023: Did not find environment variable $CDBTOOLS_PATH
# Tue Sep 12 10:52:12 2023: Trying to guess $CDBTOOLS_PATH from location of cdbfasta executable that is available in your $PATH
# Tue Sep 12 10:52:12 2023: Setting $CDBTOOLS_PATH to /mss5/Minipig/genome_annotation/programs/cdbfasta
# Tue Sep 12 10:52:12 2023: Did not find environment variable $TSEBRA_PATH
# Tue Sep 12 10:52:12 2023: Trying to guess $TSEBRA_PATH from location of tsebra.py executable that is available in your $PATH
# Tue Sep 12 10:52:12 2023: Setting $TSEBRA_PATH to /mss5/Minipig/genome_annotation/programs/TSEBRA/bin
#**********************************************************************************
#                               CREATING DIRECTORY STRUCTURE                       
#**********************************************************************************
# Tue Sep 12 10:52:13 2023: creating file that contains citations for this GALBA run at /mss5/Minipig/genome_annotation/coding/GALBA/what-to-cite.txt...
# Tue Sep 12 10:52:13 2023: create working directory /mss5/Minipig/genome_annotation/coding/GALBA/species
mkdir /mss5/Minipig/genome_annotation/coding/GALBA/species
# Tue Sep 12 10:52:13 2023: create working directory /mss5/Minipig/genome_annotation/coding/GALBA/errors
mkdir /mss5/Minipig/genome_annotation/coding/GALBA/errors
# Tue Sep 12 10:52:18 2023: changing into working directory /mss5/Minipig/genome_annotation/coding/GALBA
cd /mss5/Minipig/genome_annotation/coding/GALBA
# Tue Sep 12 10:52:18 2023: Creating parameter template files for AUGUSTUS with new_species.pl
# Tue Sep 12 10:52:18 2023: new_species.pl will create parameter files for species minipig in /mss5/Minipig/genome_annotation/programs/Augustus/config/species/minipig
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/new_species.pl --species=minipig --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config 1> /dev/null 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/new_species.stderr
# Tue Sep 12 10:52:19 2023: check_fasta_headers(): Checking fasta headers of file /mss5/Minipig/genome_assembly/final_assembly/minipig.chr.final.fasta
# Tue Sep 12 10:53:44 2023: check_fasta_headers(): Checking fasta headers of file /mss5/Minipig/genome_annotation/coding/Sus_scrofa.Sscrofa11.1.pep.all.fa
#*********
# WARNING: Detected whitespace in fasta header of file /mss5/Minipig/genome_annotation/coding/Sus_scrofa.Sscrofa11.1.pep.all.fa. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
# Tue Sep 12 10:53:44 2023: Assuming that this is not a DNA fasta file because other characters than A, T, G, C, N, a, t, g, c, n were contained. If this is supposed to be a DNA fasta file, check the content of your file! If this is supposed to be a protein fasta file, please ignore this message!
#**********************************************************************************
#            PROCESSING HINTS AND GENERATING TRAINING GENES                        
#**********************************************************************************
# Tue Sep 12 10:53:45 2023: Making protein hints
# Tue Sep 12 10:53:45 2023: Changing to /mss5/Minipig/genome_annotation/coding/GALBA
cd /mss5/Minipig/genome_annotation/coding/GALBA
# Tue Sep 12 10:53:45 2023: running Miniprot to produce protein to genome alignments, first producing genome index:
/mss5/Minipig/genome_annotation/programs/miniprot/miniprot -t20 -d /mss5/Minipig/genome_annotation/coding/GALBA/genome.mpi /mss5/Minipig/genome_annotation/coding/GALBA/genome.fa> /mss5/Minipig/genome_annotation/coding/GALBA/miniprot_index.stdout 2>>/mss5/Minipig/genome_annotation/coding/GALBA/errors/miniprot.stderr
# Tue Sep 12 10:54:39 2023: running Miniprot to produce protein to genome alignments in aln format
/mss5/Minipig/genome_annotation/programs/miniprot/miniprot -I -ut20 --outn=1 --aln /mss5/Minipig/genome_annotation/coding/GALBA/genome.mpi /mss5/Minipig/genome_annotation/coding/Sus_scrofa.Sscrofa11.1.pep.all.fa >> /mss5/Minipig/genome_annotation/coding/GALBA/protein_alignment_miniprot.aln 2>> /mss5/Minipig/genome_annotation/coding/GALBA/errors/miniprot.stderr
# Tue Sep 12 10:59:13 2023: Alignments from file /mss5/Minipig/genome_annotation/coding/Sus_scrofa.Sscrofa11.1.pep.all.fa created.
# Tue Sep 12 10:59:13 2023: /mss5/Minipig/genome_annotation/programs/miniprot-boundary-scorer/miniprot_boundary_scorer -o /mss5/Minipig/genome_annotation/coding/GALBA/miniprot.gff -s /mss5/Minipig/genome_annotation/programs/miniprot-boundary-scorer/blosum62.csv < /mss5/Minipig/genome_annotation/coding/GALBA/protein_alignment_miniprot.aln 
# Tue Sep 12 11:01:42 2023: /mss5/Minipig/genome_annotation/programs/miniprothint/miniprothint.py /mss5/Minipig/genome_annotation/coding/GALBA/miniprot.gff --workdir /mss5/Minipig/genome_annotation/coding/GALBA --ignoreCoverage
# Tue Sep 12 11:03:42 2023: moving /mss5/Minipig/genome_annotation/coding/GALBA/miniprot.gtf to /mss5/Minipig/genome_annotation/coding/GALBA/protein_alignment_miniprot.gff
mv /mss5/Minipig/genome_annotation/coding/GALBA/miniprot.gtf /mss5/Minipig/genome_annotation/coding/GALBA/protein_alignment_miniprot.gff
# Tue Sep 12 11:03:42 2023: Converting alignments from file /mss5/Minipig/genome_annotation/coding/GALBA/protein_alignment_miniprot.gff to hints
# Tue Sep 12 11:03:42 2023: Converting protein alignment file /mss5/Minipig/genome_annotation/coding/GALBA/protein_alignment_miniprot.gff to hints for AUGUSTUS
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/GALBA/scripts/aln2hints.pl --in=/mss5/Minipig/genome_annotation/coding/GALBA/protein_alignment_miniprot.gff --out=/mss5/Minipig/genome_annotation/coding/GALBA/prot_hintsfile.aln2hints.temp.gff --prg=miniprot --priority=4
# Tue Sep 12 11:03:49 2023: concatenating protein hints from /mss5/Minipig/genome_annotation/coding/GALBA/prot_hintsfile.aln2hints.temp.gff to /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.temp.gff
cat /mss5/Minipig/genome_annotation/coding/GALBA/prot_hintsfile.aln2hints.temp.gff >> /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.temp.gff
# Tue Sep 12 11:03:49 2023: Checking for hints of src=C and with grp tags that should not be joined according to multiplicity
# Tue Sep 12 11:03:50 2023: Joining hints that are identical (& from the same source) into multiplicity hints (input file /mss5/Minipig/genome_annotation/coding/GALBA/tmp_merge_hints.gff)
# Tue Sep 12 11:03:50 2023: sort hints of type prot
cat /mss5/Minipig/genome_annotation/coding/GALBA/tmp_merge_hints.gff | sort -n -k 4,4 | sort -s -n -k 5,5 | sort -s -n -k 3,3 | sort -s -k 1,1 >/mss5/Minipig/genome_annotation/coding/GALBA/hints.prot.temp.sort.gff
# Tue Sep 12 11:03:55 2023: join multiple hints
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/join_mult_hints.pl </mss5/Minipig/genome_annotation/coding/GALBA/hints.prot.temp.sort.gff >/mss5/Minipig/genome_annotation/coding/GALBA/tmp_merge_hints.gff 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/join_mult_hints.prot.stderr
mv /mss5/Minipig/genome_annotation/coding/GALBA/tmp_merge_hints.gff /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.temp.gff
# Tue Sep 12 11:04:05 2023: moving /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.temp.gff to /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.gff
mv /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.temp.gff /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.gff
Deleting /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.temp.gff
# Tue Sep 12 11:04:05 2023: joining hints files -> appending /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.gff to /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.gff
cat /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.gff >> /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.gff
# Tue Sep 12 11:04:05 2023: Deleting /mss5/Minipig/genome_annotation/coding/GALBA/prot_align_out.gff
# Tue Sep 12 11:04:05 2023: Moving /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.gff to /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.tmp.gff to enable sorting
mv /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.gff /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.tmp.gff
# Tue Sep 12 11:04:05 2023: Sorting hints file /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.gff
cat /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.tmp.gff | sort -n -k 4,4 | sort -s -n -k 5,5 | sort -s -n -k 3,3 | sort -s -k 1,1 > /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.gff
# Tue Sep 12 11:04:06 2023: Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.tmp.gff
rm /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.tmp.gff
#  Tue Sep 12 11:04:06 2023: selecting training genes from miniprot output /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.gtf.
find_train_candidates.py -m /mss5/Minipig/genome_annotation/coding/GALBA/protein_alignment_miniprot.gff -o /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.gtf
# Tue Sep 12 11:04:20 2023: rm /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.gff.
#**********************************************************************************
#                            TRAINING AUGUSTUS                                     
#**********************************************************************************
# Tue Sep 12 11:04:23 2023: training AUGUSTUS
# Tue Sep 12 11:04:23 2023: Converting gtf file /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.gtf to genbank file
# Tue Sep 12 11:04:23 2023: Computing flanking region size for AUGUSTUS training genes
# Tue Sep 12 11:04:24 2023: create genbank file /mss5/Minipig/genome_annotation/coding/GALBA/train.gb
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/gff2gbSmallDNA.pl /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.gtf /mss5/Minipig/genome_annotation/coding/GALBA/genome.fa 10000 /mss5/Minipig/genome_annotation/coding/GALBA/train.gb 1> /mss5/Minipig/genome_annotation/coding/GALBA/gff2gbSmallDNA.stderr 2>/mss5/Minipig/genome_annotation/coding/GALBA/gff2gbSmallDNA.stderr
#*********
# INFORMATION: the size of flanking region used in this GALBA run is 10000
#*********
# Tue Sep 12 11:09:14 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train.gb contains 19152 genes.
# Tue Sep 12 11:09:14 2023: Setting value of "stopCodonExcludedFromCDS" in /mss5/Minipig/genome_annotation/programs/Augustus/config/species/minipig/minipig_parameters.cfg to "true"
# Tue Sep 12 11:09:14 2023: Running etraining to catch gene structure inconsistencies:
/mss5/Minipig/genome_annotation/programs/Augustus/config/../bin/etraining --species=minipig --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config /mss5/Minipig/genome_annotation/coding/GALBA/train.gb 1> /mss5/Minipig/genome_annotation/coding/GALBA/gbFilterEtraining.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/gbFilterEtraining.stderr
# Tue Sep 12 11:11:09 2023: Filtering /mss5/Minipig/genome_annotation/coding/GALBA/train.gb file to remove inconsistent gene structures...
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/filterGenes.pl /mss5/Minipig/genome_annotation/coding/GALBA/etrain.bad.lst /mss5/Minipig/genome_annotation/coding/GALBA/train.gb 1> /mss5/Minipig/genome_annotation/coding/GALBA/train.ff.gb 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/etrainFilterGenes.stderr
# Tue Sep 12 11:11:15 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train.ff.gb contains 12073 genes.
# Tue Sep 12 11:11:15 2023: Reducing number of training genes by random selection to 8000.
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/randomSplit.pl /mss5/Minipig/genome_annotation/coding/GALBA/train.ff.gb 8000 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/randomSplit_8000.stderr
mv /mss5/Minipig/genome_annotation/coding/GALBA/train.ff.gb.test /mss5/Minipig/genome_annotation/coding/GALBA/train.ff.gb
# Tue Sep 12 11:11:38 2023: BLAST or DIAMOND training gene structures against themselves:
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/aa2nonred.pl /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.good.fa /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.good.nr.fa --DIAMOND_PATH=/mss5/Minipig/genome_annotation/programs/DIAMOND --cores=20 --diamond 1> /mss5/Minipig/genome_annotation/coding/GALBA/aa2nonred.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/aa2nonred.stderr
# Tue Sep 12 11:11:41 2023: Filtering nonredundant loci into /mss5/Minipig/genome_annotation/coding/GALBA/train.fff.gb:
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/filterGenesIn.pl /mss5/Minipig/genome_annotation/coding/GALBA/nonred.loci.lst /mss5/Minipig/genome_annotation/coding/GALBA/train.ff.gb 1> /mss5/Minipig/genome_annotation/coding/GALBA/train.fff.gb 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/filterGenesIn.stderr
# Tue Sep 12 11:11:44 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train.fff.gb contains 7722 genes.
# Tue Sep 12 11:11:44 2023: Moving /mss5/Minipig/genome_annotation/coding/GALBA/train.fff.gb to /mss5/Minipig/genome_annotation/coding/GALBA/train.gb:
mv /mss5/Minipig/genome_annotation/coding/GALBA/train.fff.gb /mss5/Minipig/genome_annotation/coding/GALBA/train.gb
# Tue Sep 12 11:11:45 2023: Splitting genbank file into train and test file
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/randomSplit.pl /mss5/Minipig/genome_annotation/coding/GALBA/train.gb 300 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/randomSplit.stderr
# Tue Sep 12 11:11:49 2023: /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.test will be used for measuring AUGUSTUS accuracy after training
# Tue Sep 12 11:11:49 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.test contains 300 genes.
# Tue Sep 12 11:11:51 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train contains 7422 genes.
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/randomSplit.pl /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train 300 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/randomSplit.stderr
# Tue Sep 12 11:11:56 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train.train contains 7122 genes.
# Tue Sep 12 11:11:57 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train.test contains 300 genes.
# Tue Sep 12 11:11:57 2023: /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train.test will be used or measuring AUGUSTUS accuracy during training with optimize_augustus.pl
 /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train.train will be used for running etraining in optimize_augustus.pl (together with train.gb.train.test)
 /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train will be used for running etraining (outside of optimize_augustus.pl)
# Tue Sep 12 11:11:57 2023: first etraining
/mss5/Minipig/genome_annotation/programs/Augustus/config/../bin/etraining --species=minipig --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train 1>/mss5/Minipig/genome_annotation/coding/GALBA/firstetraining.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/firstetraining.stderr
grep -c "exon doesn't end in stop codon" /mss5/Minipig/genome_annotation/coding/GALBA/errors/firstetraining.stderr
# Tue Sep 12 11:12:39 2023: Error rate of missing stop codon is 0
# Tue Sep 12 11:12:39 2023: Adjusting stop-codon frequencies in species_parameters.cfg according to /mss5/Minipig/genome_annotation/coding/GALBA/firstetraining.stdout
# Tue Sep 12 11:12:39 2023: Setting frequency of stop codons to tag=0.231, taa=0.268, tga=0.501.
# Tue Sep 12 11:12:39 2023: first AUGUSTUS accuracy test
/mss5/Minipig/genome_annotation/programs/Augustus/config/../bin/augustus --species=minipig --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.test >/mss5/Minipig/genome_annotation/coding/GALBA/firsttest.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/firsttest.stderr
# Tue Sep 12 11:35:38 2023: Computing accuracy of AUGUSTUS prediction (in test file derived from predictions on training data set stored in /mss5/Minipig/genome_annotation/coding/GALBA/firsttest.stdout)
# Tue Sep 12 11:35:38 2023: The accuracy in round first is 0.4497
# Tue Sep 12 11:35:38 2023: RUNNING AUGUSTUS
# Tue Sep 12 11:35:38 2023: AUGUSTUS with hints
# Tue Sep 12 11:35:38 2023: copy extrinsic file /mss5/Minipig/genome_annotation/programs/GALBA/scripts/cfg/galba.cfg to working directory
cp /mss5/Minipig/genome_annotation/programs/GALBA/scripts/cfg/galba.cfg /mss5/Minipig/genome_annotation/coding/GALBA/species/minipig/ex1.cfg
/usr/local/bin/python3 /mss5/Minipig/genome_annotation/coding/GALBA/pygustus_hints.py 1> /mss5/Minipig/genome_annotation/coding/GALBA/pygustus_hints.out 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/pygustus_hints.err
# Tue Sep 12 16:56:34 2023: AUGUSTUS prediction complete
# Tue Sep 12 16:56:35 2023: Making a gtf file from /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gff
cat /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gff | /usr/bin/perl -ne 'if(m/\tAUGUSTUS\t/) {print $_;}' | /usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/gtf2gff.pl --printExon --out=/mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.tmp.gtf 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/gtf2gff.augustus.hints.gtf.stderr
# Tue Sep 12 16:56:55 2023: Converting gtf file /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf to genbank file
# Tue Sep 12 16:56:55 2023: create genbank file /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/gff2gbSmallDNA.pl --good /mss5/Minipig/genome_annotation/coding/GALBA/good_training_genes.lst /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf /mss5/Minipig/genome_annotation/coding/GALBA/genome.fa 10000 /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb 1> /mss5/Minipig/genome_annotation/coding/GALBA/gff2gbSmallDNA.stderr 2>/mss5/Minipig/genome_annotation/coding/GALBA/gff2gbSmallDNA.stderr
#*********
# INFORMATION: the size of flanking region used in this GALBA run is 10000
#*********
# Tue Sep 12 16:59:27 2023: Splitting genbank file into train and test file
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/randomSplit.pl /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb 300 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/randomSplit.stderr
# Tue Sep 12 16:59:33 2023: /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.test will be used for measuring AUGUSTUS accuracy after training
# Tue Sep 12 16:59:33 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.test contains 300 genes.
# Tue Sep 12 16:59:35 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train contains 13082 genes.
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/randomSplit.pl /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train 300 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/randomSplit.stderr
# Tue Sep 12 16:59:42 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train.train contains 12782 genes.
# Tue Sep 12 16:59:42 2023: Genbank format file /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train.test contains 300 genes.
# Tue Sep 12 16:59:42 2023: /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train.test will be used or measuring AUGUSTUS accuracy during training with optimize_augustus.pl
 /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train.train will be used for running etraining in optimize_augustus.pl (together with train2.gb.train.test)
 /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train will be used for running etraining (outside of optimize_augustus.pl)
# Tue Sep 12 16:59:42 2023: first etraining
/mss5/Minipig/genome_annotation/programs/Augustus/config/../bin/etraining --species=minipig --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train 1>/mss5/Minipig/genome_annotation/coding/GALBA/firstetraining.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/firstetraining.stderr
# Tue Sep 12 17:00:51 2023: second AUGUSTUS accuracy test
/mss5/Minipig/genome_annotation/programs/Augustus/config/../bin/augustus --species=minipig --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.test >/mss5/Minipig/genome_annotation/coding/GALBA/secondtest.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/secondtest.stderr
# Tue Sep 12 17:22:22 2023: Computing accuracy of AUGUSTUS prediction (in test file derived from predictions on training data set stored in /mss5/Minipig/genome_annotation/coding/GALBA/secondtest.stdout)
# Tue Sep 12 17:22:22 2023: The accuracy in round second is 0.4701
# Tue Sep 12 17:22:22 2023: optimizing AUGUSTUS parameters
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/optimize_augustus.pl --aug_exec_dir=/mss5/Minipig/genome_annotation/programs/Augustus/config/../bin --rounds=5 --species=minipig --kfold=20 --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config --onlytrain=/mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train.train --cpus=20 /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train.test 1>/mss5/Minipig/genome_annotation/coding/GALBA/optimize_augustus.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/optimize_augustus.stderr
# Fri Sep 15 07:13:33 2023:  parameter optimization finished.
# Fri Sep 15 07:13:33 2023: Third etraining
/mss5/Minipig/genome_annotation/programs/Augustus/config/../bin/etraining --species=minipig --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config /mss5/Minipig/genome_annotation/coding/GALBA/train2.gb.train 1>/mss5/Minipig/genome_annotation/coding/GALBA/secondetraining.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/secondetraining.stderr
# Fri Sep 15 07:17:45 2023: third AUGUSTUS accuracy test
/mss5/Minipig/genome_annotation/programs/Augustus/config/../bin/augustus --species=minipig --AUGUSTUS_CONFIG_PATH=/mss5/Minipig/genome_annotation/programs/Augustus/config /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.test >/mss5/Minipig/genome_annotation/coding/GALBA/thirdtest.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/thirdtest.stderr
# Fri Sep 15 07:43:29 2023: Computing accuracy of AUGUSTUS prediction (in test file derived from predictions on training data set stored in /mss5/Minipig/genome_annotation/coding/GALBA/thirdtest.stdout)
# Fri Sep 15 07:43:29 2023: The accuracy in round third is 0.534693333333333
#**********************************************************************************
#                               PREDICTING GENES WITH AUGUSTUS (NO UTRS)           
#**********************************************************************************
# Fri Sep 15 07:43:30 2023: RUNNING AUGUSTUS
# Fri Sep 15 07:43:30 2023: AUGUSTUS with hints
# Fri Sep 15 07:43:33 2023: copy extrinsic file /mss5/Minipig/genome_annotation/programs/GALBA/scripts/cfg/galba.cfg to working directory
cp /mss5/Minipig/genome_annotation/programs/GALBA/scripts/cfg/galba.cfg /mss5/Minipig/genome_annotation/coding/GALBA/species/minipig/ex1.cfg
/usr/local/bin/python3 /mss5/Minipig/genome_annotation/coding/GALBA/pygustus_hints.py 1> /mss5/Minipig/genome_annotation/coding/GALBA/pygustus_hints.out 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/pygustus_hints.err
# Fri Sep 15 23:11:53 2023: AUGUSTUS prediction complete
# Fri Sep 15 23:11:53 2023: Making a gtf file from /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gff
cat /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gff | /usr/bin/perl -ne 'if(m/\tAUGUSTUS\t/) {print $_;}' | /usr/bin/perl /mss5/Minipig/genome_annotation/programs/Augustus/scripts/gtf2gff.pl --printExon --out=/mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.tmp.gtf 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/gtf2gff.augustus.hints.gtf.stderr
# Fri Sep 15 23:12:18 2023: Making a fasta file with protein sequences of /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf
/usr/local/bin/python3 /mss5/Minipig/genome_annotation/programs/Augustus/scripts/getAnnoFastaFromJoingenes.py -g /mss5/Minipig/genome_annotation/coding/GALBA/genome.fa -f /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf -o /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints 1> /mss5/Minipig/genome_annotation/coding/GALBA/getAnnoFastaFromJoingenes.augustus.hints_tmp.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/getAnnoFastaFromJoingenes.augustus.hints_tmp.stderr
# Fri Sep 15 23:14:10 2023: fixing AUGUSTUS genes with in frame stop codons...
/usr/local/bin/python3 /mss5/Minipig/genome_annotation/programs/Augustus/scripts/fix_in_frame_stop_codon_genes.py -g /mss5/Minipig/genome_annotation/coding/GALBA/genome.fa -t /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf -b /mss5/Minipig/genome_annotation/coding/GALBA/bad_genes.lst -o augustus.hints_fix_ifs_ -s minipig -m off --UTR off --print_utr off -a /mss5/Minipig/genome_annotation/programs/Augustus/config -C /mss5/Minipig/genome_annotation/programs/cdbfasta -A /mss5/Minipig/genome_annotation/programs/Augustus/config/../bin -S /mss5/Minipig/genome_annotation/programs/Augustus/config/../scripts -H /mss5/Minipig/genome_annotation/coding/GALBA/hintsfile.gff -e /mss5/Minipig/genome_annotation/programs/GALBA/scripts/cfg/galba.cfg  > /mss5/Minipig/genome_annotation/coding/GALBA/fix_in_frame_stop_codon_genes_augustus.hints.log 2> /mss5/Minipig/genome_annotation/coding/GALBA/errors/fix_in_frame_stop_codon_genes_augustus.hints.err
# Sat Sep 16 00:07:04 2023: Moving gene prediction file without in frame stop codons to location of original file (overwriting it)...
mv /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints_fix_ifs_.gtf /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf
# Sat Sep 16 00:07:04 2023: Deleting file with genes with in frame stop codons...
rm /mss5/Minipig/genome_annotation/coding/GALBA/bad_genes.lst
# Sat Sep 16 00:07:04 2023: Making a fasta file with protein sequences of /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf
/usr/local/bin/python3 /mss5/Minipig/genome_annotation/programs/Augustus/scripts/getAnnoFastaFromJoingenes.py -g /mss5/Minipig/genome_annotation/coding/GALBA/genome.fa -f /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf -o /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints 1> /mss5/Minipig/genome_annotation/coding/GALBA/getAnnoFastaFromJoingenes.augustus.hints_hints.stdout 2>/mss5/Minipig/genome_annotation/coding/GALBA/errors/getAnnoFastaFromJoingenes.augustus.hints_hints.stderr
mv /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.gtf /mss5/Minipig/genome_annotation/coding/GALBA/galba.gtf
mv /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.aa /mss5/Minipig/genome_annotation/coding/GALBA/galba.aa
mv /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.codingseq /mss5/Minipig/genome_annotation/coding/GALBA/galba.codingseq
# IMPORTANT INFORMATION: the final output files 
of this GALBA run are galba.gtf, galba.codingseq, and galba.aa
These files are exact copies auf augustus.hints predictions.
For genomes with small intergenic region size, we found that 
in the majority of cases, this gene set is better than the TSEBRA gene set.
However, in rare cases, the tsebra gene set may be better.
You can generate a TSEBRA gene set yourself with the following command:
    tsebra.py -g augustus.hints.gtf -e hintsfile.gff -o tsebra
The accompanying fasta files can be generated with:
    getAnnofastaFromJoingenes.py -g genome.fa -f tsebra.gtf -o tsebra
# Sat Sep 16 00:09:00 2023: deleting empty files
# Sat Sep 16 00:09:00 2023: find /mss5/Minipig/genome_annotation/coding/GALBA -empty
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/find_python3_re.err
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/find_python3_biopython.err
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/find_python3_pygustus.err
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/new_species.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/join_mult_hints.prot.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/etrainFilterGenes.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/randomSplit_8000.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/filterGenesIn.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/randomSplit.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/firstetraining.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/firsttest.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/pygustus_hints.err
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/gtf2gff.augustus.hints.gtf.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/secondtest.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/optimize_augustus.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/secondetraining.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/thirdtest.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/getAnnoFastaFromJoingenes.augustus.hints_tmp.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/fix_in_frame_stop_codon_genes_augustus.hints.err
rm /mss5/Minipig/genome_annotation/coding/GALBA/errors/getAnnoFastaFromJoingenes.augustus.hints_hints.stderr
rm /mss5/Minipig/genome_annotation/coding/GALBA/miniprot_index.stdout
rm /mss5/Minipig/genome_annotation/coding/GALBA/highAlIntrons.gff
rm /mss5/Minipig/genome_annotation/coding/GALBA/tmp_no_merge_hints.gff
rm /mss5/Minipig/genome_annotation/coding/GALBA/fix_in_frame_stop_codon_genes_augustus.hints.log
rm /mss5/Minipig/genome_annotation/coding/GALBA/getAnnoFastaFromJoingenes.augustus.hints_hints.stdout
# Sat Sep 16 00:09:01 2023: deleting job lst files (if existing)
rm /mss5/Minipig/genome_annotation/coding/GALBA/prot_hintsfile.aln2hints.temp.gff
rm /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.gtf
rm /mss5/Minipig/genome_annotation/coding/GALBA/gbFilterEtraining.stdout
rm /mss5/Minipig/genome_annotation/coding/GALBA/etrain.bad.lst
rm /mss5/Minipig/genome_annotation/coding/GALBA/train.ff.gb
rm /mss5/Minipig/genome_annotation/coding/GALBA/train.ff.gb.train
rm /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.good.fa
rm /mss5/Minipig/genome_annotation/coding/GALBA/aa2nonred.stdout
rm /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.good.nr.fa
rm /mss5/Minipig/genome_annotation/coding/GALBA/nonred.loci.lst
rm /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train
rm /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.test
rm /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train.train
rm /mss5/Minipig/genome_annotation/coding/GALBA/train.gb.train.test
rm /mss5/Minipig/genome_annotation/coding/GALBA/firstetraining.stdout
rm /mss5/Minipig/genome_annotation/coding/GALBA/good_training_genes.lst
rm /mss5/Minipig/genome_annotation/coding/GALBA/augustus.hints.tmp.gtf
rm /mss5/Minipig/genome_annotation/coding/GALBA/secondetraining.stdout
rm /mss5/Minipig/genome_annotation/coding/GALBA/fix_IFS_log_rrvucdsm
/usr/bin/perl /mss5/Minipig/genome_annotation/programs/GALBA/scripts/galba_cleanup.pl --wdir=/mss5/Minipig/genome_annotation/coding/GALBA
Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/firsttest.stdout
Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/genome.fa
Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/secondtest.stdout
Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/train.gb
Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/genome.fa.cidx
Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/getAnnoFastaFromJoingenes.augustus.hints_tmp.stdout
Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/traingenes.good.gtf
Deleting file /mss5/Minipig/genome_annotation/coding/GALBA/genome.mpi
#**********************************************************************************
#                               GALBA RUN FINISHED                                
#**********************************************************************************
KatharinaHoff commented 8 months ago

Thank you for providing this information.

@MarioStanke Since I don't know where these weird gene structures are coming from, I intend to write a sanity check that removes them from the output after AUGUSTUS has run. (I strongly suspect they come from Pygustus...)
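One way such a sanity check could work is sketched below. This is not GALBA's code, and the name `find_inconsistent_transcripts` is hypothetical. It assumes tab-separated AUGUSTUS-style GTF like the `g97` excerpt in the original report: `transcript` lines carry a bare ID in column 9 (e.g. `g97.t1`), and every other feature carries a `transcript_id "..."` attribute. A transcript is flagged when any of its features lies on a different strand than the `transcript` line or falls outside the transcript's coordinates. Both conditions hold for the minus-strand CDS of `g97.t1`.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Matches the transcript_id attribute of AUGUSTUS GTF feature lines (assumed format).
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def find_inconsistent_transcripts(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Return {transcript_id: reasons} for transcripts whose features disagree
    with their transcript line in strand or lie outside its coordinates."""
    bounds: Dict[str, Tuple[int, int, str]] = {}
    features: List[Tuple[str, str, int, int, str]] = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments and anything that is not a GTF record
        feature, start, end, strand, attrs = cols[2], int(cols[3]), int(cols[4]), cols[6], cols[8]
        if feature == "transcript":
            # Bare transcript ID in column 9, e.g. "g97.t1".
            bounds[attrs.strip()] = (start, end, strand)
        elif feature != "gene":
            match = TX_ID.search(attrs)
            if match:
                features.append((match.group(1), feature, start, end, strand))

    # Compare features only after reading everything, so line order does not matter.
    problems: Dict[str, Set[str]] = {}
    for tx_id, feature, start, end, strand in features:
        if tx_id not in bounds:
            problems.setdefault(tx_id, set()).add("no transcript line")
            continue
        tx_start, tx_end, tx_strand = bounds[tx_id]
        if strand != tx_strand:
            problems.setdefault(tx_id, set()).add(f"{feature} on strand {strand}")
        if start < tx_start or end > tx_end:
            problems.setdefault(tx_id, set()).add(f"{feature} outside transcript")
    return problems
```

Passing an open handle of `galba.gtf` to this function would list candidate transcripts together with the reasons they were flagged. It only reports problems and deletes nothing, which makes it easy to inspect the affected genes before deciding how to handle them.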

MarioStanke commented 8 months ago

I am sure it does not come from AUGUSTUS; the bug must come from a downstream component. Pygustus is a hot candidate.

KatharinaHoff commented 8 months ago

The problem is temporarily solved within GALBA: weird transcripts are now deleted automatically. I will close this issue here; an issue has been opened at Pygustus for later.
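The automatic deletion described above could look roughly like the following sketch. It is not GALBA's actual implementation, and `remove_transcripts` is a hypothetical name. Given a set of already-flagged transcript IDs, it drops the `transcript` line and every feature of those transcripts. It drops a `gene` line only when none of that gene's transcripts survive, so a gene with one bad and one good isoform is kept. It assumes AUGUSTUS-style GTF: bare IDs in column 9 of `gene` and `transcript` lines, and `transcript_id`/`gene_id` attributes on all other features, which are used to link transcripts back to their gene.

```python
import re
from typing import Iterable, List, Set

# Attribute patterns for AUGUSTUS GTF feature lines (assumed format).
TX_ID = re.compile(r'transcript_id "([^"]+)"')
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def remove_transcripts(lines: Iterable[str], bad_tx: Set[str]) -> List[str]:
    """Remove flagged transcripts and any gene left without transcripts."""
    lines = list(lines)

    # Pass 1: genes referenced by at least one feature of a non-flagged transcript.
    surviving_genes: Set[str] = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        tx, gene = TX_ID.search(cols[8]), GENE_ID.search(cols[8])
        if tx and gene and tx.group(1) not in bad_tx:
            surviving_genes.add(gene.group(1))

    # Pass 2: keep everything except lines belonging to flagged transcripts
    # and gene lines whose gene has no surviving transcript.
    kept: List[str] = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            kept.append(line)  # comments and non-GTF lines pass through unchanged
            continue
        feature, attrs = cols[2], cols[8]
        if feature == "gene":
            if attrs.strip() in surviving_genes:  # bare gene ID, e.g. "g97"
                kept.append(line)
        elif feature == "transcript":
            if attrs.strip() not in bad_tx:  # bare transcript ID, e.g. "g97.t1"
                kept.append(line)
        else:
            tx = TX_ID.search(attrs)
            if tx is None or tx.group(1) not in bad_tx:
                kept.append(line)
    return kept
```

Removing whole transcripts, rather than only the features on the wrong strand, avoids leaving behind a truncated gene model whose remaining CDS no longer matches any prediction AUGUSTUS actually made. The matching entries in `galba.aa` and `galba.codingseq` would need to be regenerated or filtered as well.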