NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Empty gff output when running agat_sp_compare_two_BUSCOs.pl #320

Closed smorzechowski closed 1 year ago

smorzechowski commented 1 year ago

Hello, thanks for the useful tools in this package! I am running the agat_sp_compare_two_BUSCOs.pl script. The text output files have content, but the GFF files are empty when the job completes on my cluster. Output files: image

There are a lot of warnings regarding gff formatting in the out file, which I will attach here! compare_busco38607495.out.log

Thousands of log files were generated; will attach an example as well. 51217at8782.out.agat.log

I am using the singularity image agat_1.0.0--pl5321hdfd78af_0.sif.

Folder 1 (--f1) contains my genome BUSCO run. Folder 2 (--f2) contains my annotation protein BUSCO run. sbatch submission script:

#!/bin/sh
#SBATCH -n 1                # Number of cores
#SBATCH -N 1                # Ensure that all cores are on one machine
#SBATCH -t 0-08:00          # Runtime in D-HH:MM, minimum of 10 minutes
#SBATCH -p shared,test      # Partition to submit to
#SBATCH --mem=50G           # Memory pool for all cores (see also --mem-per-cpu)
#SBATCH -o log/compare_busco%j.out   # File to which STDOUT will be written, %j inserts jobid
#SBATCH -e log/compare_busco%j.err   # File to which STDERR will be written, %j inserts jobid
#SBATCH --account=oeb275r

SMEL='/n/holyscratch01/edwards_lab/smorzechowski/meliphagid/analysis/2022-12-21/'

singularity exec --cleanenv /n/home09/smorzechowski/bin/agat_1.0.0--pl5321hdfd78af_0.sif \
agat_sp_compare_two_BUSCOs.pl --f1 $SMEL/07-busco/run_cyan_flye_NP_assembly_incl_addedZandW_v2_rm1kb \
--f2 $SMEL/07-busco/run_etpmode_unmasked_tsebra_pb1_braker -o genome_to_etpmode_unm_ts_pb1

End of out file: image

Example of warnings in out file: image

Juke34 commented 1 year ago

Hi,

The warnings are OK: the predictions are made by Augustus, and Augustus does not strictly follow the GFF format. That's fine, AGAT knows how to deal with it. You do not have any GFF output because AGAT did not finish the job; I guess it quit because one of the prediction files seems to be empty (0 columns). I would need your BUSCO output to debug properly.

smorzechowski commented 1 year ago

Hi, thanks for getting back so quickly! I inspected the BUSCO output, and am attaching the files from my augustus_output/predicted_genes directory that led to the termination of my agat run!

There were two out files, 51217at8782.out.1 and 51217at8782.out.2. The out.2 file did not contain any predicted genes, which is why the AGAT parser concluded that it could not be a GFF file, and that is when the AGAT run was terminated.

51217at8782.out.1.txt 51217at8782.out.2.txt

I see lots of out.2 and out.3 files in my predicted_genes dir, but at first glance the others I checked do contain gene predictions, so I'm not sure why this particular out.2 file was different. I wish I could give you access to my BUSCO directory, but it is too heavy to copy. Let me know if these additional files help with debugging at all!
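Rather than spot-checking the out.N files by hand, the whole predicted_genes directory could be scanned for files with no feature lines. The following is only a rough sketch, not something AGAT or BUSCO provides. It assumes that a usable prediction file has at least one non-comment line with nine or more tab-separated columns, and that the files are named like `*.out.1`, `*.out.2`, and so on. The function name `list_empty_predictions` and the `BUSCO_RUN` variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: print every Augustus prediction file that has no GFF-like feature line.
# Assumption: a feature line is a non-comment line with >= 9 tab-separated columns.
list_empty_predictions() {
    dir=$1
    for f in "$dir"/*.out.[0-9]*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -f "$f" ] || continue
        # awk exits 0 if at least one feature line is found, 1 otherwise.
        if ! awk -F '\t' '!/^#/ && NF >= 9 { found = 1; exit } END { exit !found }' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Usage would look something like `list_empty_predictions "$BUSCO_RUN/augustus_output/predicted_genes"`, where `BUSCO_RUN` stands for the BUSCO run folder. Any file it prints has no parsable predictions and is a candidate for the kind of file that stopped the run.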

Juke34 commented 1 year ago

Could you send me the whole folder zipped via WeTransfer?

Juke34 commented 1 year ago

Without the BUSCO directory I'm afraid I cannot help.