Closed: Vijithkumar2020 closed this issue 1 week ago
Manually check the reported cases to see whether the feature location in the GFF really falls beyond the length of the corresponding sequence in the FASTA file.
Thank you for the response. The *.gff file is too large to identify the error case manually. Is there a recommended GFF tool that can isolate a single case based on the seq_id?
Use grep or awk to extract the features located on the SVA1_S1_L008_001_contig_737055 sequence. Then check the highest position value and compare it with the length of that sequence.
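For reference, the check could be sketched with awk roughly as follows. This is only a minimal sketch: the function names are made up for illustration, and it assumes a tab-separated GFF with the seq_id in column 1 and the end coordinate in column 5, and a FASTA whose header starts with the same ID.

```shell
# Print only the feature lines whose seq_id (column 1) equals $2.
extract_seq_features() {
  awk -F'\t' -v id="$2" '$1 == id' "$1"
}

# Read GFF lines on stdin and print the largest end coordinate (column 5).
max_end_coord() {
  awk -F'\t' '$5 + 0 > max { max = $5 + 0 } END { print max + 0 }'
}

# Print the length of the FASTA record whose header ID equals $2.
# The ID is taken as the first word after ">" (an assumption about the headers).
seq_length() {
  awk -v id="$2" '
    /^>/ { keep = (substr($1, 2) == id); next }
    keep { n += length($0) }
    END  { print n + 0 }' "$1"
}
```

With the file names from this thread, `extract_seq_features combined_output.gff3 SVA1_S1_L008_001_contig_737055 | max_end_coord` would give the highest end position, and `seq_length contig_out_file.fasta.masked SVA1_S1_L008_001_contig_737055` the sequence length to compare it against.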
Okay, I want to add one more thing: AUGUSTUS was run on multiple split FASTA files (the parent FASTA file was split into smaller files) as parallel jobs. The 8 individual GFF files were later concatenated to generate the combined GFF. While running AGAT, I specified the original FASTA file. Could this have caused any issues?
It might, depending on how you merged the different annotations, because the same gene name may have been used in different files. In that case AGAT may have mixed up the annotation by linking genes with the same name into a single record. The best approach is to manually check the cases reported by AGAT.
Thank you for pointing that out. Yes, you're correct that the same gene name was used (e.g., 'g1' appeared in multiple instances), but still, all the seq_ids are unique. Anyway, I will manually check the reported case to narrow it down.
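To see how widespread the shared names are, something like the following could list the gene IDs that occur more than once in the concatenated file. It is a rough sketch that assumes gene features carry an `ID=` attribute in column 9; the function name is hypothetical.

```shell
# List gene IDs that appear on more than one "gene" line, with their counts.
# Accepts one or more GFF files as arguments.
duplicate_gene_ids() {
  awk -F'\t' '
    $3 == "gene" {
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/) count[substr(attrs[i], 4)]++
    }
    END {
      for (id in count)
        if (count[id] > 1) print id "\t" count[id]
    }' "$@" | sort
}
```

For example, `duplicate_gene_ids combined_output.gff3 | head` would show the first few repeated IDs, such as `g1`, together with how many times each is used.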
To avoid issues related to names shared between files, you can use the AGAT script made for this purpose. It handles the names and updates them on the fly so that they are unique in the final merged file.
Are you referring to the agat_sp_merge_annotations.pl? I mean I can use all the individual *.GFFs and merge them using this tool.
Yes exactly
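As a rough illustration, the merge command for the individual files could be assembled like this. The helper only prints the command, it does not run it. The `--gff` and `--out` options are my understanding of the script's usage, so please confirm with `agat_sp_merge_annotations.pl --help`; the helper name and the file names in the example are hypothetical, and file names containing spaces are not handled.

```shell
# Print (do not run) an agat_sp_merge_annotations.pl command.
# $1 is the output file; the remaining arguments are the GFF files to merge.
build_merge_cmd() {
  out="$1"
  shift
  cmd="agat_sp_merge_annotations.pl"
  for f in "$@"; do
    cmd="$cmd --gff $f"   # one --gff option per input file
  done
  printf '%s --out %s\n' "$cmd" "$out"
}
```

For instance, `build_merge_cmd merged.gff3 part1.gff3 part2.gff3` prints a command that can be reviewed and then run inside the container.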
AGAT was run in Docker. The following biocontainer from quay.io was used: 1.4.1--pl5321hdfd78af_0. AUGUSTUS predicted ~49,500 coding genes, but AGAT extracted only ~15,000.
The program was run as follows:
```
sudo docker run -v /media/e3349969-3452-4c3a-9b3f-d3931278e4a5:/data \
  quay.io/biocontainers/agat:0.8.0--pl5262hdfd78af_0 \
  agat_sp_extract_sequences.pl \
  -g /data/AUGUSTUS_cds_zea_mays_ref/combined_output.gff3 \
  -f /data/contig_out_file.fasta.masked \
  -o /data/AUGUSTUS_cds_zea_mays_ref/cds_maize-ref/cds.fasta \
  -t cds
```
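One way to put numbers on the difference is to compare the count of gene lines in the GFF, the count of distinct gene IDs, and the count of records in the extracted FASTA. The helpers below are an illustrative sketch with made-up names, assuming gene features have an `ID=` attribute in column 9.

```shell
# Count "gene" feature lines in a GFF file.
count_gene_lines() {
  awk -F'\t' '$3 == "gene" { n++ } END { print n + 0 }' "$1"
}

# Count distinct ID= values among the "gene" features of a GFF file.
count_distinct_gene_ids() {
  awk -F'\t' '
    $3 == "gene" {
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/ && !(attrs[i] in seen)) { seen[attrs[i]] = 1; d++ }
    }
    END { print d + 0 }' "$1"
}

# Count records (header lines) in a FASTA file.
count_fasta_records() {
  grep -c '^>' "$1"
}
```

If `count_distinct_gene_ids combined_output.gff3` comes out far below `count_gene_lines combined_output.gff3`, that would be consistent with genes sharing an ID being collapsed, though it does not prove that this is the cause.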