EmilieSmeets22 closed this issue 6 months ago.
Hi @EmilieSmeets22,
I'm not sure I understand your question:
> I would like to keep the original GFF features IDs

... but the original IDs are kept.
Something is not clear here. Please clarify.
The ID attribute contains unique identifiers that the GFF format uses to establish relationships between features (via Parent/ID relationships). It is not supposed to be changed, except to make it clearer (no functional information is stored in this attribute).
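To make the Parent/ID point concrete, here is a minimal Python sketch (not AGAT code) that parses column 9 of a few input lines from this thread, renames the IDs with the new values seen in the workflow output, and checks that the feature hierarchy is unchanged. The attribute parser is deliberately simplistic: it assumes no URL escaping and a single-valued Parent.

```python
# Minimal sketch, not AGAT's implementation: IDs only encode the feature
# hierarchy, so renaming them consistently leaves that hierarchy intact.

# Subset of the input GFF from this thread, with tab separators restored.
GFF_LINES = [
    "chr01\tmaker\tgene\t24795\t31012\t.\t-\t.\tID=DcarChr1G00000010",
    "chr01\tmaker\tmRNA\t24795\t31012\t.\t-\t.\tID=DcarChr1G00000010.1;Parent=DcarChr1G00000010",
    "chr01\tmaker\texon\t24795\t24945\t.\t-\t.\tID=nbis-exon-1;Parent=DcarChr1G00000010.1",
    "chr01\tmaker\tCDS\t24795\t24945\t.\t-\t1\tID=cds-5;Parent=DcarChr1G00000010.1",
]

# Old ID -> new ID, as seen in the workflow output (makerName holds the old ID there).
NEW_ID = {
    "DcarChr1G00000010": "NBISG00000000001",
    "DcarChr1G00000010.1": "NBISM00000000001",
    "nbis-exon-1": "NBISE00000000001",
    "cds-5": "NBISC00000000001",
}


def parse_attributes(column9: str) -> dict[str, str]:
    """Split 'key=value;key=value' (no URL unescaping, single-valued Parent only)."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def children_by_parent(lines: list[str]) -> dict[str, list[str]]:
    """Map each Parent ID to the IDs of its direct children."""
    tree: dict[str, list[str]] = {}
    for line in lines:
        attrs = parse_attributes(line.split("\t")[8])
        if "Parent" in attrs:
            tree.setdefault(attrs["Parent"], []).append(attrs["ID"])
    return tree


def rename_ids(lines: list[str], new_id: dict[str, str]) -> list[str]:
    """Rewrite ID and Parent values with the same old->new mapping."""
    renamed = []
    for line in lines:
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        for key in ("ID", "Parent"):
            if key in attrs:
                attrs[key] = new_id[attrs[key]]
        cols[8] = ";".join(f"{key}={value}" for key, value in attrs.items())
        renamed.append("\t".join(cols))
    return renamed


before = children_by_parent(GFF_LINES)
after = children_by_parent(rename_ids(GFF_LINES, NEW_ID))
# Same shape: every parent keeps the same children, only the labels differ.
print(before)
print(after)
```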
As explained in the help of agat_sp_manage_functional_annotation.pl:
> The blast against Protein Database (outfmt 6) allows to fill the field/attribute
> NAME for gene and PRODUCT for mRNA.
> The Interpro result (.tsv) file allows to fill the DBXREF field/attribute with
> pfam, tigr, interpro, GO, KEGG, etc... terms data.
> With the <id> option the script will change all the ID field by an Uniq ID
> created from the given prefix, a letter to specify the kind of feature (G,T,C,E,U),
> and the feature number.
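A minimal Python sketch of the ID scheme the help text describes, written to reproduce the IDs visible in the workflow output in this thread: the prefix NBIS, a one-letter feature kind, and a zero-padded counter kept separately for each kind. The 11-digit padding and the per-kind counters are inferred from IDs such as NBISG00000000001 and NBISE00000000001 through NBISE00000000005; they are assumptions, not taken from AGAT's source. Note also that the output uses M for mRNA, where the help text lists T.

```python
# Sketch of the ID pattern seen in this thread's output; the width and the
# per-kind counters are inferred from those IDs, not read from AGAT.

def make_uniq_id(prefix: str, kind_letter: str, number: int, width: int = 11) -> str:
    """Prefix + one-letter feature kind + zero-padded feature number."""
    return f"{prefix}{kind_letter}{number:0{width}d}"


class IdGenerator:
    """Hands out IDs with an independent counter for each feature kind."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.counters: dict[str, int] = {}

    def next_id(self, kind_letter: str) -> str:
        self.counters[kind_letter] = self.counters.get(kind_letter, 0) + 1
        return make_uniq_id(self.prefix, kind_letter, self.counters[kind_letter])


ids = IdGenerator("NBIS")
print(ids.next_id("G"))  # NBISG00000000001
print(ids.next_id("M"))  # NBISM00000000001
print(ids.next_id("E"))  # NBISE00000000001
print(ids.next_id("E"))  # NBISE00000000002
```

In the output, both CDS lines of the first mRNA carry the same ID (NBISC00000000001), so a CDS counter appears to advance once per CDS feature rather than once per line.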
Hi @Juke34
I think this issue should actually be under GAAS and not AGAT, my bad!
The GAAS functional annotation pipeline, at the merge annotation step, returns a GFF file with new feature IDs.
[doutree@plop] $ module load Nextflow
[doutree@plop] $ cat ~/workspace/GFF/functional_annotation_param_chr01.yml
subworkflow: 'functional_annotation'
genome: '~/input/DAUCA_Kuroda_chr01.fa'
gff_annotation: '~/input/Daucus_carota.gene_chr_AGAT_chr01.gff'
blast_db_fasta: '~/input/uniprot_sprot.fasta'
outdir: '~/output/20240408_chr01'
[doutree@plop] $ cat ~/workspace/GFF/custom_config_chr01.txt
process {
    withName: 'INTERPROSCAN' {
        cpus = 20
        memory = 300.GB
        ext.args = [
            '--iprlookup',
            '--goterms',
            '-t p',
            '-dra',
            '-appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam'
        ].join(" ").trim()
    }
    withName: 'BLAST_BLASTP' {
        ext.args = '-max_target_seqs 1 -evalue 1e-6 -outfmt 6'
    }
}
[doutree@plop] $ nextflow run NBISweden/pipelines-nextflow -profile conda -params-file functional_annotation_param_chr01.yml -c custom_config_chr01.txt
N E X T F L O W ~ version 22.10.1
Launching `https://github.com/NBISweden/pipelines-nextflow` [maniac_spence] DSL2 - revision: 5f66ae3cf2 [master]
 _  _ ___ ___ ___
| \| | _ )_ _/ __|
| .` | _ \| |\__ \
|_|\_|___/___|___/ Annotation Service
Functional annotation workflow
===================================================
[f9/834f23] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 1 of 1 ✔
[15/7856b3] process > FUNCTIONAL_ANNOTATION:GFF2P... [100%] 1 of 1 ✔
[3a/fbbf8e] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 6 of 6 ✔
[49/81bd45] process > FUNCTIONAL_ANNOTATION:INTER... [100%] 6 of 6 ✔
[b0/e757df] process > FUNCTIONAL_ANNOTATION:MERGE... [100%] 1 of 1 ✔
Workflow completed successfully.
Thank you for using our workflow.
Results are located in the folder: ~/output/20240408_chr01
Completed at: 08-Apr-2024 16:57:29
Duration : 10m 10s
CPU hours : 5.9
Succeeded : 15
[doutree@plop] $ head Daucus_carota.gene_chr_AGAT_chr01.gff
##gff-version 3
chr01 maker gene 24795 31012 . - . ID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010
chr01 maker mRNA 24795 31012 . - . ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01 maker exon 24795 24945 . - . ID=NBISE00000000001;Parent=NBISM00000000001;makerName=nbis-exon-1
chr01 maker exon 26435 26604 . - . ID=NBISE00000000002;Parent=NBISM00000000001;makerName=nbis-exon-2
chr01 maker exon 27851 27929 . - . ID=NBISE00000000003;Parent=NBISM00000000001;makerName=nbis-exon-3
chr01 maker exon 28302 28423 . - . ID=NBISE00000000004;Parent=NBISM00000000001;makerName=nbis-exon-4
chr01 maker exon 30953 31012 . - . ID=NBISE00000000005;Parent=NBISM00000000001;makerName=nbis-exon-5
chr01 maker CDS 24795 24945 . - 1 ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-5
chr01 maker CDS 26435 26604 . - 0 ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-4
[doutree@plop] $ grep mRNA Daucus_carota.gene_chr_AGAT_chr01.gff | head
chr01 maker mRNA 24795 31012 . - . ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01 maker mRNA 33922 37446 . + . ID=NBISM00000000002;Parent=NBISG00000000002;makerName=DcarChr1G00000020.1;product=hypothetical protein
chr01 exonerate mRNA 45536 50728 . - . ID=NBISM00000000003;Parent=NBISG00000000003;Name=At5g41760;makerName=DcarChr1G00000030.1;product=CMP-sialic acid transporter 1;uniprot_id=Q8LGE9
chr01 exonerate mRNA 90633 141688 . - . ID=NBISM00000000004;Parent=NBISG00000000004;Name=GIP;makerName=DcarChr1G00000040.1;product=Copia protein;uniprot_id=P04146
chr01 maker mRNA 145015 147063 . + . ID=NBISM00000000005;Parent=NBISG00000000005;Name=NAKR2;makerName=DcarChr1G00000050.1;product=Protein SODIUM POTASSIUM ROOT DEFECTIVE 2;uniprot_id=Q58FZ0
chr01 exonerate mRNA 164172 286235 . + . ID=NBISM00000000006;Parent=NBISG00000000006;Name=GIP;makerName=DcarChr1G00000060.1;product=Copia protein;uniprot_id=P04146
chr01 exonerate mRNA 395432 509234 . - . ID=NBISM00000000007;Parent=NBISG00000000007;Name=GIP;makerName=DcarChr1G00000070.1;product=Copia protein;uniprot_id=P04146
chr01 exonerate mRNA 534035 534211 . - . ID=NBISM00000000008;Parent=NBISG00000000008;makerName=DcarChr1G00000080.1;product=hypothetical protein
chr01 maker mRNA 639615 642189 . + . ID=NBISM00000000009;Parent=NBISG00000000009;makerName=DcarChr1G00000090.1;product=hypothetical protein
chr01 transdecoder mRNA 655131 661114 . + . ID=NBISM00000000010;Parent=NBISG00000000010;Name=GLYR1;makerName=DcarChr1G00000100.1;product=Glyoxylate/succinic semialdehyde reductase 1;uniprot_id=Q9LSV0
Here is the input GFF file with the original feature IDs:
[doutree@by3acs] input $ head Daucus_carota.gene_chr_AGAT_chr01.gff
##gff-version 3
chr01 maker gene 24795 31012 . - . ID=DcarChr1G00000010
chr01 maker mRNA 24795 31012 . - . ID=DcarChr1G00000010.1;Parent=DcarChr1G00000010
chr01 maker exon 24795 24945 . - . ID=nbis-exon-1;Parent=DcarChr1G00000010.1
chr01 maker exon 26435 26604 . - . ID=nbis-exon-2;Parent=DcarChr1G00000010.1
chr01 maker exon 27851 27929 . - . ID=nbis-exon-3;Parent=DcarChr1G00000010.1
chr01 maker exon 28302 28423 . - . ID=nbis-exon-4;Parent=DcarChr1G00000010.1
chr01 maker exon 30953 31012 . - . ID=nbis-exon-5;Parent=DcarChr1G00000010.1
chr01 maker CDS 24795 24945 . - 1 ID=cds-5;Parent=DcarChr1G00000010.1
chr01 maker CDS 26435 26604 . - 0 ID=cds-4;Parent=DcarChr1G00000010.1
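Comparing the two listings, each feature's original ID appears to be carried over in the makerName attribute of the output (for example, NBISG00000000001 has makerName=DcarChr1G00000010). Assuming that holds for the whole file, the minimal Python sketch below recovers the new-to-original correspondence. It collects sets rather than single values because the two CDS lines, which had distinct IDs in the input (cds-5 and cds-4), share a single ID in the output.

```python
# Sketch, assuming every feature in the output carries its original ID in
# makerName (true for the lines shown in this thread).

# First lines of the workflow output from this thread, with tabs restored.
OUTPUT_LINES = [
    "##gff-version 3",
    "chr01\tmaker\tgene\t24795\t31012\t.\t-\t.\tID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010",
    "chr01\tmaker\tmRNA\t24795\t31012\t.\t-\t.\tID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9",
    "chr01\tmaker\texon\t24795\t24945\t.\t-\t.\tID=NBISE00000000001;Parent=NBISM00000000001;makerName=nbis-exon-1",
    "chr01\tmaker\tCDS\t24795\t24945\t.\t-\t1\tID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-5",
    "chr01\tmaker\tCDS\t26435\t26604\t.\t-\t0\tID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-4",
]


def parse_attributes(column9: str) -> dict[str, str]:
    """Split 'key=value;key=value' (no URL unescaping)."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def original_ids(lines: list[str]) -> dict[str, set[str]]:
    """Map each new ID to the set of makerName values found on its line(s)."""
    mapping: dict[str, set[str]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives such as ##gff-version
        attrs = parse_attributes(line.split("\t")[8])
        if "ID" in attrs and "makerName" in attrs:
            mapping.setdefault(attrs["ID"], set()).add(attrs["makerName"])
    return mapping


# NBISC00000000001 maps to two original IDs here: cds-4 and cds-5.
for new_id, old in original_ids(OUTPUT_LINES).items():
    print(new_id, sorted(old))
```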
I do not understand this behavior: I expected the original feature IDs to be kept, as they are when agat_sp_manage_functional_annotation.pl is run manually.
Thanks!
This is because the Functional annotation workflow you use from https://github.com/NBISweden/pipelines-nextflow uses the --id parameter of the agat_sp_manage_functional_annotation.pl script. You have to clone the pipeline, modify it (remove line 30 in pipelines-nextflow/functional_annotation_modules.config), and run your local version.
Thank you for the quick answer. Is it correct that there is no parameter in the Functional annotation workflow to disable the --id parameter?
There is no command-line parameter for it, but as I pointed out above, you can disable it by removing line 30 of the pipelines-nextflow/functional_annotation_modules.config file.
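Because this fix relies on a line number, it can help to check that line 30 really is the --id setting before deleting it, since line numbers may differ between revisions (the run above used revision 5f66ae3cf2). The helper below is a hypothetical Python sketch; matching on the literal text --id is an assumption about how that config line is written, so adjust `must_contain` if it reads differently.

```python
from pathlib import Path


def remove_config_line(path: Path, lineno: int, must_contain: str) -> str:
    """Delete 1-based line `lineno` of `path` only if it contains `must_contain`.

    Returns the removed line. Raises ValueError instead of editing when the
    line does not match, so a shifted line number cannot remove the wrong setting.
    """
    lines = path.read_text().splitlines(keepends=True)
    if not 1 <= lineno <= len(lines):
        raise ValueError(f"{path} has only {len(lines)} lines")
    target = lines[lineno - 1]
    if must_contain not in target:
        raise ValueError(f"line {lineno} of {path} does not contain {must_contain!r}: {target!r}")
    del lines[lineno - 1]
    path.write_text("".join(lines))
    return target


# Hypothetical usage on a local clone (path and line number as given in this thread):
# remove_config_line(Path("pipelines-nextflow/functional_annotation_modules.config"), 30, "--id")
```

After the edit, `nextflow run` can be pointed at the local clone's directory instead of NBISweden/pipelines-nextflow, keeping the same -profile, -params-file and -c options.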
Hi,
I would like to keep the original GFF feature IDs after the merge functional annotations step. For example, when I run AGAT's agat_sp_manage_functional_annotation.pl manually, the original IDs are kept by default:
agat_sp_manage_functional_annotation.pl -f Daucus_carota.gene_chr_AGAT.gff -b functional_annotation/blast_tsv/blast_merged.tsv -d uniprot_sprot.fasta -i functional_annotation/interproscan_tsv/interproscan_merged.tsv -o functional_annotation/AGAT_manual
Could you please tell me which parameter/argument I missed?
Thank you in advance,
Emilie