NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

Functional annotation, merge annotations: how to keep original IDs #453

Closed EmilieSmeets22 closed 2 months ago

EmilieSmeets22 commented 2 months ago

Hi,

I would like to keep the original GFF feature IDs after the merge functional annotation step. For example, when I run AGAT's agat_sp_manage_functional_annotation.pl manually, the original IDs are kept by default.

agat_sp_manage_functional_annotation.pl -f Daucus_carota.gene_chr_AGAT.gff -b functional_annotation/blast_tsv/blast_merged.tsv -d uniprot_sprot.fasta -i functional_annotation/interproscan_tsv/interproscan_merged.tsv -o functional_annotation/AGAT_manual

Could you please tell me which parameter/argument I missed?

Thank you in advance, Emilie

Juke34 commented 2 months ago

Hi @EmilieSmeets22,

I'm not sure I understand your question: "I would like to keep the original GFF features IDs" ... but ... "original IDs are kept". Something is not clear here. Please clarify.

The ID attribute contains unique identifiers used by the GFF format to establish relationships between features (using Parent/ID relationships). It is not supposed to be changed except to make it clearer (and no functional information is set in this attribute).
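
To illustrate the Parent/ID relationship described above, here is a minimal Python sketch that reads GFF3 lines and groups feature IDs under the ID their Parent attribute points to. The function names (`parse_attributes`, `children_by_parent`) are hypothetical helpers written for this illustration and are not part of AGAT; the example lines are taken from the input file shown later in this thread, with tabs as column separators.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Split a GFF3 attribute column (column 9) into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def children_by_parent(gff_lines):
    """Map each parent ID to the list of feature IDs that reference it."""
    children = defaultdict(list)
    for line in gff_lines:
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].append(attrs.get("ID"))
    return dict(children)


example = [
    "##gff-version 3",
    "chr01\tmaker\tgene\t24795\t31012\t.\t-\t.\tID=DcarChr1G00000010",
    "chr01\tmaker\tmRNA\t24795\t31012\t.\t-\t.\tID=DcarChr1G00000010.1;Parent=DcarChr1G00000010",
    "chr01\tmaker\texon\t24795\t24945\t.\t-\t.\tID=nbis-exon-1;Parent=DcarChr1G00000010.1",
]
tree = children_by_parent(example)
```

Because every Parent value must match an existing ID, renaming an ID means rewriting every Parent that refers to it, which is why IDs are normally left alone.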

As explained in the help of agat_sp_manage_functional_annotation.pl:

> The blast against Protein Database (outfmt 6) allows to fill the field/attribute NAME for gene and PRODUCT for mRNA.

> The Interpro result (.tsv) file allows to fill the DBXREF field/attribute with pfam, tigr, interpro, GO, KEGG, etc... terms data.

> With the <id> option the script will change all the ID field by an Uniq ID created from the given prefix, a letter to specify the kind of feature (G,T,C,E,U), and the feature number.
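
The ID scheme described in that help text (prefix, feature letter, feature number) can be sketched in a few lines of Python. This is only an illustration, not AGAT's actual code: `make_feature_id` is a hypothetical name, and the 11-digit zero padding is an assumption inferred from IDs such as NBISG00000000001 that appear in the output later in this thread.

```python
def make_feature_id(prefix, kind_letter, number, width=11):
    """Build an ID from a prefix, a feature-kind letter and a running number.

    width=11 is an assumption based on the IDs seen in this thread,
    not a documented AGAT default.
    """
    return f"{prefix}{kind_letter}{number:0{width}d}"


print(make_feature_id("NBIS", "G", 1))  # NBISG00000000001
print(make_feature_id("NBIS", "E", 5))  # NBISE00000000005
```

IDs built this way carry no link to the original identifiers, so the original value has to be stored elsewhere if it is still needed.
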
EmilieSmeets22 commented 2 months ago

Hi @Juke34

I think this issue should actually be under GAAS and not AGAT, my bad!

The GAAS functional annotation pipeline, at the merge annotation step, returns a GFF file with new gene IDs.

[doutree@plop] $ module load Nextflow
[doutree@plop] $ cat ~/workspace/GFF/functional_annotation_param_chr01.yml
subworkflow: 'functional_annotation'
genome: '~/input/DAUCA_Kuroda_chr01.fa'
gff_annotation: '~/input/Daucus_carota.gene_chr_AGAT_chr01.gff'
blast_db_fasta: '~/input/uniprot_sprot.fasta'
outdir: '~/output/20240408_chr01'
[doutree@plop] $ cat ~/workspace/GFF/custom_config_chr01.txt
process {
    withName: 'INTERPROSCAN' {
        cpus     = 20
        memory   = 300.GB
        ext.args = [
            '--iprlookup',
            '--goterms',
            '-t p',
            '-dra',
            '-appl TIGRFAM,FunFam,SFLD,PANTHER,Gene3D,Hamap,Coils,SMART,CDD,PRINTS,PIRSR,AntiFam,Pfam'
        ].join(" ").trim()
    }
    withName: 'BLAST_BLASTP' {
        ext.args = '-max_target_seqs 1 -evalue 1e-6 -outfmt 6'
    }
}
[doutree@plop] $ nextflow run NBISweden/pipelines-nextflow -profile conda -params-file functional_annotation_param_chr01.yml -c custom_config_chr01.txt

N E X T F L O W  ~  version 22.10.1
Launching `https://github.com/NBISweden/pipelines-nextflow` [maniac_spence] DSL2 - revision: 5f66ae3cf2
[master]

         _  _ ___ ___ ___
        | \| | _ )_ _/ __|
        | .` | _ \| |\__ \
        |_|\_|___/___|___/ Annotation Service

        Functional annotation workflow
        ===================================================
[f9/834f23] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 1 of 1 ✔
[15/7856b3] process > FUNCTIONAL_ANNOTATION:GFF2P... [100%] 1 of 1 ✔
[3a/fbbf8e] process > FUNCTIONAL_ANNOTATION:BLAST... [100%] 6 of 6 ✔
[49/81bd45] process > FUNCTIONAL_ANNOTATION:INTER... [100%] 6 of 6 ✔
[b0/e757df] process > FUNCTIONAL_ANNOTATION:MERGE... [100%] 1 of 1 ✔

        Workflow completed successfully.

        Thank you for using our workflow.
        Results are located in the folder: ~/output/20240408_chr01

Completed at: 08-Apr-2024 16:57:29
Duration    : 10m 10s
CPU hours   : 5.9
Succeeded   : 15

[doutree@plop] $ head Daucus_carota.gene_chr_AGAT_chr01.gff
##gff-version 3
chr01   maker   gene    24795   31012   .       -       .       ID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010
chr01   maker   mRNA    24795   31012   .       -       .       ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01   maker   exon    24795   24945   .       -       .       ID=NBISE00000000001;Parent=NBISM00000000001;makerName=nbis-exon-1
chr01   maker   exon    26435   26604   .       -       .       ID=NBISE00000000002;Parent=NBISM00000000001;makerName=nbis-exon-2
chr01   maker   exon    27851   27929   .       -       .       ID=NBISE00000000003;Parent=NBISM00000000001;makerName=nbis-exon-3
chr01   maker   exon    28302   28423   .       -       .       ID=NBISE00000000004;Parent=NBISM00000000001;makerName=nbis-exon-4
chr01   maker   exon    30953   31012   .       -       .       ID=NBISE00000000005;Parent=NBISM00000000001;makerName=nbis-exon-5
chr01   maker   CDS     24795   24945   .       -       1       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-5
chr01   maker   CDS     26435   26604   .       -       0       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=cds-4
[doutree@plop] $ grep mRNA Daucus_carota.gene_chr_AGAT_chr01.gff | head
chr01   maker   mRNA    24795   31012   .       -       .       ID=NBISM00000000001;Parent=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010.1;product=CMP-sialic acid transporter 1;uniprot_id=Q654D9
chr01   maker   mRNA    33922   37446   .       +       .       ID=NBISM00000000002;Parent=NBISG00000000002;makerName=DcarChr1G00000020.1;product=hypothetical protein
chr01   exonerate       mRNA    45536   50728   .       -       .       ID=NBISM00000000003;Parent=NBISG00000000003;Name=At5g41760;makerName=DcarChr1G00000030.1;product=CMP-sialic acid transporter 1;uniprot_id=Q8LGE9
chr01   exonerate       mRNA    90633   141688  .       -       .       ID=NBISM00000000004;Parent=NBISG00000000004;Name=GIP;makerName=DcarChr1G00000040.1;product=Copia protein;uniprot_id=P04146
chr01   maker   mRNA    145015  147063  .       +       .       ID=NBISM00000000005;Parent=NBISG00000000005;Name=NAKR2;makerName=DcarChr1G00000050.1;product=Protein SODIUM POTASSIUM ROOT DEFECTIVE 2;uniprot_id=Q58FZ0
chr01   exonerate       mRNA    164172  286235  .       +       .       ID=NBISM00000000006;Parent=NBISG00000000006;Name=GIP;makerName=DcarChr1G00000060.1;product=Copia protein;uniprot_id=P04146
chr01   exonerate       mRNA    395432  509234  .       -       .       ID=NBISM00000000007;Parent=NBISG00000000007;Name=GIP;makerName=DcarChr1G00000070.1;product=Copia protein;uniprot_id=P04146
chr01   exonerate       mRNA    534035  534211  .       -       .       ID=NBISM00000000008;Parent=NBISG00000000008;makerName=DcarChr1G00000080.1;product=hypothetical protein
chr01   maker   mRNA    639615  642189  .       +       .       ID=NBISM00000000009;Parent=NBISG00000000009;makerName=DcarChr1G00000090.1;product=hypothetical protein
chr01   transdecoder    mRNA    655131  661114  .       +       .       ID=NBISM00000000010;Parent=NBISG00000000010;Name=GLYR1;makerName=DcarChr1G00000100.1;product=Glyoxylate/succinic semialdehyde reductase 1;uniprot_id=Q9LSV0

Here is the input GFF file with the original feature IDs:

[doutree@by3acs] input $ head Daucus_carota.gene_chr_AGAT_chr01.gff
##gff-version 3
chr01   maker   gene    24795   31012   .       -       .       ID=DcarChr1G00000010
chr01   maker   mRNA    24795   31012   .       -       .       ID=DcarChr1G00000010.1;Parent=DcarChr1G00000010
chr01   maker   exon    24795   24945   .       -       .       ID=nbis-exon-1;Parent=DcarChr1G00000010.1
chr01   maker   exon    26435   26604   .       -       .       ID=nbis-exon-2;Parent=DcarChr1G00000010.1
chr01   maker   exon    27851   27929   .       -       .       ID=nbis-exon-3;Parent=DcarChr1G00000010.1
chr01   maker   exon    28302   28423   .       -       .       ID=nbis-exon-4;Parent=DcarChr1G00000010.1
chr01   maker   exon    30953   31012   .       -       .       ID=nbis-exon-5;Parent=DcarChr1G00000010.1
chr01   maker   CDS     24795   24945   .       -       1       ID=cds-5;Parent=DcarChr1G00000010.1
chr01   maker   CDS     26435   26604   .       -       0       ID=cds-4;Parent=DcarChr1G00000010.1

I do not understand this behavior: I would have expected the original feature IDs to be kept, just as they are when agat_sp_manage_functional_annotation.pl is run manually.
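
Comparing the two listings above, the makerName attribute in the output appears to hold the original ID of each feature. Assuming that holds for the whole file (an inference from this excerpt, not documented behavior), the original IDs could be restored after the fact with a sketch like the one below. `restore_original_ids` is a hypothetical helper, not part of AGAT or the workflow, and it drops the makerName attribute once it has been used.

```python
def restore_original_ids(gff_lines, backup_key="makerName"):
    """Put the value stored under backup_key back into ID and fix Parent links."""
    parsed = []
    new_to_old = {}
    # First pass: record which original ID each generated ID replaced.
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            parsed.append((cols, None))  # directive, comment or malformed line
            continue
        attrs = dict(f.partition("=")[::2] for f in cols[8].split(";") if f)
        if "ID" in attrs and backup_key in attrs:
            new_to_old[attrs["ID"]] = attrs[backup_key]
        parsed.append((cols, attrs))

    # Second pass: rewrite ID and Parent, leaving other attributes untouched.
    restored = []
    for cols, attrs in parsed:
        if attrs is not None:
            if backup_key in attrs:
                attrs["ID"] = attrs.pop(backup_key)
            if "Parent" in attrs:
                parents = attrs["Parent"].split(",")
                attrs["Parent"] = ",".join(new_to_old.get(p, p) for p in parents)
            cols[8] = ";".join(f"{k}={v}" for k, v in attrs.items())
        restored.append("\t".join(cols))
    return restored


renamed = [
    "##gff-version 3",
    "chr01\tmaker\tgene\t24795\t31012\t.\t-\t.\tID=NBISG00000000001;Name=CSTLP1;makerName=DcarChr1G00000010",
    "chr01\tmaker\tmRNA\t24795\t31012\t.\t-\t.\tID=NBISM00000000001;Parent=NBISG00000000001;makerName=DcarChr1G00000010.1",
    "chr01\tmaker\texon\t24795\t24945\t.\t-\t.\tID=NBISE00000000001;Parent=NBISM00000000001;makerName=nbis-exon-1",
]
for row in restore_original_ids(renamed):
    print(row)
```

The result should be checked against the input file before it is used, since features lacking a makerName attribute keep their generated ID.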

Thanks!

Juke34 commented 2 months ago

This happens because the Functional annotation workflow you are using from https://github.com/NBISweden/pipelines-nextflow passes the --id parameter to the agat_sp_manage_functional_annotation.pl script. To keep the original IDs, you have to clone the pipeline, modify it (remove line 30 in pipelines-nextflow/functional_annotation_modules.config), and run your local version.

EmilieSmeets22 commented 2 months ago

Thank you for the quick answer. Is it correct that the Functional annotation workflow has no parameter to disable the --id parameter?

Juke34 commented 2 months ago

There is no command-line parameter for it, but as I pointed out above, you can change this behavior by removing line 30 of the pipelines-nextflow/functional_annotation_modules.config file.