NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

issue with merge_annotation_identifier in functional pipeline #100

Closed LucileSol closed 7 months ago

LucileSol commented 8 months ago

I ran the functional pipeline twice, once with the --pcds option and once without, and got the following problem:

params.yml :

subworkflow: 'functional_annotation'
genome: 'genome.fa'
gff_annotation: 'split/mrna.gff'
blast_db_fasta: '/projects/references/databases/uniprot/2022-12/uniprot_sprot.fasta'
outdir: 'results'
merge_annotation_identifier : 'Lleo10'

custom.config :

process {
    withName: 'INTERPROSCAN'{
        time = 10.d
        conda =null
        container =null
    }    

    withName: 'MERGE_FUNCTIONAL_ANNOTATION'{
        time = 10.d
    } 

}
env.PATH ='${PATH}:/projects/references/interproscan/interproscan-5.59-91.0'

command line :

 nextflow run /home/lucso605/git/NBIS/pipelines-nextflow -profile singularity,nbis -params-file params.yml -c custom.config

results :

ptg000318l      maker   gene    6498    8750    .       -       .       ID=NBISG00000044641;Name=psaA;makerName=maker-ptg000318l-exonerate_protein2genome-gene-0.2
ptg000318l      maker   mRNA    6498    8750    76.4918 -       .       ID=NBISM00000125702;Parent=NBISG00000044641;Dbxref=FunFam:G3DSA:1.20.1130.10:FF:000001,Gene3D:G3DSA:1.20.1130.10,Hamap:MF_00458,InterPro:IPR036408,InterPro:IPR001280,InterPro:IPR006243,InterPro:IPR020586,MetaCyc:PWY-101,MetaCyc:PWY-8270,PANTHER:PTHR30128,PIRSF:PIRSF002905,PRINTS:PR00257,Pfam:PF00223,ProSitePatterns:PS00419,SUPERFAMILY:SSF81558,TIGRFAM:TIGR01335;Name=psaA;Ontology_term=-,GO:0009579,GO:0015979,GO:0016020,GO:0046872;_AED=0.09;_QI=0|-1|0|1|-1|0|1|0|750;_eAED=0.09;makerName=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-8;product=Photosystem I P700 chlorophyll a apoprotein A1;uniprot_id=Q49KZ8
ptg000318l      maker   exon    6498    8750    .       -       .       ID=NBISE00000681540;Parent=NBISM00000125702;makerName=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-8:25
ptg000318l      maker   CDS     6498    8750    .       -       0       ID=NBISC00000125702;Parent=NBISM00000125702;makerName=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-8:cds

The problem is that the ID is, for instance, NBISG00000044641 and not Lleo1000000044641.

params3.yml :

subworkflow: 'functional_annotation'
genome: 'genome.fa'
gff_annotation: 'split/mrna.gff'
blast_db_fasta: '/projects/references/databases/uniprot/2022-12/uniprot_sprot.fasta'
outdir: 'results_pcds'
merge_annotation_identifier : 'Lleo10'

custom3.config :

process {
    withName: 'INTERPROSCAN'{
        time = 10.d
        conda =null
        container =null
    }    

    withName: 'MERGE_FUNCTIONAL_ANNOTATION'{
        time = 10.d
        ext.args='--pcds'
    } 

}
env.PATH ='${PATH}:/projects/references/interproscan/interproscan-5.59-91.0'

command line :

nextflow run /home/lucso605/git/NBIS/pipelines-nextflow -profile singularity,nbis -params-file params3.yml -c custom3.config

results :

ptg000318l      maker   gene    6498    8750    .       -       .       ID=maker-ptg000318l-exonerate_protein2genome-gene-0.2;Name=psaA
ptg000318l      maker   mRNA    6498    7349    91      -       .       ID=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6;Parent=maker-ptg000318l-exonerate_protein2genome-gene-0.2;Dbxref=Gene3D:G3DSA:1.20.1130.10,InterPro:IPR036408,InterPro:IPR001280,InterPro:IPR020586,MetaCyc:PWY-101,MetaCyc:PWY-8270,PANTHER:PTHR30128,PRINTS:PR00257,Pfam:PF00223,ProSitePatterns:PS00419,SUPERFAMILY:SSF81558;Name=psaA;Ontology_term=-,GO:0009579,GO:0015979,GO:0016020;_AED=0.38;_QI=0|0|0|1|0|0|2|0|225;_eAED=0.38;product=Photosystem I P700 chlorophyll a apoprotein A1;uniprot_id=Q332X8
ptg000318l      maker   exon    6498    6611    .       -       .       ID=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6:21;Parent=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6
ptg000318l      maker   exon    6786    7349    .       -       .       ID=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6:22;Parent=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6
ptg000318l      maker   CDS     6498    6611    .       -       0       ID=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6:cds;Parent=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6;Dbxref=Gene3D:G3DSA:1.20.1130.10,InterPro:IPR036408,InterPro:IPR001280,InterPro:IPR020586,MetaCyc:PWY-101,MetaCyc:PWY-8270,PANTHER:PTHR30128,PRINTS:PR00257,Pfam:PF00223,ProSitePatterns:PS00419,SUPERFAMILY:SSF81558;Name=psaA;Ontology_term=-,GO:0009579,GO:0015979,GO:0016020;product=Photosystem I P700 chlorophyll a apoprotein A1;uniprot_id=Q332X8
ptg000318l      maker   CDS     6786    7349    .       -       0       ID=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6:cds;Parent=maker-ptg000318l-exonerate_protein2genome-gene-0.2-mRNA-6;Dbxref=Gene3D:G3DSA:1.20.1130.10,InterPro:IPR036408,InterPro:IPR001280,InterPro:IPR020586,MetaCyc:PWY-101,MetaCyc:PWY-8270,PANTHER:PTHR30128,PRINTS:PR00257,Pfam:PF00223,ProSitePatterns:PS00419,SUPERFAMILY:SSF81558;Name=psaA;Ontology_term=-,GO:0009579,GO:0015979,GO:0016020;product=Photosystem I P700 chlorophyll a apoprotein A1;uniprot_id=Q332X8

The problem is that the ID is the maker ID, for instance maker-ptg000318l-exonerate_protein2genome-gene-0.2, and not an ID starting with Lleo10.
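To see at a glance which prefix the IDs in an output file carry, something like the following minimal Python sketch could be used. The function name id_prefix_counts is hypothetical and not part of the pipeline or AGAT; the sketch assumes a tab-separated GFF3 file with an ID= attribute in column 9, and the sample lines are shortened from the results above.

```python
import re
from collections import Counter


def id_prefix_counts(gff_lines, expected_prefix):
    """Count features whose ID does or does not start with expected_prefix.

    Hypothetical helper for illustration only; assumes tab-separated GFF3.
    """
    counts = Counter()
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue
        # Match only the ID attribute, not e.g. uniprot_id.
        match = re.search(r"(?:^|;)ID=([^;]+)", columns[8])
        if match is None:
            continue
        if match.group(1).startswith(expected_prefix):
            counts["expected"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Shortened lines from the two runs reported above.
    sample = [
        "ptg000318l\tmaker\tgene\t6498\t8750\t.\t-\t.\tID=NBISG00000044641;Name=psaA\n",
        "ptg000318l\tmaker\tgene\t6498\t8750\t.\t-\t.\t"
        "ID=maker-ptg000318l-exonerate_protein2genome-gene-0.2;Name=psaA\n",
    ]
    print(id_prefix_counts(sample, "Lleo10"))
```

For a real file, the lines can be passed in with open("path/to/output.gff") as the first argument; a non-zero "other" count means some features did not get the requested prefix.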

I also just checked and realised that I already had this problem in June, so I don't know when it started.

I have also seen that if the pipeline is run twice, the IDs get different numbers, but I think that is more due to agat_sp_manage_functional_annotation.pl; maybe we should modify it.
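One way to confirm the changing numbers would be to pair features from the two runs by their makerName attribute (present in the output without --pcds above) and list the ones whose ID differs. This is only a sketch: ids_by_maker_name and changed_ids are hypothetical helper names, the input is assumed to be tab-separated GFF3, and the second ID in the example is made up for illustration.

```python
def ids_by_maker_name(gff_lines):
    """Map each feature's makerName attribute to its ID attribute."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue
        # Column 9 holds key=value pairs separated by ';'.
        attributes = dict(
            field.split("=", 1) for field in columns[8].split(";") if "=" in field
        )
        if "ID" in attributes and "makerName" in attributes:
            mapping[attributes["makerName"]] = attributes["ID"]
    return mapping


def changed_ids(first_run_lines, second_run_lines):
    """Return {makerName: (id_in_first_run, id_in_second_run)} where the IDs differ."""
    first = ids_by_maker_name(first_run_lines)
    second = ids_by_maker_name(second_run_lines)
    return {
        name: (first[name], second[name])
        for name in first.keys() & second.keys()
        if first[name] != second[name]
    }


if __name__ == "__main__":
    maker_name = "maker-ptg000318l-exonerate_protein2genome-gene-0.2"
    prefix = "ptg000318l\tmaker\tgene\t6498\t8750\t.\t-\t.\t"
    run_a = [prefix + "ID=NBISG00000044641;Name=psaA;makerName=" + maker_name + "\n"]
    # Made-up ID standing in for what a second run might produce.
    run_b = [prefix + "ID=NBISG00000051234;Name=psaA;makerName=" + maker_name + "\n"]
    for name, (old_id, new_id) in changed_ids(run_a, run_b).items():
        print(name, old_id, new_id, sep="\t")
```

An empty result would mean the numbering was stable between the two runs for every feature that carries a makerName.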

Thank you!