NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

New script to move attributes between features of the same record (agat_sp_move_attributes_within_records) #413

Closed by Juke34 5 months ago

Juke34 commented 5 months ago

@LucileSol @MartinPippel @mahesh-panchal Does this script work for you? Could you give it a try?

Example of usage:

agat_sp_move_attributes_within_records.pl --gff infile.gff --feature_copy mRNA --feature_paste CDS,exon --attribute Dbxref,Ontology

MartinPippel commented 5 months ago

Hi @Juke34, thanks for the script. I tested it, but it does not do exactly what we need.

The input looks like this:

ptg000002l      AUGUSTUS        mRNA    3255    4626    0.5     +       .       ID=NBISM00000000001;Parent=NBISG00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033,;Name=ARB_03491;Ontology_term=-;makerName=g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ID=NBISE00000000009;Parent=NBISM00000000001;makerName=g1.t1.exon9
ptg000002l      AUGUSTUS        CDS     3255    3275    0.98    +       0       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=g1.t1.CDS1

and we want all attributes of the mRNA (except makerName) copied to the CDS entries:

ptg000002l      AUGUSTUS        mRNA    3255    4626    0.5     +       .       ID=NBISM00000000001;Parent=NBISG00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033,;Name=ARB_03491;Ontology_term=-;makerName=g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ID=NBISE00000000009;Parent=NBISM00000000001;makerName=g1.t1.exon9
ptg000002l      AUGUSTUS        CDS     3255    3275    0.98    +       0       ID=NBISC00000000001;Parent=NBISM00000000001;makerName=g1.t1.CDS1;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033,;Name=ARB_03491;Ontology_term=-;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ID=NBISE00000000009;Parent=NBISM00000000001;makerName=g1.t1.exon9

However, the current version of the script drops every comma-separated Dbxref value after the first one, and it also appends g1.t1 to the makerName attribute:

ptg000002l      AUGUSTUS        CDS     3255    3275    0.98    +       0       ID=NBISC00000000001,NBISM00000000001;Parent=NBISM00000000001,NBISG00000000001;Dbxref=CDD:cd07067;Name=ARB_03491;Ontology_term=-;makerName=g1.t1.CDS1,g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1

Due to our current tight time limitations, we will probably just add @LucileSol's script to the GAAS repo.
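
For reference, the requested behavior can be sketched in a few lines of Python. This is only an illustration of the expected result, not how the AGAT script is implemented: the function names (`parse_attributes`, `format_attributes`, `copy_attributes`) and the default skip list are assumptions made for this example. Tags already present on the CDS are left untouched, and the empty value produced by the trailing comma in Dbxref is dropped.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column into a dict of tag -> list of values."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        tag, _, value = field.partition("=")
        # Keep every comma-separated value; skip empties left by a trailing comma.
        attrs[tag] = [v for v in value.split(",") if v]
    return attrs


def format_attributes(attrs):
    """Serialize a tag -> values dict back into a GFF3 attribute column."""
    return ";".join(f"{tag}={','.join(values)}" for tag, values in attrs.items())


def copy_attributes(source, target, skip=("ID", "Parent", "makerName")):
    """Return target plus the source attributes it lacks, ignoring tags in skip."""
    merged = {tag: list(values) for tag, values in target.items()}
    for tag, values in source.items():
        if tag in skip or tag in merged:
            continue  # never overwrite or extend an attribute the target already has
        merged[tag] = list(values)
    return merged


# Attribute columns taken from the mRNA and CDS lines shown above.
mrna = ("ID=NBISM00000000001;Parent=NBISG00000000001;"
        "Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033,;"
        "Name=ARB_03491;Ontology_term=-;makerName=g1.t1;"
        "product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1")
cds = "ID=NBISC00000000001;Parent=NBISM00000000001;makerName=g1.t1.CDS1"

print(format_attributes(copy_attributes(parse_attributes(mrna), parse_attributes(cds))))
```

Skipping ID and Parent by default keeps the CDS attached to its own mRNA, which is the part that went wrong in the output above.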

Juke34 commented 5 months ago

> Due to our current tight time limitations, we will probably just add @LucileSol's script to the GAAS repo.

No problem, as you prefer.

This script is useful to AGAT anyway, so I will include it. I have fixed the bugs (it should now behave as you wish ^^).

MartinPippel commented 5 months ago

Thanks Jacques. I tested the new version and I am getting the following error. Does my input file not follow the AGAT standards?

Can't use string ("13557_t") as an ARRAY ref while "strict refs" in use at /projects/martin/prog/conda_envs/agat-1.2.0/lib/perl5/site_perl/AGAT/OmniscientTool.pm line 1272.

And here is the record that potentially causes the problem:

ptg001613l      GeneMark.hmm3   gene    1413    2319    .       -       .       ID=NBISG00000015636;gene_id=13557_g;makerName=13557_g;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   mRNA    1413    2319    .       -       .       ID=NBISM00000017245;Parent=NBISG00000015636;gene_id=13557_g;makerName=13557_t;product=hypothetical protein;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   exon    1413    1541    .       -       .       ID=NBISE00000088938;Parent=NBISM00000017245;cds_type=Internal;count=3_3;gene_id=13557_g;makerName=nbis-exon-22462;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   exon    1842    1989    .       -       .       ID=NBISE00000088939;Parent=NBISM00000017245;cds_type=Internal;count=3_3;gene_id=13557_g;makerName=nbis-exon-22463;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   exon    2036    2319    .       -       .       ID=NBISE00000088940;Parent=NBISM00000017245;cds_type=Internal;count=3_3;gene_id=13557_g;makerName=nbis-exon-22464;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   CDS     1413    1541    .       -       0       ID=NBISC00000017245;Parent=NBISM00000017245;cds_type=Internal;count=3_3;gene_id=13557_g;makerName=cds-79105;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   CDS     1842    1989    .       -       1       ID=NBISC00000017245;Parent=NBISM00000017245;cds_type=Internal;count=2_3;gene_id=13557_g;makerName=cds-79106;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   CDS     2036    2319    .       -       0       ID=NBISC00000017245;Parent=NBISM00000017245;cds_type=Initial;count=1_3;gene_id=13557_g;makerName=cds-79107;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   intron  1542    1841    .       -       0       ID=NBISI00000071698;Parent=NBISM00000017245;gene_id=13557_g;makerName=intron-65549;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   intron  1990    2035    .       -       2       ID=NBISI00000071699;Parent=NBISM00000017245;gene_id=13557_g;makerName=intron-65550;transcript_id=13557_t
ptg001613l      GeneMark.hmm3   start_codon     2317    2319    .       -       0       ID=NBISST00000017212;Parent=NBISM00000017245;gene_id=13557_g;makerName=start_codon-13548;transcript_id=13557_t

Juke34 commented 5 months ago

Are you sure you are using the latest version? I had this problem in a previous commit and have since fixed it (the line $feature->add_tag_value($tag,@{$value}); in OmniscientTool.pm). I will give it a try.
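
The error message suggests that a plain string ("13557_t") reached a line that expected a list of values. The Python sketch below is only an analogy for that class of bug, not AGAT's actual code or fix: `add_tag_value` here is a hypothetical stand-in for the Perl method, and `as_list` is an assumed helper. In Python, unpacking a bare string would not raise an error but would silently split it into characters, so the value is normalized to a list first.

```python
def add_tag_value(attributes, tag, *values):
    """Append one or more values to a tag (hypothetical stand-in, not AGAT's API)."""
    attributes.setdefault(tag, []).extend(values)


def as_list(value):
    """Return a list: copy lists/tuples as-is, wrap any single value."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


attrs = {}
# A single-valued attribute stored as a plain string.
add_tag_value(attrs, "transcript_id", *as_list("13557_t"))
# A multi-valued attribute stored as a list.
add_tag_value(attrs, "Dbxref", *as_list(["CDD:cd07067", "InterPro:IPR013078"]))
print(attrs)
```

Normalizing at the call site means the same line handles both single-valued attributes such as transcript_id and multi-valued ones such as Dbxref.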

Juke34 commented 5 months ago

Check done. Your example works fine on my side.