Juke34 closed 5 months ago
Hi @Juke34, thanks for the script. I tested it, but it does not do exactly what we need.
The input looks like this:
ptg000002l AUGUSTUS mRNA 3255 4626 0.5 + . ID=NBISM00000000001;Parent=NBISG00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033,;Name=ARB_03491;Ontology_term=-;makerName=g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ID=NBISE00000000009;Parent=NBISM00000000001;makerName=g1.t1.exon9
ptg000002l AUGUSTUS CDS 3255 3275 0.98 + 0 ID=NBISC00000000001;Parent=NBISM00000000001;makerName=g1.t1.CDS1
and we want all attributes (except makerName) copied to the CDS entries:
ptg000002l AUGUSTUS mRNA 3255 4626 0.5 + . ID=NBISM00000000001;Parent=NBISG00000000001;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033,;Name=ARB_03491;Ontology_term=-;makerName=g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ID=NBISE00000000009;Parent=NBISM00000000001;makerName=g1.t1.exon9
ptg000002l AUGUSTUS CDS 3255 3275 0.98 + 0 ID=NBISC00000000001;Parent=NBISM00000000001;makerName=g1.t1.CDS1;Dbxref=CDD:cd07067,Gene3D:G3DSA:3.40.50.1240,InterPro:IPR013078,InterPro:IPR029033,;Name=ARB_03491;Ontology_term=-;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
ID=NBISE00000000009;Parent=NBISM00000000001;makerName=g1.t1.exon9
However, the current version of the script drops every comma-separated Dbxref entry after the first, and it also appends g1.t1 to the makerName attribute:
ptg000002l AUGUSTUS CDS 3255 3275 0.98 + 0 ID=NBISC00000000001,NBISM00000000001;Parent=NBISM00000000001,NBISG00000000001;Dbxref=CDD:cd07067;Name=ARB_03491;Ontology_term=-;makerName=g1.t1.CDS1,g1.t1;product=Probable phosphoglycerate mutase ARB_03491;uniprot_id=D4B4V1
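For reference, the requested behaviour can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AGAT script itself: the function names (`parse_attributes`, `format_attributes`, `copy_parent_attributes`) are hypothetical, the input is assumed to be tab-separated GFF3 with each mRNA listed before its CDS lines, and each CDS is assumed to have a single Parent. Attribute values are kept as raw strings, so a comma-separated Dbxref survives intact, and ID, Parent and makerName are never copied or merged.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of tag -> raw value string."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        tag, _, value = field.partition("=")
        attrs[tag] = value  # value is kept verbatim, commas included
    return attrs


def format_attributes(attrs):
    """Join a tag -> value dict back into a GFF3 attribute column."""
    return ";".join(f"{tag}={value}" for tag, value in attrs.items())


def copy_parent_attributes(lines, parent_type="mRNA", child_type="CDS",
                           skip=("ID", "Parent", "makerName")):
    """Copy attributes from each parent feature to its children.

    Tags listed in `skip`, and tags the child already has, are left alone.
    Assumption: a parent line appears before the lines of its children.
    """
    parents = {}
    out = []
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)  # comments and malformed lines pass through
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == parent_type:
            parents[attrs.get("ID")] = attrs
        elif cols[2] == child_type:
            parent = parents.get(attrs.get("Parent"), {})
            for tag, value in parent.items():
                if tag not in skip and tag not in attrs:
                    attrs[tag] = value
            cols[8] = format_attributes(attrs)
        out.append("\t".join(cols))
    return out
```

Calling `copy_parent_attributes(open("input.gff"))` (file name hypothetical) returns the rewritten lines; exon and other feature types are passed through unchanged.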
Due to our current tight time limitations, we will probably just add @LucileSol's script to the GAAS repo.
"Due to our current tight time limitations, we will probably just add @LucileSol's script to the GAAS repo."
No problem, as you prefer.
This script is useful to AGAT anyway, so I will include it. I have fixed the bugs (it should now behave as you wish ^^).
Thanks, Jacques. I tested the new version and I am getting the following error. Does my input file not follow the AGAT standards?
Can't use string ("13557_t") as an ARRAY ref while "strict refs" in use at /projects/martin/prog/conda_envs/agat-1.2.0/lib/perl5/site_perl/AGAT/OmniscientTool.pm line 1272.
And here is the part of the input that potentially causes the problem:
ptg001613l GeneMark.hmm3 gene 1413 2319 . - . ID=NBISG00000015636;gene_id=13557_g;makerName=13557_g;transcript_id=13557_t
ptg001613l GeneMark.hmm3 mRNA 1413 2319 . - . ID=NBISM00000017245;Parent=NBISG00000015636;gene_id=13557_g;makerName=13557_t;product=hypothetical protein;transcript_id=13557_t
ptg001613l GeneMark.hmm3 exon 1413 1541 . - . ID=NBISE00000088938;Parent=NBISM00000017245;cds_type=Internal;count=3_3;gene_id=13557_g;makerName=nbis-exon-22462;transcript_id=13557_t
ptg001613l GeneMark.hmm3 exon 1842 1989 . - . ID=NBISE00000088939;Parent=NBISM00000017245;cds_type=Internal;count=3_3;gene_id=13557_g;makerName=nbis-exon-22463;transcript_id=13557_t
ptg001613l GeneMark.hmm3 exon 2036 2319 . - . ID=NBISE00000088940;Parent=NBISM00000017245;cds_type=Internal;count=3_3;gene_id=13557_g;makerName=nbis-exon-22464;transcript_id=13557_t
ptg001613l GeneMark.hmm3 CDS 1413 1541 . - 0 ID=NBISC00000017245;Parent=NBISM00000017245;cds_type=Internal;count=3_3;gene_id=13557_g;makerName=cds-79105;transcript_id=13557_t
ptg001613l GeneMark.hmm3 CDS 1842 1989 . - 1 ID=NBISC00000017245;Parent=NBISM00000017245;cds_type=Internal;count=2_3;gene_id=13557_g;makerName=cds-79106;transcript_id=13557_t
ptg001613l GeneMark.hmm3 CDS 2036 2319 . - 0 ID=NBISC00000017245;Parent=NBISM00000017245;cds_type=Initial;count=1_3;gene_id=13557_g;makerName=cds-79107;transcript_id=13557_t
ptg001613l GeneMark.hmm3 intron 1542 1841 . - 0 ID=NBISI00000071698;Parent=NBISM00000017245;gene_id=13557_g;makerName=intron-65549;transcript_id=13557_t
ptg001613l GeneMark.hmm3 intron 1990 2035 . - 2 ID=NBISI00000071699;Parent=NBISM00000017245;gene_id=13557_g;makerName=intron-65550;transcript_id=13557_t
ptg001613l GeneMark.hmm3 start_codon 2317 2319 . - 0 ID=NBISST00000017212;Parent=NBISM00000017245;gene_id=13557_g;makerName=start_codon-13548;transcript_id=13557_t
Are you sure you are using the latest version? I had this problem in a previous commit and have since fixed it (the line $feature->add_tag_value($tag,@{$value}); in OmniscientTool.pm). I will give it a try.
Check done. Your example works fine on my side.
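The error above is the kind that appears when a single string (here "13557_t") reaches code that expects a list of values. As a rough illustration only, written in Python rather than Perl and not taken from the AGAT source, a defensive helper can normalise a value to a list before it is added. The names `as_value_list` and `add_tag_values` are hypothetical.

```python
def as_value_list(value):
    """Return attribute values as a list, given a single string or a list/tuple."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]  # wrap a bare string instead of treating it as a sequence


def add_tag_values(attrs, tag, value):
    """Append one or several values to attrs[tag], creating the list if needed."""
    attrs.setdefault(tag, []).extend(as_value_list(value))
    return attrs
```

Note that in Python the analogous mistake does not raise: extending a list with a bare string silently adds one character at a time, so the normalisation step matters just as much there.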
@LucileSol @MartinPippel @mahesh-panchal Does this script work for you? Could you give it a try?
example of usage: