genomeannotation / GAG

Generates an NCBI .tbl file of annotations on a genome.
MIT License

Add ability to write RNA features and pseudogenes #163

Open smg283 opened 8 years ago

smg283 commented 8 years ago

Pseudogenes, rRNA, tRNA, miRNA, ncRNA

See the tbl format spec here: http://www.ncbi.nlm.nih.gov/genbank/genomesubmit_annotation

See the GFF3 format spec here: http://www.sequenceontology.org/resources/gff3.html

In GFF3, code RNA types as gene -> [RNA type] -> exon (meaning each RNA feature has a parent gene and also a child exon). Code pseudogenes as pseudogene -> pseudogenic_transcript -> pseudogenic_exon and/or pseudogene -> transcript -> exon.
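
For illustration only, the parent/child hierarchies described in this comment can be written down as a small lookup table and checked mechanically. This is a minimal sketch, not GAG code: `ALLOWED_CHILDREN` and `is_valid_chain` are hypothetical names, and the table covers only the hierarchies listed in this comment (it deliberately leaves out gene -> mRNA and anything else GAG already handles).

```python
# RNA feature types requested in this issue.
RNA_TYPES = {"rRNA", "tRNA", "miRNA", "ncRNA"}

# Hypothetical lookup table: parent feature type -> allowed child types.
# Only the hierarchies named in the comment above are encoded here.
ALLOWED_CHILDREN = {
    "gene": set(RNA_TYPES),
    "pseudogene": {"pseudogenic_transcript", "transcript"},
    "pseudogenic_transcript": {"pseudogenic_exon"},
    "transcript": {"exon"},
}
for rna_type in RNA_TYPES:
    ALLOWED_CHILDREN[rna_type] = {"exon"}


def is_valid_chain(types):
    """Return True if each type is an allowed child of the type before it."""
    for parent, child in zip(types, types[1:]):
        if child not in ALLOWED_CHILDREN.get(parent, set()):
            return False
    return True


# Example checks against the three hierarchies from the comment.
print(is_valid_chain(["gene", "tRNA", "exon"]))
print(is_valid_chain(["pseudogene", "pseudogenic_transcript", "pseudogenic_exon"]))
print(is_valid_chain(["pseudogene", "transcript", "exon"]))
```

A chain such as gene -> exon, with no RNA feature in between, is rejected by this table.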

tedsta commented 8 years ago

It looks like all of these were already added except for miRNA and pseudogenes. GAG currently treats all the RNAs the same, though.

tedsta commented 8 years ago

@smg283 what about snRNA?

smg283 commented 7 years ago

Low-ish priority; can be kicked back off the 2.0 milestone.

fbremer commented 7 years ago

Notes:

Supported XRNA types (at parsing) include 'mRNA', 'tRNA', 'rRNA', 'ncRNA', 'miRNA', and 'snRNA'. The XRNA object records its type as a string in self.rna_type.

Supported Gene types (at parsing) include 'gene' and 'pseudogene'. The Gene object records its type as a bool in self.pseudo.
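
As a rough picture of what these notes describe, the sketch below stores the RNA type as a string and the pseudogene status as a bool. It is an assumption-laden illustration rather than GAG's actual classes: only the attribute names `rna_type` and `pseudo` and the lists of supported types come from the notes above, while the class names `XRNASketch` and `GeneSketch`, the `identifier` field, and the constructor signatures are hypothetical.

```python
# Type lists taken from the notes above.
SUPPORTED_RNA_TYPES = ("mRNA", "tRNA", "rRNA", "ncRNA", "miRNA", "snRNA")
SUPPORTED_GENE_TYPES = ("gene", "pseudogene")


class XRNASketch:
    """Hypothetical stand-in for an XRNA object; not GAG's real class."""

    def __init__(self, identifier, rna_type):
        if rna_type not in SUPPORTED_RNA_TYPES:
            raise ValueError("unsupported RNA type: %s" % rna_type)
        self.identifier = identifier
        # The RNA type is kept as a plain string.
        self.rna_type = rna_type


class GeneSketch:
    """Hypothetical stand-in for a Gene object; not GAG's real class."""

    def __init__(self, identifier, feature_type="gene"):
        if feature_type not in SUPPORTED_GENE_TYPES:
            raise ValueError("unsupported gene type: %s" % feature_type)
        self.identifier = identifier
        # The gene type is reduced to a bool: True only for pseudogenes.
        self.pseudo = feature_type == "pseudogene"
```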

vivekkrish commented 7 years ago

Hi!

Thanks for creating this very useful tool!

Just wanted to share some work that we've done to expand on the number of supported XRNA types. You can see these changes in our fork of GAG (please see: https://github.com/genomeannotation/GAG/compare/dev...Arabidopsis-Information-Portal:dev).

We also updated the annotations file, which can now be defined as a 4-column TSV, where the last column specifies the NCBI feature type to which the annotation should be attached. For example, some of the tags below are attached to gene features and others to mRNA and CDS features:

AT1G02145   db_xref TAIR:AT1G02145  gene
AT1G02145   gene    ALG12   gene
AT1G02145   gene_syn    EBS4    gene
AT1G02145   gene_syn    EMS-MUTAGENIZED BRI1(BRASSINOSTEROID INSENSITIVE 1) SUPPRESSOR 4    gene
AT1G02145   gene_syn    homolog of asparagine-linked glycosylation 12   gene
AT1G02145.1 db_xref TAIR:AT1G02145  CDS
AT1G02145.1 db_xref TAIR:AT1G02145  mRNA
AT1G02145.1 inference   Similar to RNA sequence, EST:INSD:EG518891.1,INSD:BP577734.1,INSD:BP581601.1, INSD:EG518892.1,INSD:BP575147.1,INSD:EL046732.1, INSD:EG492351.1,INSD:EL075755.1,INSD:EG501755.1, INSD:EG518889.1,INSD:AU226395.1,INSD:BP577546.1, INSD:EG518890.1    mRNA
AT1G02145.1 inference   Similar to RNA sequence, EST:INSD:EG518891.1,INSD:EG464605.1,INSD:BP577734.1, INSD:EG492362.1,INSD:BP581601.1,INSD:EG492385.1, INSD:EG518892.1,INSD:EG445265.1,INSD:EG492396.1, INSD:BP575147.1,INSD:EL046732.1,INSD:EG464599.1, INSD:EL075755.1,INSD:EG492351.1,INSD:EG492329.1, INSD:EG518889.1,INSD:BP577546.1,INSD:AU226395.1, INSD:EG492307.1,INSD:EG518890.1  CDS
AT1G02145.1 inference   similar to RNA sequence, mRNA:INSD:EF183364.1,INSD:DQ492199.1   CDS
AT1G02145.1 note    homolog of asparagine-linked glycosylation 12 (ALG12); FUNCTIONS IN: alpha-1,6-mannosyltransferase activity; INVOLVED IN: ER-associated protein catabolic process, protein amino acid terminal N-glycosylation; LOCATED IN: endomembrane system, intrinsic to endoplasmic reticulum membrane; CONTAINS InterPro DOMAIN/s: Alg9-like mannosyltransferase (InterPro:IPR005599). CDS
AT1G02145.1 product homolog of asparagine-linked glycosylation 12   CDS
AT1G02145.1 product homolog of asparagine-linked glycosylation 12   mRNA
AT1G02145.1 protein_id  AEE27389.1  CDS
AT1G02145.2 db_xref TAIR:AT1G02145  CDS
AT1G02145.2 db_xref TAIR:AT1G02145  mRNA
AT1G02145.2 inference   Similar to RNA sequence, EST:INSD:EG518891.1,INSD:BP577734.1,INSD:BP581601.1, INSD:EG518892.1,INSD:BP575147.1,INSD:EL046732.1, INSD:EG492351.1,INSD:EL075755.1,INSD:EG501755.1, INSD:EG518889.1,INSD:AU226395.1,INSD:BP577546.1, INSD:EG518890.1    CDS
AT1G02145.2 inference   Similar to RNA sequence, EST:INSD:EG518891.1,INSD:EG464605.1,INSD:BP577734.1, INSD:EG492362.1,INSD:BP581601.1,INSD:EG492385.1, INSD:EG518892.1,INSD:EG445265.1,INSD:EG492396.1, INSD:BP575147.1,INSD:EL046732.1,INSD:EG464599.1, INSD:EL075755.1,INSD:EG492351.1,INSD:EG492329.1, INSD:EG518889.1,INSD:BP577546.1,INSD:AU226395.1, INSD:EG492307.1,INSD:EG518890.1  mRNA
AT1G02145.2 inference   similar to RNA sequence, mRNA:INSD:EF183364.1,INSD:DQ492199.1   mRNA
AT1G02145.2 note    homolog of asparagine-linked glycosylation 12 (ALG12); FUNCTIONS IN: alpha-1,6-mannosyltransferase activity; INVOLVED IN: ER-associated protein catabolic process, protein amino acid terminal N-glycosylation; LOCATED IN: intrinsic to endoplasmic reticulum membrane; CONTAINS InterPro DOMAIN/s: Alg9-like mannosyltransferase (InterPro:IPR005599).  CDS
AT1G02145.2 product homolog of asparagine-linked glycosylation 12   CDS
AT1G02145.2 product homolog of asparagine-linked glycosylation 12   mRNA
AT1G02145.2 protein_id  AEE27390.1  CDS

The above change does break some of the tests (which we haven't updated).
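
To make the 4-column layout concrete, here is a minimal sketch of how such a file could be read. It is not the code from the fork: `read_annotations` is a hypothetical helper, and it assumes the columns are separated by real tab characters in the order feature ID, qualifier key, qualifier value, NCBI feature type, as the example rows suggest.

```python
import csv
import io
from collections import defaultdict


def read_annotations(handle):
    """Group (key, value) pairs by (feature_id, feature_type).

    Assumes four tab-separated columns per line; blank lines are skipped.
    """
    annotations = defaultdict(list)
    # QUOTE_NONE keeps quote characters inside values untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        if len(row) != 4:
            raise ValueError("expected 4 tab-separated columns, got %d" % len(row))
        feature_id, key, value, feature_type = row
        annotations[(feature_id, feature_type)].append((key, value))
    return annotations


# Two rows taken from the example above, with tabs as separators.
sample = (
    "AT1G02145\tgene\tALG12\tgene\n"
    "AT1G02145.1\tproduct\thomolog of asparagine-linked glycosylation 12\tCDS\n"
)
result = read_annotations(io.StringIO(sample))
print(result[("AT1G02145", "gene")])
```

Keying on both the feature ID and the feature type lets the same qualifier (for example `product`) be attached separately to the mRNA and CDS of one transcript.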

Please do review these changes and let us know if you wish to integrate them into your repo. We would be happy to issue a pull request (possibly into a separate branch) for your review.

Thanks again!

fbremer commented 7 years ago

looks good, I'll remove this from the current milestone and look into doing a pull request once this release is out.

vivekkrish commented 7 years ago

Thank you!

In the meantime, we are working to update our code so that it satisfies your existing set of tests. Once ready, I will submit a preliminary pull request for your review.