Open smg283 opened 8 years ago
It looks like all of these were already added except for miRNA and pseudogenes. GAG currently treats all the RNAs the same, though.
@smg283 what about snRNA?
low-ish priority, can kick back off 2.0
supported XRNA types (at parsing) include 'mRNA', 'tRNA', 'rRNA', 'ncRNA', 'miRNA', and 'snRNA'.
XRNA object records type in a string at self.rna_type
supported Gene types (at parsing) include 'gene' and 'pseudogene'.
Gene object records type in a bool at self.pseudo
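For anyone following along, here is a minimal sketch of how the type handling described above might look. Only the two lists of supported types and the attribute names self.rna_type and self.pseudo come from this thread; the constructor signatures and the helper feature_from_gff_type are hypothetical and are not GAG's actual code.

```python
# Illustrative only: class layout and helper name are assumptions, not GAG's real API.
SUPPORTED_RNA_TYPES = {"mRNA", "tRNA", "rRNA", "ncRNA", "miRNA", "snRNA"}
SUPPORTED_GENE_TYPES = {"gene", "pseudogene"}


class Gene:
    def __init__(self, identifier, pseudo=False):
        self.identifier = identifier
        self.pseudo = pseudo  # bool: True when parsed from a 'pseudogene' feature


class XRNA:
    def __init__(self, identifier, rna_type="mRNA"):
        self.identifier = identifier
        self.rna_type = rna_type  # string: one of SUPPORTED_RNA_TYPES


def feature_from_gff_type(identifier, gff_type):
    """Return a Gene or XRNA for a supported GFF feature type, else None."""
    if gff_type in SUPPORTED_GENE_TYPES:
        return Gene(identifier, pseudo=(gff_type == "pseudogene"))
    if gff_type in SUPPORTED_RNA_TYPES:
        return XRNA(identifier, rna_type=gff_type)
    return None  # unsupported feature type
```

With this layout, downstream code could branch on rna_type or pseudo rather than treating every RNA identically.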
Hi!
Thanks for creating this very useful tool!
Just wanted to share some work that we've done to expand on the number of supported XRNA types. You can see these changes in our fork of GAG (please see: https://github.com/genomeannotation/GAG/compare/dev...Arabidopsis-Information-Portal:dev).
We also updated the annotations file, which can now be defined as a 4-column tsv, where the last column specifies the NCBI feature type to which the annotation should be attached. For example, below you will see some tags attached to gene features and others attached to mRNA and CDS features:
AT1G02145 db_xref TAIR:AT1G02145 gene
AT1G02145 gene ALG12 gene
AT1G02145 gene_syn EBS4 gene
AT1G02145 gene_syn EMS-MUTAGENIZED BRI1(BRASSINOSTEROID INSENSITIVE 1) SUPPRESSOR 4 gene
AT1G02145 gene_syn homolog of asparagine-linked glycosylation 12 gene
AT1G02145.1 db_xref TAIR:AT1G02145 CDS
AT1G02145.1 db_xref TAIR:AT1G02145 mRNA
AT1G02145.1 inference Similar to RNA sequence, EST:INSD:EG518891.1,INSD:BP577734.1,INSD:BP581601.1, INSD:EG518892.1,INSD:BP575147.1,INSD:EL046732.1, INSD:EG492351.1,INSD:EL075755.1,INSD:EG501755.1, INSD:EG518889.1,INSD:AU226395.1,INSD:BP577546.1, INSD:EG518890.1 mRNA
AT1G02145.1 inference Similar to RNA sequence, EST:INSD:EG518891.1,INSD:EG464605.1,INSD:BP577734.1, INSD:EG492362.1,INSD:BP581601.1,INSD:EG492385.1, INSD:EG518892.1,INSD:EG445265.1,INSD:EG492396.1, INSD:BP575147.1,INSD:EL046732.1,INSD:EG464599.1, INSD:EL075755.1,INSD:EG492351.1,INSD:EG492329.1, INSD:EG518889.1,INSD:BP577546.1,INSD:AU226395.1, INSD:EG492307.1,INSD:EG518890.1 CDS
AT1G02145.1 inference similar to RNA sequence, mRNA:INSD:EF183364.1,INSD:DQ492199.1 CDS
AT1G02145.1 note homolog of asparagine-linked glycosylation 12 (ALG12); FUNCTIONS IN: alpha-1,6-mannosyltransferase activity; INVOLVED IN: ER-associated protein catabolic process, protein amino acid terminal N-glycosylation; LOCATED IN: endomembrane system, intrinsic to endoplasmic reticulum membrane; CONTAINS InterPro DOMAIN/s: Alg9-like mannosyltransferase (InterPro:IPR005599). CDS
AT1G02145.1 product homolog of asparagine-linked glycosylation 12 CDS
AT1G02145.1 product homolog of asparagine-linked glycosylation 12 mRNA
AT1G02145.1 protein_id AEE27389.1 CDS
AT1G02145.2 db_xref TAIR:AT1G02145 CDS
AT1G02145.2 db_xref TAIR:AT1G02145 mRNA
AT1G02145.2 inference Similar to RNA sequence, EST:INSD:EG518891.1,INSD:BP577734.1,INSD:BP581601.1, INSD:EG518892.1,INSD:BP575147.1,INSD:EL046732.1, INSD:EG492351.1,INSD:EL075755.1,INSD:EG501755.1, INSD:EG518889.1,INSD:AU226395.1,INSD:BP577546.1, INSD:EG518890.1 CDS
AT1G02145.2 inference Similar to RNA sequence, EST:INSD:EG518891.1,INSD:EG464605.1,INSD:BP577734.1, INSD:EG492362.1,INSD:BP581601.1,INSD:EG492385.1, INSD:EG518892.1,INSD:EG445265.1,INSD:EG492396.1, INSD:BP575147.1,INSD:EL046732.1,INSD:EG464599.1, INSD:EL075755.1,INSD:EG492351.1,INSD:EG492329.1, INSD:EG518889.1,INSD:BP577546.1,INSD:AU226395.1, INSD:EG492307.1,INSD:EG518890.1 mRNA
AT1G02145.2 inference similar to RNA sequence, mRNA:INSD:EF183364.1,INSD:DQ492199.1 mRNA
AT1G02145.2 note homolog of asparagine-linked glycosylation 12 (ALG12); FUNCTIONS IN: alpha-1,6-mannosyltransferase activity; INVOLVED IN: ER-associated protein catabolic process, protein amino acid terminal N-glycosylation; LOCATED IN: intrinsic to endoplasmic reticulum membrane; CONTAINS InterPro DOMAIN/s: Alg9-like mannosyltransferase (InterPro:IPR005599). CDS
AT1G02145.2 product homolog of asparagine-linked glycosylation 12 CDS
AT1G02145.2 product homolog of asparagine-linked glycosylation 12 mRNA
AT1G02145.2 protein_id AEE27390.1 CDS
The above change does break some of the tests (which we haven't updated).
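To make the 4-column format concrete, here is a rough sketch of how such a file could be read. It is not the code from our fork; the function name read_annotations and the choice to raise on malformed rows are assumptions made for illustration, and the sample rows are taken from the example above.

```python
import io
from collections import defaultdict


def read_annotations(handle):
    """Group (key, value) pairs by (feature_id, feature_type).

    Expects tab-separated rows: feature_id, key, value, feature_type.
    """
    annotations = defaultdict(list)
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        columns = line.split("\t")
        if len(columns) != 4:
            raise ValueError(
                "line %d: expected 4 columns, got %d" % (line_number, len(columns))
            )
        feature_id, key, value, feature_type = columns
        annotations[(feature_id, feature_type)].append((key, value))
    return annotations


sample = io.StringIO(
    "AT1G02145\tgene\tALG12\tgene\n"
    "AT1G02145\tgene_syn\tEBS4\tgene\n"
    "AT1G02145.1\tprotein_id\tAEE27389.1\tCDS\n"
    "AT1G02145.1\tproduct\thomolog of asparagine-linked glycosylation 12\tmRNA\n"
)
parsed = read_annotations(sample)
```

Keying on the (id, feature type) pair is what lets the same transcript ID carry different tags on its mRNA and CDS features.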
Please do review these changes and let us know if you wish to integrate them into your repo. We would be happy to issue a pull request (possibly into a separate branch) for your review.
Thanks again!
looks good, I'll remove this from the current milestone and look into doing a pull request once this release is out.
Thank you!
In the meantime, we are working to update our code so that it satisfies your existing set of tests. Once ready, I will submit a preliminary pull request for your review.
Pseudogenes rRNA tRNA miRNA ncRNA
see tbl format spec here: http://www.ncbi.nlm.nih.gov/genbank/genomesubmit_annotation
see GFF3 format spec here: http://www.sequenceontology.org/resources/gff3.html
code RNA types in gff3 as:
gene -> [RNA type] -> exon (meaning each RNA will have a parent gene and also a child exon)
pseudogene -> pseudogenic_transcript -> pseudogenic_exon
and/or pseudogene -> transcript -> exon
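A rough sketch of the parent/child layout described above, written as a small function that emits GFF3 rows. The function name gff3_hierarchy, the ".1" transcript suffix, the ID naming scheme, and the "." source column are all assumptions for illustration; only the feature-type chains come from this issue, and only the first pseudogene variant (pseudogenic_transcript / pseudogenic_exon) is shown.

```python
def gff3_hierarchy(seqid, gene_id, start, end, strand, rna_type, exons, pseudo=False):
    """Return GFF3 lines for gene -> RNA -> exon(s), or the pseudogene equivalent.

    exons is a list of (start, end) tuples. IDs are made up for the example.
    """
    if pseudo:
        gene_type = "pseudogene"
        rna_type = "pseudogenic_transcript"
        exon_type = "pseudogenic_exon"
    else:
        gene_type = "gene"
        exon_type = "exon"

    rna_id = gene_id + ".1"  # hypothetical transcript ID scheme

    def row(feature_type, feat_start, feat_end, attributes):
        # Nine GFF3 columns; source, score and phase are left as "."
        return "\t".join([
            seqid, ".", feature_type, str(feat_start), str(feat_end),
            ".", strand, ".", attributes,
        ])

    lines = [
        row(gene_type, start, end, "ID=%s" % gene_id),
        row(rna_type, start, end, "ID=%s;Parent=%s" % (rna_id, gene_id)),
    ]
    for index, (exon_start, exon_end) in enumerate(exons, start=1):
        attributes = "ID=%s.exon%d;Parent=%s" % (rna_id, index, rna_id)
        lines.append(row(exon_type, exon_start, exon_end, attributes))
    return lines


trna_lines = gff3_hierarchy("chr1", "g1", 100, 172, "+", "tRNA", [(100, 172)])
pseudo_lines = gff3_hierarchy("chr1", "g2", 500, 900, "-", "mRNA",
                              [(500, 650), (700, 900)], pseudo=True)
```

The key point is that every RNA row carries a Parent pointing at its gene, and every exon row carries a Parent pointing at its RNA.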