SynBioDex / SBOL-visual

The reference implementation of the SBOL Visual standard

Many GenBank feature types do not have SBOL Visual Glyphs #159

Open cjmyers opened 3 years ago

cjmyers commented 3 years ago

https://docs.google.com/spreadsheets/d/1X870i3NhO7xEhqhLXK4eravNd72x-O-xbrpmlT835nY/edit?usp=sharing

jakebeal commented 3 years ago

Note that these are not so much official as a community-sourced list of things that people often find in GenBank. Copying in from the spreadsheet:

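| GenBank | SO Term | SBOL Visual | | |
|---|---|---|---|---|
| allele | SO:0001023 | | | |
| attenuator | SO:0000140 | | | |
| C_region | SO:0001834 | | | |
| CAAT_signal | SO:0000172 | | | |
| CDS | SO:0000316 | cds | | |
| D-loop | SO:0000297 | origin-of-replication | | |
| D_segment | SO:0000458 | | | |
| enhancer | SO:0000165 | | | |
| exon | SO:0000147 | | | can be shown as CDS minus intron |
| gene | SO:0000704 | | | |
| GC_signal | SO:0000173 | | | |
| iDNA | SO:0000723 | | | |
| intron | SO:0000188 | | intron | |
| J_region | SO:0000470 | | | |
| LTR | SO:0000286 | | | |
| mat_peptide | SO:0000419 | | | |
| misc_binding | SO:0000409 | operator | | |
| misc_difference | SO:0000413 | | | |
| misc_feature | SO:0000001 | | | |
| misc_marker | SO:0001645 | | | |
| misc_recomb | SO:0000298 | | | |
| misc_RNA | SO:0000233 | | | |
| misc_signal | SO:0001411 | | | |
| misc_structure | SO:0000002 | | | |
| modified_base | SO:0000305 | | | |
| mRNA | SO:0000234 | | | |
| N_region | SO:0001835 | | | |
| polyA_signal | SO:0000551 | | | |
| polyA_site | SO:0000553 | poly-a | | |
| precursor_RNA | SO:0000185 | | | |
| prim_transcript | SO:0000185 | | | |
| primer | SO:0000112 | | | |
| primer_bind | SO:0005850 | primer-binding-site | | |
| promoter | SO:0000167 | promoter | | |
| protein_bind | SO:0000410 | operator | | |
| RBS | SO:0000139 | rbs | | |
| rep_origin | SO:0000296 | origin-of-replication | | |
| repeat_region | SO:0000657 | | | |
| repeat_unit | SO:0000726 | | | |
| rRNA | SO:0000252 | | | |
| S_region | SO:0001836 | | | |
| satellite | SO:0000005 | | | |
| scRNA | SO:0000013 | | | |
| sig_peptide | SO:0000418 | | | |
| snRNA | SO:0000274 | | | |
| source | SO:0000149 | | | |
| stem_loop | SO:0000313 | | | |
| STS | SO:0000331 | | | |
| TATA_signal | SO:0000174 | | | |
| terminator | SO:0000141 | terminator | | |
| transit_peptide | SO:0000725 | | | |
| transposon | SO:0001054 | | | |
| tRNA | SO:0000253 | | | |
| V_region | SO:0001833 | | | |
| variation | SO:0001060 | | | |
| -10_signal | SO:0000175 | | | |
| -35_signal | SO:0000176 | | | |
| 3'clip | SO:0000557 | | | |
| 3'UTR | SO:0000205 | | | |
| 5'clip | SO:0000555 | | | |
| 5'UTR | SO:0000204 | | | |
| regulatory | SO:0005836 | | | |
| snoRNA | SO:0000275 | | | |
| none of above | SO:0000110 | unspecified | | |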
         
         
| GenBank | SO Term | Synonym | Synonym SO Term |
|---|---|---|---|
| assembly_gap | NA | gap | SO:0000730 |
| centromere | SO:0000577 | | |
| gap | SO:0000730 | | |
| J_segment | | J_gene_segment | SO:0000470 |
| mobile_element | NA | mobile_genetic_element | SO:0001037 |
| ncRNA | SO:0000655 | | |
| old_sequence | NA | | |
| operon | SO:0000178 | | |
| oriT | SO:0000724 | | |
| propeptide | SO:0001062 | | |
| telomere | SO:0000624 | | |
| tmRNA | SO:0000584 | | |
| unsure | | sequence_uncertainty | SO:0001086 |
| V_segment | | V_gene_segment | SO:0000466 |

(edited to add missing information)
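
For anyone consuming this list in a tool, here is a minimal sketch of how the table above could drive a glyph lookup. It assumes a plain Python dictionary holding a subset of the rows. The names `GENBANK_TO_SO_GLYPH`, `FALLBACK`, and `glyph_for` are hypothetical and are not part of any SBOL library. Feature keys with no glyph in the table keep their SO term and fall back to the `unspecified` glyph from the "none of above" row.

```python
# Subset of the GenBank -> (SO term, SBOL Visual glyph) rows from the table above.
# A glyph of None means the table lists no glyph for that feature key.
# All names here are illustrative assumptions, not an existing API.
GENBANK_TO_SO_GLYPH = {
    "CDS": ("SO:0000316", "cds"),
    "promoter": ("SO:0000167", "promoter"),
    "RBS": ("SO:0000139", "rbs"),
    "terminator": ("SO:0000141", "terminator"),
    "rep_origin": ("SO:0000296", "origin-of-replication"),
    "D-loop": ("SO:0000297", "origin-of-replication"),
    "misc_binding": ("SO:0000409", "operator"),
    "protein_bind": ("SO:0000410", "operator"),
    "polyA_site": ("SO:0000553", "poly-a"),
    "primer_bind": ("SO:0005850", "primer-binding-site"),
    "rRNA": ("SO:0000252", None),
    "enhancer": ("SO:0000165", None),
}

# The "none of above" row: used when a key is unknown or has no glyph.
FALLBACK = ("SO:0000110", "unspecified")


def glyph_for(feature_key):
    """Return (so_term, glyph) for a GenBank feature key.

    Unknown keys get the full fallback row; known keys without a glyph
    keep their own SO term but use the fallback glyph.
    """
    so_term, glyph = GENBANK_TO_SO_GLYPH.get(feature_key, FALLBACK)
    return so_term, glyph or FALLBACK[1]
```

With this sketch, `glyph_for("rRNA")` keeps the rRNA SO term but uses the `unspecified` glyph, so filling in a glyph later only means editing one row.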

jakebeal commented 3 years ago

Looking at these, I'm not sure how many we actually need to fill in until somebody wants them. How often does a synbio system actually work with rRNA, for example? A better solution might be an alternative to the "no glyph assigned" bracket that looks nicer on diagrams.

shyambhakta commented 3 years ago

I think several of these are already represented at some level. For those, the question is whether to make a specialty glyph or to formalize repurposing an existing glyph for the SO terms.

- attenuator: terminator (conditionally repressed based on translation rate of upstream leader peptide; not ever used in syn bio)
- oriT: origin of transfer glyph
- polyA_signal: no different than polyA site?
- transit peptide / signal peptide: protein location? This superimposition of protein stem glyphs with CDS/domain glyphs raises an old, unresolved question from issue 78.
- J_segment and V_segment: these are exons specific to the V(D)J recombination locus, and we have exon glyphs.

These are special kinds of protein-binding sites. The latter 4 are inside promoters, and I think they can just be labeled inside the protein-binding site glyph, especially the last three.

These RNA terms can use an ncRNA gene or squiggly RNA backbone, depending on whether the DNA gene or RNA itself is being referred to. We don't have different CDS glyphs for different types of proteins. No need to have different ncRNA glyphs for different ncRNAs, like rRNAs or tRNAs, I think.

And these terms bring up unresolved issue #113:

cjmyers commented 3 years ago

Good point. We likely should have more alternative SO terms for some of our glyphs based upon this.
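
As a rough illustration of what "alternative SO terms" per glyph could look like, the sketch below inverts a few rows of the table earlier in this thread so that each glyph lists every SO term mapped to it. The helper name `so_terms_by_glyph` and the `ROWS` list are hypothetical, and only rows that already have a glyph in the table are included.

```python
from collections import defaultdict

# (GenBank key, SO term, SBOL Visual glyph) rows copied from the table in this thread.
# Only a few rows with an assigned glyph are shown; the names are illustrative.
ROWS = [
    ("D-loop", "SO:0000297", "origin-of-replication"),
    ("rep_origin", "SO:0000296", "origin-of-replication"),
    ("misc_binding", "SO:0000409", "operator"),
    ("protein_bind", "SO:0000410", "operator"),
    ("promoter", "SO:0000167", "promoter"),
]


def so_terms_by_glyph(rows):
    """Group SO terms by glyph, preserving row order and skipping duplicates."""
    grouped = defaultdict(list)
    for _genbank_key, so_term, glyph in rows:
        if so_term not in grouped[glyph]:
            grouped[glyph].append(so_term)
    return dict(grouped)


GLYPH_SO_TERMS = so_terms_by_glyph(ROWS)
```

Here `GLYPH_SO_TERMS["operator"]` would hold both SO:0000409 and SO:0000410, which is the kind of alternative-term list a glyph definition could carry.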