SynBioDex / SBOL-visual

The reference implementation of the SBOL Visual standard

Make macromolecule different from yeast cells #60

Closed jakebeal closed 5 years ago

jakebeal commented 5 years ago

Our current recommended macromolecule glyph looks too much like the glyph people often use for yeast cells (it has also been criticized as not being easy enough to draw).

A patch solution is to simply use the SBGN variant instead (rounded box), but there were good reasons we didn't make that the recommended glyph in the first place.

Can we make a variant that is different, to avoid this visual collision?

Potential suggestions that have been made previously include:

[images: three candidate glyphs]

hsauro commented 5 years ago

Is the arbitrary monomer symbol the rightmost drawn symbol?

Herbert

On Thu, Mar 28, 2019 at 4:02 AM Jacob Beal notifications@github.com wrote:

Potential suggestions that have been made previously include:

  • a kidney/bean/crescent shape ("peanut") has been rejected in SEP V008
  • a biconvave/dumb-bell shape (this could be confused with a red blood cell, but I think this is unlikely?) ("kidney")
  • a variant of the dumb-bell with a bulge in the middle ("tie fighter")
  • three-way symmetric blobby object ("fidget spinner")
  • "Any arbitrary monomer symbol linked by a backbone line"
  • "a loosely wound string of beads"
  • "an arbitrary self-intersecting curve"

[images: https://user-images.githubusercontent.com/10675899/55152485-81334380-511e-11e9-9b43-50cfe6c53244.png https://user-images.githubusercontent.com/10675899/55152570-aaec6a80-511e-11e9-9eb5-f85c3b0e88fc.png https://user-images.githubusercontent.com/10675899/55152679-e129ea00-511e-11e9-9791-747fb76d0529.png]

-- Herbert Sauro, Associate Professor University of Washington, Bioengineering 206-685-2119, www.sys-bio.org hsauro@uw.edu Books: http://books.analogmachine.org/

oerbilgin commented 5 years ago

Could you please provide context as to why the kidney bean shape was rejected? I read through SEP V008 and couldn't find a discussion about that shape. It's definitely my personal favorite to describe a protein/polypeptide chain. Either that or pac-man, though that's most often used to specifically describe the enzyme subset of proteins.

Also, could I please recommend changing the semantics of this shape to represent a polypeptide chain, rather than macromolecule? Macromolecule is a very broad term that is traditionally used to represent all sorts of large molecules including DNA, RNA, protein, lipids, and polysaccharides (e.g. glycogen). It seems like it's being used here to describe only polypeptides.

bbartley commented 5 years ago

Hi Onur, I think your observations make a very important point. Perhaps we don't need to spin our wheels so much on the symbol for macromolecule. In practice, no one ever draws an abstract macromolecule in their diagrams. They generally have a very explicit type of macromolecule in mind, usually DNA, RNA, or protein. And then, as you point out, we are usually interested in subtypes of these macromolecules: is this protein an enzyme, transcription factor, receptor, or reporter? Perhaps it would be better to discuss symbols for these heavily used subtypes instead. It might not matter which symbol we use for macromolecule, since practically no one will be drawing diagrams at such a high level of abstraction.

Best,
Bryan

jakebeal commented 5 years ago

@oerbilgin I see my "kidney" and "peanut" lines got mangled together, and have corrected them. The "peanut" shape was explicitly rejected due to its symmetry and potential for confusion with a complex of generic species. The "kidney" shape never made it into consideration for no good reason: it simply lacked an advocate when people were kicking ideas around on a whiteboard early on.

More significantly, I think that both you and @bbartley bring up an important issue about terminology. Currently the definitions are linked to BioPAX terms because that's what the SBOL Standard uses for its ComponentDefinitions. The name for the glyph, however, is taken from SBGN. We thus have a mismatch where the "Macromolecule" glyph is grounded in "BioPAX#Protein".

The distinction that you're looking for cannot be supported by BioPAX, which has a rather impoverished representation.

Let us proceed for now on the assumption that what we're talking about is a protein, then, and not any other form of macromolecule. Separately, I'll open up issues to consider other categorizations rather than BioPAX: SBO would make a lot of sense, since we're using it for interactions anyway.

jakebeal commented 5 years ago

As this has been kicking around in my head today, here are a few more options to consider (from left to right):

[image: three additional candidate glyphs]

bbartley commented 5 years ago

Of these 3, I like the middle one the best. It is aesthetically pleasing and easy to draw. More importantly, I think it symbolically captures the essence of a protein having both primary structure and higher-order structure. It's sort of like having a glyph of the unfolded protein (squiggle) superimposed on the folded protein (circle). Beautiful.

jakebeal commented 5 years ago

I'd like to suggest that we may have narrowed the candidates to three:

Can we hear some more discussion on pluses and minuses for these, or other / new ones people would like to propose as good candidates?

jamesscottbrown commented 5 years ago

Of this shortlist, the 'squiggle-circle' (at least as drawn in the example above) is my least favourite, because I think it looks too much like an elaboration of the empty-set/degradation symbol.

I like the trefoil, but I think it's harder to draw than any of the existing glyphs: it's a continuous curve that overlaps itself several times, so it takes a moment to work out how to draw it for the first time.

cjmyers commented 5 years ago

I tend to concur with James on this one. However, I believe these are all quite different from anything you will find in existing publications.

bbartley commented 5 years ago

Trefoil is too similar to an antibody.

jakebeal commented 5 years ago

@cjmyers Is "different" good or bad in your opinion in this case?

mikebissell commented 5 years ago

[image: macromolecules]

cjmyers commented 5 years ago

Both, in a way. If it is different, then fewer existing diagrams will be using it to begin with, which is bad, but different also means it is less likely to have another meaning.

Personally, what I'm hearing is that most alternatives are, or look like, things used in different ways in different contexts. While I agree it is unfortunate that the current one looks like yeast, I also cannot imagine that it would be confused for yeast in context. Finding a solution that looks good and has never been used for another thing is likely impossible. That being said, I'm keeping an open mind in case a good idea comes forward. I don't think we have one yet.

ReneeLizena commented 5 years ago

Hi all,

I thought I would suggest some ideas. Before I do: I like the "dumbbell" shape, because I think it describes a sum of parts very well, which I think is a key identifier of macromolecules. Also, in the second image the trefoil is my favourite.

In my image, the first symbol represents "the sum of parts", the cross being a symbol for addition and the circle being a "part" of sorts, although in the stylised version, this may be difficult to draw?

Symbol 2 shows a normal small molecule with jigsaw-like detail, which represent the ability for smaller molecules to join together to make a macromolecule, although I am aware that perhaps the meaning of this symbol is not as obvious.

Symbol 3 shows a link between two small molecules - I made them into circles to make the symbol more compact.

Symbol 4 depicts a chain, with each part representing a "small molecule".

Finally, although the current symbol looks like yeast, perhaps adding a few more circles changes that fact (Symbol 5)?

IMG_20190401_075138

ReneeLizena commented 5 years ago

Another suggestion! (Last one). Two circles (molecules) joined together in a chain of sorts.

IMG_20190401_125827

jakebeal commented 5 years ago

Thank you for all of these suggestions! Comments from others on their opinions?

cjmyers commented 5 years ago

Dumbbell shape I’ve seen used for this before, so I also like that one.

JS3xton commented 5 years ago

Another idea: a large circle overlaid with off-center medium and small circles.


Easy to draw. Not sure it gets far enough away from budding yeast, though.

jakebeal commented 5 years ago

@JS3xton Visually, I like it, but it does have the problem of being equivalent to a 3-part compound molecule.

jakebeal commented 5 years ago

Reviewing the discussion and proposals, I find that we've actually got a surprising degree of consensus:

I propose that we focus on the closed curves, as the other two can be confused with composites or interactions. I've also just drawn all of the closed-curve figures by hand, and of them I found four that were significantly easier to draw than all of the others:

What do people think of these options for a new "protein" glyph? Personally, I think all of them are plausible, but I like the simplicity and asymmetry of "kidney" best.

shyambhakta commented 5 years ago

For simply "protein", I vote the kidney. The others seem too much like common depictions of domain multimers / "macromolecular complexes".

rsc3 commented 5 years ago

I vote for "Circle double-bead"

rsc3 commented 5 years ago

But I really prefer Mike Bissell's glyphs.

jakebeal commented 5 years ago

@rsc3 Which is the one you are calling "circle double-bead" and why do you prefer it?

graik commented 5 years ago

There was some additional discussion on the sboldev and sbolvisual mailing lists. @udp, Robert S. Cox, and I argued against coming up with a contrived arbitrary shape. As James put it:

Proteins aren't some highly specific feature where we need a specific glyph to make them unambiguous. They are one of the fundamental building blocks of biology, and the currency of genetic circuits. Existing publications generally use either no glyph (just the name of the protein), a circle, or an ellipse precisely for this reason - we have to draw them A LOT.

Robert and I again made the case for the stadium ("pill") shape, which is extensively used for the representation of multi-domain protein architectures. Unfortunately, this shape is "blocked" by its use for "small molecule" in SBGN, which (I argued) may make sense in the context of metabolic diagrams but not in the context of mixed biological circuits. The advantage of this shape is that it can be easily enriched with sub-features (sub-domains, binding sites, etc.) and can be scaled to reflect the size and relative arrangement of protein regions, which, IMO, is as important for proteins as it is for genes.

Jake surveyed 32 recent ACS Synbio articles and reported this breakdown of symbols used in practice (in this particular community):

  • 7 ellipses (compatible with SBOL Visual already)
  • 4 domain diagrams (similar to protein diagram language)
  • 3 asymmetric polygons w. round edges
  • 2 rectangles
  • 2 wiggly-cloud-blobs
  • 2 pac-man
  • 1 rounded-rectangle, 1 bead-chain, 1 stadium

Ellipse is a clear front-runner, followed by variations of rounded "rectangularish" shapes. Unfortunately, in SBGN, Ellipse currently is defined as the generic symbol for "unspecified entity". Furthermore, an isolated ellipse is not amenable to represent multi-domain proteins or sub-features. I think we have roughly three options:

(1) take the SBGN rounded box and just stretch it a bit and make it more rounded to distinguish it from CDS or other DNA features (as close to the pill/stadium shape as we can get without getting there)
(2) appropriate the stadium/pill shape for SBOL and accept that some basic symbols are context-dependent (bioengineering networks vs. metabolic networks)
(3) take the ellipse but modify it slightly so that it is no longer identical to the SBGN "unspecified entity"

My suggestion for option 3 would be: protein_suggestion_RG_v2 (Top: generic protein, bottom: protein with multi-domain detail shown)

jakebeal commented 5 years ago

[copying here from sbol-visual group thread]

@graik I really am concerned about changing the meaning of stadium, as it would then not be backward compatible. I like the way you're going with alternatives, though specifically using a line is problematic, because it invites confusion with interaction edges.

What if we did something like fattening the lines into half-stadiums / half-ellipses? [image]

graik commented 5 years ago

I am afraid the UFO shape again conjures other associations (a eukaryotic cell when viewed from the side? a balloon?). I just browsed through the SBOLv 2.1 spec and cannot see how "ellipse(s) on line" could be easily confused with an interaction. Interactions always have arrows or endings of some sort, don't they? And the DNA design is arranged on a line, too, without causing confusion.

Perhaps we can recommend that the protein line should be preferably double the thickness of any DNA line in the same diagram. This would approach your shape but would still be very easy to draw in powerpoint etc. Like this: protein_suggestion_RG_v2

I think I remember though that you didn't want to specify line thickness. Perhaps we can make an exception here?

One possible consequence in the wild could be that people look at this and then (a) simply leave out the ellipse (e.g. when talking about completely unstructured proteins) or (b) leave out the line when talking about a completely structured protein. I am not sure that would be such a terrible thing, though.

jakebeal commented 5 years ago

More discussion on the sbol-visual Google group has led to the following proposal (as I understand it):

  1. We change "small molecule" from "stadium" to "circle" (noting that SBGN's reference cards show only the circle anyway).
  2. We make protein be "stadium"
  3. We deprecate (and de-recommend) the "yeast-like" glyph for macromolecule, but do not delete it.

This would also necessitate adding commentary to the small-molecule, protein, and generic glyphs to the effect that they cannot be stretched without risking confusion with one another. Given this, I have further proposed to use a small polygon (e.g., hexagon) for small molecule instead of a circle.

shyambhakta commented 5 years ago

I like the idea of a hexagon for small molecules. Someone is bound to say that hexagons are suggestive of sugars, as the most common are pyranoses (containing six-membered rings), and are thus often symbolized as hexagons to specifically allude to the pyranose structure. I could still get behind the hexagon, though, especially because it's elongatable which is useful. The polygon is also a stronger distinction from the protein stadium than a circle.

graik commented 5 years ago

I agree. Hexagon = small molecule sounds pretty nice. But if we stick with circle that should be fine, too. In particular, we could recommend to have the name of the compound as a label outside of the circle so that the circle shape itself is kept small.

Greetings Raik

-- Raik Grünberg http://www.raiks.de/contact.html


jakebeal commented 5 years ago

Two notes regarding hexagon:

  1. Since ellipse will still be an alternative glyph for "any molecule", small molecules could still be legitimately represented with a circle.
  2. We need not talk about the location of the label, since in SBOL we never say anything more than that it should be "within, adjacent to, or otherwise clearly visually connected". The user already has the freedom to do whatever they think is clearest, and there is already a recommendation to keep glyphs of consistent size.

cjmyers commented 5 years ago

Suggest starting a new issue for hexagon for small molecule.

jakebeal commented 5 years ago

I think the two are connected, as I wouldn't be comfortable with stadium for protein unless we also change small molecule.

rsc3 commented 5 years ago

There are many examples in the literature of both circle and hexagon being used for small molecules and inducers.

I think circle as small molecule works well, and would be the clearest in the AraC/L-(+)-arabinose example that Jake mentioned. In the rare case that we need to represent a small molecule interacting with a complicated regulatory map, e.g. tetracycline binding to several different types of transcription factors or a central metabolite feeding into multiple pathways, we could use the SBGN system for drawing the network.

-- Sid The Third

jakebeal commented 5 years ago

@rsc3 Does this mean you support my proposal to allow circle (via "any molecule" ellipse) or hexagon (meaning only small molecule)?

JS3xton commented 5 years ago

I will also throw my support behind RECOMMENDING circles for small molecules.

At the risk of stirring things up, I would also endorse imposing size recommendations, i.e. SMALL circles for small molecules. With size recommendations, I could also envision accepting SMALL anything as an alternative for small molecule (e.g. small polygons).

To be even more offensive, I sometimes use LARGE circles for proteins (when I don't use the yeast glyph). I could probably get behind stadium, though.

E.g.:

jakebeal commented 5 years ago

@JS3xton How do you feel about small polygons like hexagons and pentagons?

JS3xton commented 5 years ago

@jakebeal I actually originally drew small molecules as small polygons that attempted to represent their chemical structure:

We ultimately decided to simplify them all to Small Circles to make diagrams that were otherwise very busy a little more readable:

As such, I think all of these options should be available, and it should be up to the user to pick what they need depending on what they're trying to communicate.

jakebeal commented 5 years ago

Interesting idea... I may try to write this up.

rsc3 commented 5 years ago

Yes, I support both proposals.

cjmyers commented 5 years ago

By "interesting idea", are you referring to the three small circles in a triangle formation? I think this is fine, but I would still want a single circle to be allowed too, and not as a special case of ellipse. I would actually prefer that a circle be explicitly disallowed for non-small molecules. Namely, I would like an ellipse to be defined as NOT a circle for the purposes of SBOL Visual.

jakebeal commented 5 years ago

I have written up an attempt at finding this consensus from this thread and the discussion on the mailing list as SEP V017:

Please migrate discussion to the new issue.

If we are close enough to consensus, I may delay the SBOL Visual revision slightly to allow adoption of the new solution.