SAMTOMINDUSTRYS / monsterlab

Monster Lab: teaching the science of sequencing with Lego
https://monster-lab.org/

Document colour encoding (bases for red, green, yellow and blue) #22

Open peterjc opened 5 years ago

peterjc commented 5 years ago

The rival project Blocksford Brickopore LEGO sequencer uses red (“T”), green (“A”), yellow (“G”) and blue (“C”) LEGO bricks - as documented here:

http://www.earlham.ac.uk/articles/earlham-institute-lego-sequencer

My immediate concern was whether they had invented a new encoding (I'm having FASTQ flashbacks here).

Sadly it seems yes, since according to https://samnicholls.net/2017/03/15/lego-sequencer/ you used blue for "T".

Further digging, https://imgur.com/gallery/4O8r4 says "It would later turn out that we could distinguish yellow using the Clear channel, as it is much more reflective these values dwarf the rest of the values. Red, blue and green can easily be distinguished by checking the which of the RGB channels is currently giving the highest reading."

I could find the relevant part of your code here, mapping an RGB and Clear value to DNA base letters:

https://github.com/SAMTOMINDUSTRYS/monsterlab/blob/master/software/samseq.ino#L132

  if(clear > 10000){
    Serial.print("C");
  }
  else {
    if(green > blue && green > red){
        Serial.print("A");
    }
    else if(red > green && red > blue){
        Serial.print("G");
    }
    else if(blue > green){
        Serial.print("T");
    }
    else{
        Serial.print("N");
    }
  }

I think that means yellow "C", green "A", red "G", blue "T", other "N".
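To make that reading easy to check, here is a minimal sketch of the same decision chain as a standalone function that returns the base letter instead of printing it. The function name `classifyBase`, the constant `kClearThreshold` and the use of `uint16_t` for the channel readings are assumptions for illustration only; the quoted excerpt does not show how the variables are declared.

```cpp
#include <cstdint>

// Threshold taken from the quoted excerpt; the constant name is hypothetical.
constexpr uint16_t kClearThreshold = 10000;

// Hypothetical helper mirroring the quoted if/else chain.
// Assumption: raw channel readings are unsigned 16-bit values.
char classifyBase(uint16_t red, uint16_t green, uint16_t blue, uint16_t clear) {
    if (clear > kClearThreshold) {
        return 'C';  // yellow brick: a high Clear reading takes priority
    }
    if (green > blue && green > red) {
        return 'A';  // green brick
    }
    if (red > green && red > blue) {
        return 'G';  // red brick
    }
    if (blue > green) {
        // Red cannot exceed blue here, otherwise the red branch
        // above would already have matched.
        return 'T';  // blue brick
    }
    return 'N';  // ties or otherwise ambiguous readings
}
```

With this sketch, `classifyBase(100, 200, 900, 5000)` gives `'T'`, and a three-way tie such as `classifyBase(500, 500, 500, 5000)` gives `'N'`. A red/blue tie that beats green still falls through to `'T'`, which follows from the order of the branches in the quoted code.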

(On a more serious note, I presume the printed sheets etc deliberately do not give away the colour key so as not to spoil the surprise)

SamStudio8 commented 5 years ago

I think we took our colour codings from the ones used by Tablet, but we should make a note of them in the README somewhere. You're right, though: the monster sheets deliberately do not provide a key, to prevent kids gaming the system! That said, at future workshops we'll probably get participants to look at the Monster Lab Zoo and see which bases encode which phenotype.

Of course, it's likely that Brickopore may have chosen an alternative encoding strategy to avoid infringing our intellectual property when building their own sequencer... ;)

peterjc commented 5 years ago

Yes, having checked a couple of screenshots, the colour scheme does seem to match Tablet: https://ics.hutton.ac.uk/tablet/tablet-screenshots/

I expect @imilne would be able to tell us where that scheme originally came from (as I assume it followed an even older convention).

gringer commented 5 years ago

The convention for electrophoresis colours, as used in trace plots, is red (“T”), green (“A”), yellow (“G”) and blue (“C”), e.g. see here. This is the closest to a DNA colour convention that I have found, and what I prefer to use in all my code.
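To show how much the two encodings discussed in this thread disagree, here is a small sketch that decodes the same row of bricks under each scheme. The single-letter brick codes (`'R'`, `'G'`, `'Y'`, `'B'`) and all function names are hypothetical and exist only for this example; the mappings themselves are the ones quoted earlier in the thread (Monster Lab: yellow C, green A, red G, blue T; trace plots: red T, green A, yellow G, blue C).

```cpp
#include <string>

// Hypothetical brick colour codes for this sketch only:
// 'R' red, 'G' green, 'Y' yellow, 'B' blue.

// Monster Lab mapping, as read from the samseq.ino excerpt in this thread.
char monsterLabBase(char colour) {
    switch (colour) {
        case 'Y': return 'C';
        case 'G': return 'A';
        case 'R': return 'G';
        case 'B': return 'T';
        default:  return 'N';  // unknown colour code
    }
}

// Electrophoresis / trace plot mapping, as described in this thread.
char tracePlotBase(char colour) {
    switch (colour) {
        case 'R': return 'T';
        case 'G': return 'A';
        case 'Y': return 'G';
        case 'B': return 'C';
        default:  return 'N';  // unknown colour code
    }
}

// Decode a row of bricks using whichever scheme is passed in.
std::string decodeBricks(const std::string& bricks, char (*scheme)(char)) {
    std::string bases;
    bases.reserve(bricks.size());
    for (char colour : bricks) {
        bases.push_back(scheme(colour));
    }
    return bases;
}
```

Decoding the bricks `"RGYB"` gives `"GACT"` with `monsterLabBase` and `"TAGC"` with `tracePlotBase`, so green for "A" is the only assignment the two schemes share.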

imilne commented 5 years ago

Tablet would have taken its colours from Flapjack, which was a follow-on from TOPALi (http://www.topali.org/topali-v1/), the first program I ever wrote doing that kind of visualization. But you're talking ~2002 - I've no recollection now if I copied those colours from something else around at the time (e.g. Bioedit or Jalview) or just picked ones I thought looked pretty :)

peterjc commented 5 years ago

Chromas (which is the oldest Sanger capillary sequencing tool I've used) does red "T", green "A", black "G" and blue "C" http://technelysium.com.au/wp/chromas/ - which, with the apparently common yellow/black substitution, matches the Blocksford Brickopore LEGO sequencer.

I wonder if we can find some early references for precedents establishing these kinds of convention? Might even make a nice short review paper somewhere, if no one has done that already?

peterjc commented 5 years ago

The original JalView documentation describes various protein colour schemes: http://www.jalview.org/version118/documentation.html#colour

P.S. The citation for the Taylor protein colour scheme is indeed an entertaining read! It does not mention nucleic acids though.

W R Taylor. Residual colours: a proposal for aminochromography. Protein Engineering, Vol 10, 743-746 (1997) https://dx.doi.org/10.1093/protein/10.7.743

peterjc commented 5 years ago

BioEdit screenshots here: http://www.mbio.ncsu.edu/BioEdit/screenshots.html

Again, red "T" or "U", green "A", black "G" and blue "C"

peterjc commented 5 years ago

Iain Macaulay on Twitter https://twitter.com/whatchamacaulay/status/1066265073142964224 said:

It's based on the emission spectra of the fluorescent dyes used in ABI gel sequencers - yellow was changed to black as it's hard to see a yellow trace on a white electropherogram (but not on a black gel background). [image]

i.e. the colours for raw data on an ABI Prism 310 electropherogram (black for G) and an ABI Prism 377 gel image (yellow for G), both using red for T, green for A and blue for C.

I wonder when that documentation was published? That might be one of the earliest citable sources for this convention (as very sensibly used by the Blocksford Brickopore LEGO sequencer).

gringer commented 5 years ago

After 1986, anyway. ABI used A:Fluorescein / FITC (520nm; green emission), T:NBD (550nm; green/yellow emission), G:Tetramethylrhodamine (580nm; yellow emission) and C:Texas Red (610nm; orange emission) in what looks like it could be their first fluorescence sequencing paper:

https://doi.org/10.1038/321674a0

Deep blue was avoided at the time because of the potential for scattering and overlap with fluorescence background (according to the paper). It seems like that unease has since been overcome.

peterjc commented 5 years ago

Using red, green, greeny-yellow, and yellow Lego blocks just wouldn't have the same visual impact (and would be more expensive to source too). Blue is better :)

peterjc commented 5 years ago

Update from Jim Proctor (@foreveremain on GitHub) on Twitter https://twitter.com/foreveremain/status/1067076249745539072 saying:

AM Waterhouse created Jalview's nucleotide colours in Jan 2005, based on TOPALi's topali.org - so best ask Frank Wright and Iain Milne ;) FWIW see other schemes in Figure 2 of nature.com/articles/nmeth… (paywalled but figure is visible)

Here is Figure 2(c) from https://doi.org/10.1038/nmeth.1434 Proctor et al. (2010)

[Image: screenshot of Figure 2(c)]

Amusingly, none of the four schemes shown that use four colours for the four bases matches the ABI electrophoresis/gel colours.