Lattice-Automation / seqviz

a JavaScript DNA, RNA, and protein sequence viewer
https://tools.latticeautomation.com/seqviz
MIT License

Cut sites display regions outside recognition sequence as if they were part of recognition sequence #270

Status: Open. Opened by jpsorensen-asimov 2 months ago.

jpsorensen-asimov commented 2 months ago

Describe the bug

Cut sites show three important pieces of information in the SeqViz sequence view: the two cut locations and the sequence that is recognized by the enzyme.

When a cut site falls outside the recognition sequence, the enzyme definitions in SeqViz "pad out" the recognition sequence with Ns. For example, BbsI is defined as:

  bbsi: {
    fcut: 8,
    name: "BbsI",
    rcut: 12,
    rseq: "GAAGACNNNNNN",
  },

Note that only "GAAGAC" is actually recognized by the enzyme. However, because rseq is padded with Ns, SeqViz draws the "recognition box" around the entire 12-base range, suggesting that the bases after the final "C" are part of the recognition site:
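To make the mismatch concrete, here is a minimal sketch of how the box and the cut positions relate to the padded BbsI definition. It assumes, based on the screenshot rather than on the SeqViz source, that the box spans the full rseq length from the start of the match and that fcut/rcut are offsets from that start. The helpers highlightedRange and cutPositions, and the flanking "TT" bases in the example sequence, are hypothetical.

```javascript
// BbsI as defined in SeqViz (copied from the issue).
const bbsi = { fcut: 8, name: "BbsI", rcut: 12, rseq: "GAAGACNNNNNN" };

// Hypothetical helper: the range the box appears to cover today,
// assuming it spans the whole rseq, N padding included.
function highlightedRange(enzyme, matchStart) {
  return { start: matchStart, end: matchStart + enzyme.rseq.length };
}

// Hypothetical helper: absolute cut positions, assuming fcut/rcut are
// offsets from the start of the match on the top strand.
function cutPositions(enzyme, matchStart) {
  return { top: matchStart + enzyme.fcut, bottom: matchStart + enzyme.rcut };
}

// Illustrative sequence containing the site from the screenshot.
const seq = "TTGAAGACACAGGGTT";
const matchStart = seq.indexOf("GAAGAC");
const box = highlightedRange(bbsi, matchStart);

console.log(seq.slice(box.start, box.end)); // GAAGACACAGGG
console.log(cutPositions(bbsi, matchStart)); // { top: 10, bottom: 14 }
```

Under these assumptions, the box and the padded rseq cover exactly the same 12 bases, which is why the six Ns show up as part of the recognition site.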

[Screenshot 2024-09-19 at 11:26:02 AM: current rendering of the BbsI site]

We tried custom definitions of these enzymes that omit the Ns, but the Ns appear to be included intentionally: when the cut sites fall off the edge of the sequence, the component crashes.
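One plausible explanation, which is an assumption and has not been traced in the SeqViz source, is that the renderer expects both cut offsets to fall inside rseq. Trimming BbsI to "GAAGAC" breaks that invariant, as this sketch checks. The function cutsWithinRseq is hypothetical.

```javascript
// Hypothetical check: do both cut offsets land within the recognition sequence?
// Assumes fcut/rcut are offsets from the start of rseq (0 = before the first base).
function cutsWithinRseq(enzyme) {
  const len = enzyme.rseq.length;
  return (
    enzyme.fcut >= 0 && enzyme.fcut <= len &&
    enzyme.rcut >= 0 && enzyme.rcut <= len
  );
}

// Padded definition as shipped in SeqViz.
const padded = { fcut: 8, name: "BbsI", rcut: 12, rseq: "GAAGACNNNNNN" };
// Custom definition with the trailing Ns removed, as tried in this issue.
const trimmed = { ...padded, rseq: "GAAGAC" };

console.log(cutsWithinRseq(padded)); // true
console.log(cutsWithinRseq(trimmed)); // false: cuts at 8 and 12 exceed length 6
```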

Expected behavior

I'd expect the "recognition rectangle" to be drawn only around the part of the sequence that the enzyme recognizes: in this case, around GAAGAC, not GAAGACACAGGG.

[Screenshot 2024-09-19 at 12:50:57 PM: expected rendering]

I'd expect this to apply to any enzyme that is currently defined with leading or trailing Ns. There are also some enzymes with non-N wildcards, or with Ns (or other degenerate nucleotides) in the middle of the recognition sequence.

For example:

  banii: {
    fcut: 5,
    name: "BanII",
    rcut: 1,
    rseq: "GRGCYC",
  },
  bgli: {
    fcut: 7,
    name: "BglI",
    rcut: 4,
    rseq: "GCCNNNNNGGC",
  },

I propose not handling these cases for now: any degenerate nucleotides other than N, and any interior Ns, would remain part of the displayed recognition sequence.
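A minimal sketch of the proposed rule, applied to the three definitions above: strip only leading and trailing Ns to get the range to highlight, and leave interior Ns and other degenerate bases (R, Y, ...) in place. The function recognitionRange is hypothetical, not part of SeqViz. It returns offsets relative to the start of rseq, so the padded rseq could still be used for the cut positions.

```javascript
// Enzyme definitions copied from the issue.
const enzymes = {
  bbsi: { fcut: 8, name: "BbsI", rcut: 12, rseq: "GAAGACNNNNNN" },
  banii: { fcut: 5, name: "BanII", rcut: 1, rseq: "GRGCYC" },
  bgli: { fcut: 7, name: "BglI", rcut: 4, rseq: "GCCNNNNNGGC" },
};

// Hypothetical helper: [start, end) of rseq with only the leading and
// trailing Ns removed. Interior Ns and non-N wildcards are kept.
function recognitionRange(rseq) {
  const upper = rseq.toUpperCase();
  let start = 0;
  let end = upper.length;
  while (start < end && upper[start] === "N") start++;
  while (end > start && upper[end - 1] === "N") end--;
  return { start, end };
}

for (const enzyme of Object.values(enzymes)) {
  const { start, end } = recognitionRange(enzyme.rseq);
  console.log(enzyme.name, enzyme.rseq.slice(start, end), start, end);
}
// BbsI GAAGAC 0 6
// BanII GRGCYC 0 6
// BglI GCCNNNNNGGC 0 11
```

In a renderer, the box would then span matchStart + start to matchStart + end, while fcut/rcut keep their existing offsets into the padded rseq, so the cut-position logic would not need to change. Matches on the reverse strand would need these offsets mirrored; the sketch ignores that case.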

Screenshots

See the inline screenshots above.

Your environment:

Observing the same behavior across multiple browsers & OSes