Describe the bug
Cut sites convey three important pieces of information in the SeqViz sequence viewer - the two cut locations and the sequence that is recognized by the enzyme.
In cases where the cut site extends outside the recognition sequence, the enzyme definitions in SeqViz "pad out" the recognition sequence with Ns. For example, the definition of the enzyme BbsI is:
Note that the part of the sequence that's actually recognized by the enzyme is "GAAGAC". However, because of the Ns padding the rseq in the enzyme definition, the entire range is shown inside the "recognition box" when this is rendered in SeqViz, suggesting that the characters after the "C" are part of the recognition site:
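To make the effect concrete, here is a minimal sketch of how an N-padded rseq widens the matched range. The `Enzyme` shape, the `fcut`/`rcut` field names, the offsets, and the `toRegex` helper are assumptions for illustration (only `rseq` and the recognized "GAAGAC" come from this report), and this is not a copy of SeqViz's matching code.

```typescript
// Assumed shape of an enzyme definition; field names other than rseq are guesses.
interface Enzyme {
  name: string;
  rseq: string; // recognition sequence, padded with Ns out to the cut site
  fcut: number; // forward-strand cut offset from the start of rseq (assumed)
  rcut: number; // reverse-strand cut offset from the start of rseq (assumed)
}

// BbsI recognizes GAAGAC and cuts downstream of it, hence the six padding Ns.
const bbsI: Enzyme = { name: "BbsI", rseq: "GAAGACNNNNNN", fcut: 8, rcut: 12 };

// Hypothetical helper: turn rseq into a regex. Only the N wildcard is handled here.
const toRegex = (rseq: string): RegExp =>
  new RegExp(rseq.replace(/N/g, "[ACGT]"), "g");

const exampleSeq = "TTGAAGACACAGGGTT";
const bbsIMatch = toRegex(bbsI.rseq).exec(exampleSeq);

// The whole padded match is what ends up inside the recognition box.
const highlightedRange = bbsIMatch ? bbsIMatch[0] : "";
const highlightedStart = bbsIMatch ? bbsIMatch.index : -1;
```

With this input the matched text is all twelve characters of GAAGACACAGGG, even though only the first six are recognized by the enzyme.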
We've tried custom definitions of these enzymes that exclude the Ns, but it seems the padding is included intentionally - without it, when the cut sites fall off the edge of a sequence, the component crashes.
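A rough sketch of the situation that seems to trigger the crash: an unpadded definition can match so close to the end of the sequence that the cut offsets point past it. The `cutsInBounds` helper and the offsets 8 and 12 are hypothetical, and the check assumes a linear (not circular) sequence. It only shows the kind of bounds check that would be needed if the Ns were removed from the definitions.

```typescript
// Hypothetical guard: true when both cuts of a match land inside a linear sequence.
// A cut index equal to seqLength (right after the last base) is treated as in bounds.
function cutsInBounds(
  matchStart: number,
  fcut: number,
  rcut: number,
  seqLength: number
): boolean {
  const forwardCut = matchStart + fcut;
  const reverseCut = matchStart + rcut;
  return Math.min(forwardCut, reverseCut) >= 0 && Math.max(forwardCut, reverseCut) <= seqLength;
}

// Only two bases follow the recognized GAAGAC, so the reverse cut falls off the end.
const truncatedSeq = "TTTTGAAGACAC";
const truncatedStart = truncatedSeq.indexOf("GAAGAC");
const truncatedOk = cutsInBounds(truncatedStart, 8, 12, truncatedSeq.length);

// Here there is enough downstream sequence for both cuts.
const roomySeq = "TTGAAGACACAGGGTT";
const roomyStart = roomySeq.indexOf("GAAGAC");
const roomyOk = cutsInBounds(roomyStart, 8, 12, roomySeq.length);
```

With the padded definition, the first case would simply not match, which is presumably why the Ns are there.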
Expected behavior
I'd expect the "recognition rectangle" to be drawn around only the part of the sequence that is recognized by the enzyme - in this case, only around GAAGAC, not GAAGACACAGGG.
I'd expect this to be the case for any enzyme that's currently defined with leading or trailing Ns. I know there are also some enzymes with non-N wildcards, or with Ns (or other degenerate nucleotides) in the middle of the recognition sequence.
For example:
I'd propose that it's reasonable not to attempt to handle these cases for the moment, and to include any degenerate nucleotides (other than N), or any interior Ns, as part of the displayed recognition sequence.
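The proposed rule could be sketched roughly as follows: keep the padded rseq for matching and cut placement, but compute a narrower range for the rectangle by stripping only leading and trailing Ns. `recognizedRange` and `RecognizedRange` are hypothetical names, not existing SeqViz APIs, and the sample rseq strings are illustrative inputs.

```typescript
// Half-open range [start, end) within rseq that would be drawn as recognized.
interface RecognizedRange {
  start: number;
  end: number;
}

// Hypothetical helper: strips only leading and trailing Ns.
// Interior Ns and non-N degenerate bases (S, W, R, ...) stay in the range.
function recognizedRange(rseq: string): RecognizedRange {
  const bases = rseq.toUpperCase();
  let start = 0;
  while (start < bases.length && bases[start] === "N") start++;
  // An rseq made up entirely of Ns has nothing to trim down to; keep it whole.
  if (start === bases.length) return { start: 0, end: bases.length };
  let end = bases.length;
  while (end > start && bases[end - 1] === "N") end--;
  return { start, end };
}

const paddedRange = recognizedRange("GAAGACNNNNNN"); // { start: 0, end: 6 }
const interiorRange = recognizedRange("GCCNNNNNGGC"); // { start: 0, end: 11 }
const flankedRange = recognizedRange("NNCASTGNN"); // { start: 2, end: 7 }
```

The rectangle for a match at index `i` would then span `i + start` to `i + end`, while the cut positions keep using the offsets from the padded definition.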
Screenshots
Inline
Your environment:
Observing the same behavior across multiple browsers & OSes