UC-Davis-molecular-computing / scadnano

Web application for designing DNA structures such as DNA origami.
https://scadnano.org
MIT License

scadnano

scadnano ("scriptable-cadnano", source code repository here) is a program for designing synthetic DNA structures such as DNA origami. The scadnano project is developed and maintained by the UC Davis Molecular Computing group. Note that cadnano is a separate project, developed and maintained by the Douglas lab at UCSF.

If you find scadnano useful in a scientific project, please cite its associated paper:

scadnano: A browser-based, scriptable tool for designing DNA nanostructures.
David Doty, Benjamin L Lee, and Tristan Stérin.
DNA 2020: Proceedings of the 26th International Conference on DNA Computing and Molecular Programming
[ paper | BibTeX ]

Overview

The design of scadnano is similar to cadnano, specifically version 2, with three main differences:

1) scadnano runs entirely in the browser, with no installation required. Currently only Chrome, Firefox, and Edge are supported, with support for Safari planned in the future.

2) scadnano designs, while they can be edited manually in scadnano, can also be created and edited by a well-documented Python scripting library (help / API), to help automate tedious tasks.

3) The file format is easily human-readable (see example below), to help when debugging scripts or interfacing with other software.

This document explains how to use the web interface for viewing and manually editing designs. The documentation for the Python scripting package is here. This document does not assume any familiarity with cadnano, although some parts explain slight differences between cadnano and scadnano for the benefit of those who have used cadnano.

Please file bug reports and make feature requests as GitHub repository issues in the repositories for the web interface or the Python scripting library.

We will try to announce breaking changes (and possibly new features) under the GitHub releases page. The version numbers in this web interface repo and the Python library repo won't always advance at the same time, and sometimes a feature is supported in one before the other.

Following semantic versioning, version numbers are major.minor.patch; e.g., version 0.9.2 has minor version number 9. Prior to version 1.0.0, a breaking change increments the minor version (for example, going from 0.9.4 to 0.10.0). After version 1.0.0, breaking changes will increment the major version.
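The versioning convention can be sketched in a few lines of Python. This is only an illustration of the rule stated above; the function names `parse_version` and `is_breaking_change` are hypothetical and are not part of scadnano.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_breaking_change(old: str, new: str) -> bool:
    """Return True if moving from `old` to `new` signals a breaking change.

    Before 1.0.0 a breaking change bumps the minor version;
    from 1.0.0 onward it bumps the major version.
    """
    old_major, old_minor, _ = parse_version(old)
    new_major, new_minor, _ = parse_version(new)
    if new_major != old_major:
        return True
    if old_major == 0:
        return new_minor != old_minor
    return False


print(is_breaking_change("0.9.4", "0.10.0"))  # minor bump before 1.0.0
print(is_breaking_change("0.9.2", "0.9.4"))   # patch bump only
```

Comparing the parsed integers rather than the raw strings matters here, since a plain string comparison would order "0.10.0" before "0.9.4".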

Reporting issues

Please report issues in the web interface at the scadnano web interface GitHub repository, and report issues in the Python scripting library at the scadnano Python package GitHub repository.

If you find an existing issue that you would like to see handled, please "vote" for it with the GitHub "thumbs up" on the top comment describing the issue.

Tutorial

A tutorial shows how to create a "standard" 24-helix DNA origami rectangle using the scadnano web interface.

WARNING: Save your work

It is strongly recommended that you frequently save your work by pressing the "Save" button to save your design to a .sc file on your computer.

Despite being run in a browser, currently this application is not really a "web app". Nothing is stored on a server; everything is running and being stored in your browser locally. In particular, your design is not automatically saved in an easily recoverable way. For convenience only, the application uses the browser's localStorage to store your most recent design. Thus, if you close your browser and re-start the application later, you should see the design you were working on before. Options for customizing when the design is written to localStorage are available in the File menu.

However, relying on your browser's localStorage is not a safe or recommended method of saving your work. The storage format may change, or your browser may remove the contents of localStorage, and then your work would be lost. You should press the "Save" button (or Ctrl+S keyboard shortcut) to save your design to your local file system. Unfortunately, due to browser security restrictions on accessing the local file system, it is not possible to save your file automatically without further interaction; after pressing "Save", you will always be prompted to specify a filename to which to save.

Chrome automatically appends (1), (2), ... to the filename if it already exists in the directory, so repeatedly saving the file will change its name on your local filesystem every time. To disable this so that it uses the same filename every time you save, you can install the extension Downloads Overwrite Already Existing Files.

Security settings preventing saving: Some users have reported that they are unable to save the file in Chrome, which we have tracked as an issue. However, the error is not straightforward to reproduce. If you have trouble with this, try changing your Chrome "Safe Browsing" setting from "Enhanced protection" to "Standard protection". This is in Chrome Settings (listed under the ⋮ symbol in the top right), under "Security". (See the issue for a screenshot.)

Stable and development versions

The scadnano stable version matches what is on the main branch of the web interface code repository. The scadnano dev version matches what is on the dev branch of the web interface code repository.

Releases of the stable version are explained on the releases page. When issues are handled in a release, they are closed at the time the changes make their way to the main branch. If an issue is handled in the dev branch, the issue remains open, but you will see a comment that looks something like this: "dave-doty added a commit that referenced this issue 17 hours ago @dave-doty make width of File menu just enough to fit all entries on one line; fixes #339". These comments can help you decide if you want to use the latest version of scadnano (https://scadnano.org/dev), which has fixed an issue, before it makes its way to the stable version (https://scadnano.org).

Terminology

The main parts of the program are the side view on the left, and the main view in the center. The side view shows DNA helices "head on", with the interpretation that as you move to the right in the main view, this is like moving "into the screen" in the side view. The side view assumes that all helices are parallel. One can use different groups to specify groups of helices, such that all helices within a group are parallel, but different groups have different relative angles. The side view shows only one group at a time (but the default is to have a single group).

Annotated screenshot of scadnano web interface:

The screenshot is annotated with labels showing many of the terms used in scadnano's data model. It is instructive to see how that example design is represented as a .sc file (which is itself in JSON format):

{
  "grid": "square",
  "helices": [
    { "max_offset": 48, "grid_position": [0,0] },
    { "max_offset": 48, "grid_position": [0,1] }
  ],
  "modifications_5p_in_design": {
    "/5Biosg/": {
      "display_text": "B",
      "vendor_code": "/5Biosg/",
      "location": "5'"
    }
  },
  "strands": [
    {
      "color": "#0066cc",
      "sequence": "AACGTAACGTAACGTAACGTAACGTAACGTAACGTAACGTAACGTAACGTAACGTAACGTAACGTAACG",
      "domains": [
        { "helix": 1, "forward": false, "start": 8, "end": 24, "deletions": [20] },
        { "helix": 0, "forward": true, "start": 8, "end": 40, "insertions": [[14,1], [26,2]] },
        { "loopout": 3 },
        { "helix": 1, "forward": false, "start": 24, "end": 40 }
      ],
      "is_scaffold": true
    },
    {
      "color": "#f74308",
      "sequence": "ACGTTACGTTACGTTTTACGTTACGTTACGTT",
      "domains": [
        { "helix": 1, "forward": true, "start": 8, "end": 24, "deletions": [20] },
        { "helix": 0, "forward": false, "start": 8, "end": 24, "insertions": [[14,1]] }
      ]
    },
    {
      "color": "#57bb00",
      "sequence": "ACGTTACGTTACGTTACGCGTTACGTTACGTTAC",
      "domains": [
        { "helix": 0, "forward": false, "start": 24, "end": 40, "insertions": [[26,2]] },
        { "helix": 1, "forward": true, "start": 24, "end": 40 }
      ],
      "5prime_modification": "/5Biosg/"
    }
  ]
}

The scripting library README shows Python code that produces this design.
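Because a .sc file is plain JSON, it can also be inspected with nothing more than Python's standard `json` module. The following is a minimal sketch that uses a trimmed-down design in the same format as the example above; treating a missing `is_scaffold` field as false is an assumption based on how the staples appear in that example.

```python
import json

# A trimmed-down design in the same format as the example above.
SC_TEXT = """
{
  "grid": "square",
  "helices": [
    {"max_offset": 48, "grid_position": [0, 0]},
    {"max_offset": 48, "grid_position": [0, 1]}
  ],
  "strands": [
    {
      "color": "#0066cc",
      "domains": [
        {"helix": 1, "forward": false, "start": 8, "end": 24},
        {"loopout": 3},
        {"helix": 0, "forward": true, "start": 8, "end": 24}
      ],
      "is_scaffold": true
    },
    {
      "color": "#f74308",
      "domains": [
        {"helix": 1, "forward": true, "start": 8, "end": 24}
      ]
    }
  ]
}
"""

design = json.loads(SC_TEXT)

# Assumption: an absent is_scaffold field means the strand is not a scaffold.
scaffolds = [s for s in design["strands"] if s.get("is_scaffold", False)]
staples = [s for s in design["strands"] if not s.get("is_scaffold", False)]

print("grid:", design["grid"])
print("helices:", len(design["helices"]))
print("scaffolds:", len(scaffolds), "staples:", len(staples))
```

To inspect a saved design instead, the string can be replaced by the contents of a file read with `json.load`.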

Although it is not necessary to deal directly with the above JSON data, it is worthwhile to understand the data model it represents. This model is manipulated directly in the Python scripting library, and indirectly through the web interface. This section explains the meaning of the terms, although some more detail about them is given in subsequent sections explaining how the interface allows them to be edited.

A design consists of a list of helices and a list of strands. The order of the helices matters; if there are h helices, the helices are numbered 0 through h-1. This can be overridden by specifying a field called idx in each helix, but the default is to number them consecutively in order. (The strands also have an order, which generally doesn't matter, but it influences, for instance, which are drawn on top, so a strand later in the list will have its crossovers drawn over the top of earlier strands.) Each helix defines a set of integer offsets with a minimum and maximum; in the example above, the minimum and maximum for each helix are 0 and 48, respectively, so 48 total offsets are shown. Each offset is a position along the length of a helix where a DNA base of a strand can go.

One can also specify a grid type, a.k.a., lattice, one of the following types: square, hex, honeycomb, or none. Technically this is not associated to the whole design, but to a group, described below. However, there is a default group, and putting a grid as a top-level JSON field means it will be assigned to the default group. Helices in a grid have a two-integer grid position depicted in the side view. See the Python scripting documentation for more detail about the meaning of these positions. Helices without a grid have a position, a 3D real vector describing their x, y, z coordinates in units of nanometers.

A Helix may also define the angle roll in units of degrees, which defines the backbone rotation angle around the axis of the Helix (the Z-axis in scadnano). The interpretation is that roll 0 means the phosphate backbone of the strand that is forward=true on the helix is pointing straight up in the side and main views. Rotation is clockwise, at a default rate of 10.5 base pairs per 360 degrees (configurable through the geometry parameters). It is possible to display helices at an angle in the main view, using groups, described below.
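The arithmetic behind the rotation rate can be sketched as follows. This is an illustration only: the function name `backbone_angle` is hypothetical, and the sketch assumes that the roll value describes the backbone angle at offset 0, which this document does not state explicitly.

```python
def backbone_angle(roll: float, offset: int, bases_per_turn: float = 10.5) -> float:
    """Clockwise angle in degrees (0 = straight up) of the forward strand's
    backbone at `offset`.

    Assumption: `roll` is the angle at offset 0, and each base advances the
    angle by 360 / bases_per_turn degrees.
    """
    degrees_per_base = 360.0 / bases_per_turn
    return (roll + offset * degrees_per_base) % 360.0


# With the default 10.5 bases per turn, each base rotates the backbone
# by roughly 34.3 degrees.
for offset in range(4):
    print(offset, round(backbone_angle(0.0, offset), 1))
```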

The position of helices in the main view depends on the grid position if a grid is used, and on the position otherwise. (Each grid position is interpreted as a position from a constrained set of possible positions.) They are listed from top to bottom in the order they appear in the sequence (unless the property helices_view_order is specified in the design to display them in a different order, though currently this can only be done in the scripting library).

Each strand is defined primarily by an ordered list of domains. Each domain is either a single-stranded loopout not associated to any helix, or it is a bound domain: a region of the strand that is contiguous on a single helix. The phrase is a bit misleading, since a bound domain is not necessarily bound to another strand, but the intention is for most of them to be bound, and for single-stranded regions usually to be represented by loopouts.

Each bound domain is specified by four mandatory properties: helix, direction (forward or reverse), start offset, and a larger end offset. As with common string/list indexing in programming languages, start is inclusive but end is exclusive. So for example, a bound domain with end=8 is adjacent to one with start=8. In the main view, forward bound domains are depicted on the top half of the helix, and reverse are on the bottom half. If a bound domain is forward, then start is the offset of its 5' end, and end-1 is the offset of its 3' end; otherwise these roles are reversed. There is implicitly a crossover between adjacent bound domains in a strand. Although the visual depiction of a loopout is similar to a crossover, loopouts are explicitly specified as a (non-bound) domain in between two bound domains. Currently, two loopouts cannot be consecutive (and this will remain a requirement), and a loopout cannot be the first or last domain of a strand. The latter constraint may be relaxed in the future. For now, if you need to put a single-stranded overhang at the end of a strand, a good solution is to add a 5' or 3' modification whose idt_text (see description of modifications below) is the DNA sequence you want to assign.
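The inclusive/exclusive convention and the 5'/3' rule can be made concrete with a short sketch operating on domain dictionaries shaped like those in the JSON example. The helper names `five_prime_offset` and `three_prime_offset` are hypothetical.

```python
def five_prime_offset(domain: dict) -> int:
    """Offset of the 5' end: start if forward, otherwise end - 1."""
    return domain["start"] if domain["forward"] else domain["end"] - 1


def three_prime_offset(domain: dict) -> int:
    """Offset of the 3' end: end - 1 if forward, otherwise start."""
    return domain["end"] - 1 if domain["forward"] else domain["start"]


forward_domain = {"helix": 0, "forward": True, "start": 8, "end": 40}
reverse_domain = {"helix": 1, "forward": False, "start": 8, "end": 24}

# Forward: 5' end at start, 3' end at end - 1 (end itself is exclusive).
print(five_prime_offset(forward_domain), three_prime_offset(forward_domain))
# Reverse: the roles are swapped.
print(five_prime_offset(reverse_domain), three_prime_offset(reverse_domain))
```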

Bound domains may have optional fields, notably deletions (called skips in cadnano) and insertions (called loops in cadnano), explained below.
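As a rough sketch of how these fields affect the number of DNA bases in a strand, the following assumes that each deletion removes one base and that each insertion pair [offset, n] adds n bases. That interpretation is an assumption, although it is consistent with the length of the second strand's sequence in the JSON example. The function names are hypothetical.

```python
def domain_dna_length(domain: dict) -> int:
    """Number of DNA bases in one domain of a strand."""
    if "loopout" in domain:
        # A loopout stores its length directly.
        return domain["loopout"]
    length = domain["end"] - domain["start"]      # end is exclusive
    length -= len(domain.get("deletions", []))    # assumed: one base each
    length += sum(n for _, n in domain.get("insertions", []))
    return length


def strand_dna_length(strand: dict) -> int:
    """Total number of DNA bases over all domains of a strand."""
    return sum(domain_dna_length(d) for d in strand["domains"])


# The second strand from the JSON example above.
staple = {
    "sequence": "ACGTTACGTTACGTTTTACGTTACGTTACGTT",
    "domains": [
        {"helix": 1, "forward": True, "start": 8, "end": 24, "deletions": [20]},
        {"helix": 0, "forward": False, "start": 8, "end": 24, "insertions": [[14, 1]]},
    ],
}

print(strand_dna_length(staple), len(staple["sequence"]))
```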

Each strand also has a color and a Boolean field is_scaffold. DNA origami designs have at least one strand that is a scaffold (but can have more than one), and a non-DNA-origami design is simply one in which every strand has is_scaffold = false. Unlike cadnano, a scaffold strand can have either direction on any helix. When there is at least one scaffold, all non-scaffold strands are called staples. The general idea behind DNA origami is that every staple strand binds only to a scaffold, never to another staple. Neither does any scaffold bind to another scaffold or itself. However, neither of these conventions is enforced by scadnano, and there are legitimate reasons to want non-scaffold strands to bind to each other (e.g., for DNA walkers or circuits on the surface of an origami). See the original paper for more detailed instructions for designing DNA origami.

A strand can have an optional DNA sequence. Of course, since the whole point of this software is to help design DNA structures, at some point a DNA sequence should be assigned to some of the strands. However, it is often best to mostly finalize the design before assigning a DNA sequence, which is why the field is optional. Many of the operations attempt to keep things consistent when modifying a design where some strands already have DNA sequences assigned, but in some cases it's not clear what to do. (For example, what happens when a length-5 strand with sequence AACGT is extended to a larger length? Which DNA bases are assigned to the new offsets?)

Each helix belongs to a group. If not specified, all helices are in the group named "default_group". Even DNA designs in which not all helices are parallel typically consist of several groups of helices, where all helices in a group are parallel. The way that scadnano supports such designs is as follows. groups is a top-level field in the Design JSON, mapping a group name (a string) to a map describing the group. Each group has fields grid, position (itself a map with x, y, z fields), pitch, yaw, roll, and helices_view_order. If not specified, "default_group" is the name, with grid="none", helices_view_order assumed to be the indices of helices in this group in increasing order, and all other values equal to 0. Each helix is associated to a group via the field group in the helix description, giving the name of the group. All helices in a group are translated by the group's position and rotated in the main view by the group's pitch angle. The yaw and roll parameters are not directly visualized in scadnano, but those fields can be edited and are used, for example, when exporting to oxDNA for 3D visualization. Their interpretation is explained in more detail in the Python package API documentation for HelixGroup.
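The defaults described above can be illustrated with a small sketch that fills in missing group fields from a parsed design. This is not how scadnano itself reads files; `resolve_groups` is a hypothetical helper, and the sketch only encodes the defaults stated in this section (including the rule that a top-level grid applies to the default group).

```python
def resolve_groups(design: dict) -> dict:
    """Map each group name to a description with all defaults filled in."""
    helices = design.get("helices", [])
    raw_groups = design.get("groups")
    if raw_groups is None:
        # No explicit groups: a top-level grid applies to the default group.
        raw_groups = {"default_group": {"grid": design.get("grid", "none")}}

    resolved = {}
    for name, raw in raw_groups.items():
        # Helix indices default to list order unless idx is given.
        member_indices = sorted(
            helix.get("idx", i)
            for i, helix in enumerate(helices)
            if helix.get("group", "default_group") == name
        )
        resolved[name] = {
            "grid": raw.get("grid", "none"),
            "position": dict(raw.get("position", {"x": 0, "y": 0, "z": 0})),
            "pitch": raw.get("pitch", 0),
            "yaw": raw.get("yaw", 0),
            "roll": raw.get("roll", 0),
            "helices_view_order": raw.get("helices_view_order", member_indices),
        }
    return resolved


# Hypothetical two-group design: group names and values are made up.
two_group_design = {
    "groups": {
        "north": {"grid": "square"},
        "east": {"grid": "honeycomb", "pitch": 90,
                 "position": {"x": 10, "y": 0, "z": 0}},
    },
    "helices": [{"group": "north"}, {"group": "east"}, {"group": "north"}],
}

for name, group in resolve_groups(two_group_design).items():
    print(name, group["pitch"], group["helices_view_order"])
```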

Grid types

Each grid type is described by a 2D (h,v) coordinate system. In all cases, h increases moving right in the side view and v increases moving down (so-called screen coordinates, as opposed to Cartesian coordinates, where v increases moving up).

The grid types square, honeycomb, hex all have integer coordinates. Examples are shown below. These images will look slightly different than the default side view, because they have no gaps between the helices. By default helices have a positive gap between them, which can be configured under the menu item "Set geometric parameters" explained elsewhere in this document.

square grid:

honeycomb grid (this matches the coordinate system used by cadnano for the honeycomb grid):

hexagonal grid (note that although the honeycomb grid is a subset of the hex grid, they use a different coordinate system; e.g., note the differing relative positions of (1,1) and (2,1) in each). This is called the "odd-q" coordinate system here: https://www.redblobgames.com/grids/hexagons/:

In contrast, the "none" grid type uses real numbers (not integers). One can think of this as the most general coordinate system, where square, hex, and honeycomb are special cases restricting the allowed real-valued positions. The example below shows the result of converting the square grid helices above to the none grid, and then adding four more helices whose positions are not possible in any of the grid-based coordinate systems.

none grid:

Relation of grid_position and position to side and main view display

The main view and side views are 2D representations of a 3D object. The views display helices in the following way. First, each helix in a group is translated by its group's position.z and position.y values, and rotated clockwise by the group's pitch angle. The description below is relative to this translation and rotation.

Each helix has a 3D (x,y,z) position (grid_position is simply a special type of position, and a position is calculated from the grid_position if a grid is used.) The x and y coordinates are shown in the side view, with x increasing to the right and y increasing to the bottom (so-called "screen coordinates", which invert y compared to Cartesian coordinates).
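To give a feel for how a position can be calculated from a grid position, here is a sketch for the square and hex grids. The hex formula is the standard "odd-q" layout (odd columns shifted down by half a step) and is an assumption; it is not necessarily scadnano's exact conversion, and the honeycomb grid is omitted. The function name `grid_to_xy` and the `spacing` parameter (distance between adjacent helix centers, in arbitrary units) are hypothetical.

```python
import math
from typing import Tuple


def grid_to_xy(grid: str, h: int, v: int, spacing: float = 1.0) -> Tuple[float, float]:
    """Convert integer (h, v) grid coordinates to side-view (x, y).

    y increases downward (screen coordinates).
    """
    if grid == "square":
        return (h * spacing, v * spacing)
    if grid == "hex":
        # odd-q layout: columns are sqrt(3)/2 apart, odd columns sit
        # half a step lower than even columns.
        x = h * spacing * math.sqrt(3) / 2
        y = (v + 0.5 * (h % 2)) * spacing
        return (x, y)
    raise ValueError(f"unsupported grid type: {grid}")


# Adjacent hex positions end up one spacing unit apart.
a = grid_to_xy("hex", 1, 1)
b = grid_to_xy("hex", 2, 1)
print(round(math.dist(a, b), 6))
```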

In the main view, the horizontal direction is the z coordinate. The vertical direction, however, is not exactly the y coordinate, since this would simply pile helices on top of each other if their y coordinates were close or equal (which is common in a 3D design). Instead, the helices are displayed in order from top to bottom (by their index, or if specified, by the value helices_view_order in the DNA design, which can specify an alternate permutation). The vertical distance between adjacent helices is supposed to approximate the Euclidean x-y distance between the helices (i.e., the side view distance; the z distance is ignored in this calculation). If the helices are co-planar (such as a flat origami in the square grid, where all helices have the same x coordinate, or they all have the same y coordinate), then this will display the entire design to scale, with each helix appearing the correct relative distance from all others. Otherwise, the distances between pairs of helices with adjacent indices will be to scale.
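The vertical layout rule described above can be sketched as a running sum of side-view distances between consecutive helices in the view order. This is an illustration of the stated rule rather than scadnano's actual layout code; `main_view_offsets` is a hypothetical name, and the resulting offsets are in the same units as the input positions.

```python
import math
from typing import Dict, List, Optional, Tuple


def main_view_offsets(
    positions: List[Tuple[float, float]],
    view_order: Optional[List[int]] = None,
) -> Dict[int, float]:
    """Vertical offset of each helix in the main view.

    positions[i] is the side-view (x, y) of helix i. Helices are stacked in
    view order, each one placed below the previous by their Euclidean x-y
    distance (the z coordinate is ignored).
    """
    if view_order is None:
        view_order = list(range(len(positions)))
    offsets: Dict[int, float] = {}
    current = 0.0
    previous = None
    for idx in view_order:
        if previous is not None:
            current += math.dist(positions[previous], positions[idx])
        offsets[idx] = current
        previous = idx
    return offsets


# Three helices that are not co-planar: only adjacent pairs are to scale.
positions = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(main_view_offsets(positions))
print(main_view_offsets(positions, view_order=[2, 0, 1]))
```

In the first call, helices 0 and 2 are drawn two units apart even though their side-view distance is the square root of 2, which is the "only adjacent pairs are to scale" caveat from the paragraph above.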

[TODO: make some figures showing examples]

Navigation and control

Navigation: The side view and main view can both be navigated by using the mouse wheel/two-finger scroll gesture to zoom in and out, and clicking and dragging the background to pan. The zoom speed can be controlled under the View menu. It is currently unsupported to navigate entirely by keyboard or to navigate only by clicking.

Undo/redo: Pressing Ctrl+Z will undo the last action that changed the design. Pressing Shift+Ctrl+Z will redo it.

Right-click: Some items can be right-clicked (or Ctrl-clicked on Mac) to bring up a context menu. Note for cadnano users: Some features of cadnano are available in this way, for example assigning a DNA sequence or a color to a strand. For such items the browser's normal right-click is disabled. To see the browser's normal right-click menu on such items, press Shift + right-click.

Right-clicking on a crossover or loopout lets one toggle between a crossover or loopout or change the length of a loopout. Setting length to a positive integer converts to a loopout and setting a length of 0 converts a loopout to a crossover.

Menu

This section refers to the menu at the top of the whole app. At the top of the side view, there is a side view menu, described below.

In both the side menu and the main menu, hovering the cursor over most menu items brings up a tooltip explaining the menu item in more detail.

Side view menu

In both the side menu and the main menu, hovering the cursor over most menu items brings up a tooltip explaining the menu item in more detail.

Edit modes

There are different edit modes available, shown on the right side of the screen. Currently most of them are mutually exclusive, so selecting one will unselect the others. However, a few can be on simultaneously. Each edit mode has a keyboard shortcut that can be used to toggle it, shown in parentheses in the application's display.

Both rope select and the selection box require the entire bounding box of the item to be contained in the box/polygon drawn to be selected. This can be counterintuitive. For example, here is the bounding box for a strand with loopouts:

Thus the following rope-select polygon would not select the strand, even though the strand appears to be contained in the rope select polygon, because the corners of the strand's bounding box go outside the rope select polygon:
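The containment rule can be sketched for the simpler case of a rectangular selection box; the rope-select polygon applies the same idea to an arbitrary polygon. The `Box` type and the function names are hypothetical, and coordinates are screen coordinates (y increases downward).

```python
from typing import Iterable, NamedTuple, Tuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def bounding_box(points: Iterable[Tuple[float, float]]) -> Box:
    """Smallest axis-aligned box containing all the given points."""
    xs, ys = zip(*points)
    return Box(min(xs), min(ys), max(xs), max(ys))


def is_selected(selection: Box, item: Box) -> bool:
    """An item is selected only if its whole bounding box lies inside."""
    return (selection.left <= item.left and item.right <= selection.right
            and selection.top <= item.top and item.bottom <= selection.bottom)


# An L-shaped path: its bounding box includes the empty corner as well.
strand_points = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
strand_box = bounding_box(strand_points)

print(is_selected(Box(-1, -1, 11, 11), strand_box))  # box fully enclosed
print(is_selected(Box(-1, -1, 11, 9), strand_box))   # bottom edge cut off
```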

Circular strands

It is possible to create circular strands, by ligating a strand's 5' and 3' ends, or by adding a crossover between a strand's 5' and 3' ends:

Although this is allowed while editing strands, it is encouraged eventually to make all strands linear (non-circular) by adding appropriate nicks or deleting crossovers. At the current time, assigning and tracking DNA sequences in circular strands is not well-supported. In particular, there is not a straightforward way, when assigning DNA, to control where it begins on the strand. The reason this feature is allowed is that it helps to allow circular strands as intermediates that will eventually be made linear.

Assigning DNA

Right-clicking on a strand allows one to assign a DNA sequence to a strand (or remove it if assigned).

There are two options for assigning DNA sequences to a strand, both available via the right-click context menu on a strand: assign DNA and assign DNA complement from bound strands. The first option requires you to specify a DNA sequence. The second infers the DNA sequence from the complement of other DNA sequences already present in the design.

assign DNA: By default any strands bound to the assigned strand will have their sequences assigned to be the complement of the relevant region. Disabling this allows one to create intentional mismatches, for instance.

It is possible to assign DNA to a strand that already has DNA assigned to it; the new sequence replaces the previous one. However, be careful when automatically assigning DNA complements to strands bound to this one. By default this is allowed only if the new sequence does not conflict with the old one: if the assignment would overwrite a DNA base on a bound strand with a different base than it already has, it is not allowed. In that case, first remove the DNA sequence from the strand and its bound strands, then assign a fresh DNA sequence. Alternatively, disable the option "warn if assigning different sequence to bound strand", in which case the bound strands will be silently overwritten with the new complementary sequence.

One reason to keep the warning enabled, but still assign DNA to a strand bound to another with DNA already assigned, is that complementary DNA can be built partially in stages. For example, if a strand s1 is connected to two others s2 and s3, then DNA can be assigned to s2, followed by s3. Any bases on s1 not bound to s2 (for instance, those bound to s3), after s2 has a sequence assigned to it, will receive the wildcard symbol ? as their "DNA base". Upon subsequently assigning a DNA sequence to s3, the complementary portions of s1 (which have a ?) will be overwritten with the appropriate DNA sequence, even if the warning is enabled. Thus the warning only concerns a concrete DNA base, one of A, C, G, or T, if it already exists and is about to be overwritten with a different base.
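The staged behavior with wildcards can be sketched as follows. This is a simplified model, not scadnano's implementation: strands are plain strings, binding regions are given by hand, and the helper names `reverse_complement` and `merge` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "?": "?"}


def reverse_complement(seq: str) -> str:
    """Complement of seq, reversed because bound strands are antiparallel."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def merge(existing: str, incoming: str) -> str:
    """Overlay `incoming` onto `existing`, base by base.

    A wildcard '?' on either side yields the other side's base. Two
    different concrete bases are a conflict and raise ValueError.
    """
    if len(existing) != len(incoming):
        raise ValueError("sequences must have the same length")
    merged = []
    for old, new in zip(existing, incoming):
        if old == "?":
            merged.append(new)
        elif new == "?" or new == old:
            merged.append(old)
        else:
            raise ValueError(f"conflict: {old} would be overwritten by {new}")
    return "".join(merged)


# s1 has 10 bases; s2 binds its first 5 bases and s3 binds its last 5.
s1 = "?" * 10
s1 = merge(s1, reverse_complement("AACGT") + "?????")  # after assigning s2
print(s1)
s1 = merge(s1, "?????" + reverse_complement("GGGCC"))  # after assigning s3
print(s1)
```

Filling in wildcards never raises, which mirrors the point above that the warning only concerns overwriting a concrete base with a different one.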

assign DNA complement from bound strands: The above description indicates how to assign a specific DNA sequence to a particular strand while automatically assigning the complementary DNA sequence to strands bound to it. But in some circumstances, you might have some strand(s) that already have DNA sequences assigned, and through some modification of the design, new strand(s) come to be bound to them that were not present in the design at the time the DNA sequence was assigned. This feature allows you to select that new strand (or many strands if you like), and tell it to receive the appropriate complementary DNA sequence. It is equivalent to iterating over each strand bound to the selected strand(s), selecting "Assign DNA" from the context menu, and selecting the option "assign complement to bound strands".

[TODO: make a figure showing this]

Exporting to cadnano

Files in the format recognized by cadnano v2 can be imported and exported from scadnano, in both the Python scripting library and the web interface. However, since the cadnano format is more limited, some scadnano features may be lost upon export. These are discussed here: https://scadnano-python-package.readthedocs.io/en/latest/#interoperability-cadnano-v2

How to design structures manually using scadnano

A full DNA origami design using a standard 7249-base M13mp18 scaffold uses ~200 staples, which are tedious to create manually. cadnano provides autostaple and autobreak utilities for quickly creating a large number of staple strands. However, there are fewer than a dozen different types of staples in the sense that once these types of staples exist in the design, all others can be created by copy/pasting them. We have found that the autostaple and autobreak tools are largely unnecessary in scadnano, since scadnano allows one to copy and paste strands (unlike cadnano), encouraging a more free-form method of creating large designs rapidly.

Copy/pasting speeds up this process even further. For instance, to create a vertical "column" of 32 staples in a 32-helix rectangle, one would create a staple, copy/paste it below, then use the Autopaste feature to repeatedly paste more copies below to create a full "column" of staples. Then this entire column can be selected, and autopaste can be used to fill in the rest of the design with those staples. See the tutorial for more details.

A standard DNA origami rectangle, for instance, can be created in about 10 minutes. One downside is that a complete novice, who has no idea where staples ought to go or what they should look like, does not have a default push-button way to create an initial design without using autostaple. However, numerous example designs are provided to learn what good staple design looks like.

See the tutorial for detailed instructions on creating a 24-helix DNA origami rectangle using the scadnano web interface.

Reset local settings

You may need to reset the local settings, in particular to remove a locally stored design that is causing a problem loading. For instructions, see the section "Reset local settings" here.

Running offline

It is possible to run scadnano offline, so that no internet connection is needed. To do this, you can follow the instructions for running a local server in the CONTRIBUTING document, which involves three steps:

Alternatively, you can run scadnano as a Docker container. This can be used for contributing to or running scadnano without having to install everything manually. This is confirmed working on Linux, but other platforms, such as Apple Silicon, may encounter errors. Docker support is experimental and maintained by @headblockhead - please reference them in any issues encountered.

To run using Docker:

Performance tips

There are some performance issues that we don't fully understand. But in general, if you are working on a very large design, it is best to minimize how much is displayed/done. In particular, performance will be best if DNA sequence and mismatches are not shown. (This is true even if your design has no mismatches, because on each edit to the design, it is costly to check for new potential mismatches.) On very large designs (e.g., more than 10,000 base pairs), it can be a significant cost to write the entire design to localStorage on each edit. So you may want to disable this (under the File menu) and save only infrequently.

Contributing

If you wish to contribute to scadnano, please see the CONTRIBUTING document to contribute to the scadnano web interface. There is also a CONTRIBUTING document for the scadnano Python package.