Teichlab / scg_lib_structs

Single Cell Genomics Library Structure

Collections of library structure and sequence of popular single cell genomic methods (mainly scRNA-seq).

Before you start

Make sure you understand the basic configuration of the Illumina libraries, because most single cell sequencing methods are developed to be sequenced on the Illumina platforms. If you are not familiar with the Illumina sequencing libraries, click here to check some general information about Illumina library structures and the nature of library preparation.

The HTML pages listed below contain step-by-step procedures of how the libraries are generated experimentally. For the computational preprocessing pipelines for each method, please see this accompanying ReadTheDocs documentation. For the machine-readable format of the library structure, check seqspec.

How to use?

Click the following links to view the methods. Notes:

  1. Index1 (i7) is always sequenced using the bottom strand as template, regardless of the Illumina machine in use. That is why the index sequences are reverse complementary to the primer sequences.
  2. IMPORTANT: In a dual-index library, how index2 (i5) is sequenced differs from machine to machine. According to the Index Sequencing Guide from Illumina, MiSeq, HiSeq 2000/2500, MiniSeq (Rapid) and NovaSeq 6000 (v1.0) use the bottom strand as template (Forward Strand Workflow), which is why the index sequences are the same as the primer sequences on those machines. iSeq 100, MiniSeq (Standard), NextSeq, HiSeq X, HiSeq 3000/4000 and NovaSeq 6000 (v1.5) use the top strand as template (Reverse Complement Workflow), which is why the index sequences are reverse-complementary to the primer sequences on those machines. All methods listed below use the Reverse Complement Workflow machines (iSeq 100, MiniSeq (Standard), NextSeq, HiSeq X, HiSeq 3000/4000 and NovaSeq 6000 (v1.5)) as examples, because this configuration is more frequently used nowadays.
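
The two notes above can be sketched in a few lines of Python. This is only an illustration and not part of the repository: the function names, the instrument labels used as dictionary keys and the example sequences are assumptions made for this sketch. It returns the index reads you would expect to see, given the index sequences as they appear in the primers.

```python
# Minimal sketch (hypothetical helper names, not part of this repository).
# The instrument grouping simply mirrors the two notes above.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# index2 (i5) read is the same as the primer sequence on these machines
FORWARD_STRAND = {
    "MiSeq", "HiSeq 2000/2500", "MiniSeq (Rapid)", "NovaSeq 6000 (v1.0)",
}
# index2 (i5) read is the reverse complement of the primer sequence on these
REVERSE_COMPLEMENT = {
    "iSeq 100", "MiniSeq (Standard)", "NextSeq", "HiSeq X",
    "HiSeq 3000/4000", "NovaSeq 6000 (v1.5)",
}


def expected_index_reads(i7_in_primer: str, i5_in_primer: str, instrument: str):
    """Return (index1 read, index2 read) for the given instrument."""
    # Note 1: index1 (i7) is reverse complementary to the primer on every machine.
    i7_read = reverse_complement(i7_in_primer)
    # Note 2: index2 (i5) depends on the workflow of the machine.
    if instrument in FORWARD_STRAND:
        i5_read = i5_in_primer
    elif instrument in REVERSE_COMPLEMENT:
        i5_read = reverse_complement(i5_in_primer)
    else:
        raise ValueError(f"Unknown instrument: {instrument}")
    return i7_read, i5_read


if __name__ == "__main__":
    # Arbitrary 8-nt example sequences, chosen only for illustration.
    for machine in ("MiSeq", "NextSeq"):
        print(machine, expected_index_reads("ATCACGTT", "TAGATCGC", machine))
```

Running the same pair of primer indices through both workflows shows that only the index2 read changes, which is the usual reason a sample sheet prepared for one machine fails to demultiplex on another.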

scRNA-seq technical comparisons

The basic chemistry is very similar across these scRNA-seq methods; the main differences are summarised in the table below. For a detailed discussion, check the text boxes from our review: From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture.

| Method | Single cell isolation/capture | Where RT happens | 2nd strand synthesis | Full-length cDNA synthesis | Barcode addition | Pooling before library | Library amplification | Gene coverage |
|---|---|---|---|---|---|---|---|---|
| 10x Chromium Single Cell 3' | Droplet | In droplets | TSO | Yes | Barcoded RT primers | Yes | PCR | 3' |
| 10x Chromium Single Cell 5' | Droplet | In droplets | TSO | Yes | Barcoded TSO primers | Yes | PCR | 5' |
| BD Rhapsody | Nanowells | In collection tubes | Random priming and primer extension | No | Barcoded RT primers | Yes | PCR | 3' |
| CEL-seq/CEL-seq2 | FACS | In 96w/384w wells | RNase H and DNA pol I | No | Barcoded RT primers | Yes | In vitro transcription | 3' |
| Drop-seq | Droplet | In collection tubes | TSO | Yes | Barcoded RT primers | Yes | PCR | 3' |
| Illumina Bio-Rad SureCell 3' WTA | Droplet | In droplets | RNase H and DNA pol I | No | Barcoded RT primers | Yes | PCR | 3' |
| inDrop | Droplet | In droplets | RNase H and DNA pol I | No | Barcoded RT primers | Yes | In vitro transcription | 3' |
| MARS-seq/MARS-seq2.0 | FACS | In 96w/384w wells | RNase H and DNA pol I | No | Barcoded RT primers | Yes | In vitro transcription | 3' |
| Microwell-seq | Nanowells | In collection tubes | TSO | Yes | Barcoded RT primers | Yes | PCR | 3' |
| Quartz-seq | FACS | In 96w/384w wells | PolyA tailing and primer ligation | Yes in principle | Ligation of barcoded Truseq adapters | No | PCR | 3' |
| Quartz-seq2 | FACS | In 96w/384w wells | PolyA tailing and primer ligation | Yes in principle | Barcoded RT primers | Yes | PCR | 3' |
| sci-RNA-seq | Not needed | In situ | RNase H and DNA pol I | No | Barcoded RT primers and library PCR with barcoded primers | Yes | PCR | 3' |
| sci-RNA-seq3 | Not needed | In situ | RNase H and DNA pol I | No | Barcoded RT primers and hairpin adapters | Yes | PCR | 3' |
| scifi-RNA-seq | Droplet multiple cells | In situ | TSO | Yes | Barcoded RT primers and gel bead barcodes | Yes | PCR | 3' |
| SCRB-seq/mcSCRB-seq | FACS | In 96w/384w wells | TSO | Yes | Barcoded RT primers | Yes | PCR | 3' |
| Seq-Well | Nanowells | In collection tubes | TSO | Yes | Barcoded RT primers | Yes | PCR | 3' |
| Seq-Well S3 | Nanowells | In collection tubes | Random priming and primer extension | No | Barcoded RT primers | Yes | PCR | 3' |
| SMART-seq/SMART-seq2/SMART-seq3 | FACS or Fluidigm C1 | In 96w/384w wells | TSO | Yes | Library PCR with barcoded primers | No | PCR | full-length |
| SPLiT-seq | Not needed | In situ | TSO | Yes | Ligation of barcoded RT primers | Yes | PCR | 3' |
| STRT-seq | FACS | In 96w/384w wells | TSO | Yes | Barcoded TSO primers | Yes | PCR | 5' |
| STRT-seq-C1 | Fluidigm C1 | In microfluidic chambers | TSO | Yes | Barcoded Tn5 transposase | No | PCR | 5' |
| STRT-seq-2i | FACS or dilution | In 9600w wells | TSO | Yes | Barcoded PCR primers and Tn5 transposase | Yes | PCR | 5' |
| Tang 2009 | FACS or manual | In 96w/384w wells | PolyA tailing and primer extension | Yes in principle | Ligation of barcoded adaptors | No | PCR | Biased to 3' |

scATAC-seq technical comparisons

This is basically Table 1 from our scATAC-seq protocol: A plate-based single-cell ATAC-seq workflow for fast and robust profiling of chromatin accessibility

| Method | Tn5 and adaptors | Starting cell number | Tagmentation | Single-cell/nucleus isolation | Library amplification | Barcode addition | Throughput |
|---|---|---|---|---|---|---|---|
| sci-ATAC-seq/snATAC-seq | Custom-made | 500,000+ | Bulk | FACS or dilution | PCR | Tn5 + PCR barcodes | 10,000 |
| scTHS-seq | Custom-made | 500,000+ | Bulk | FACS or dilution | IVT and PCR | Tn5 + PCR barcodes | 10,000 |
| Plate_scATAC-seq and Pi-ATAC-seq | Nextera | 5,000+ | Bulk | FACS | PCR | PCR barcodes | 1,000 |
| Fluidigm C1 | Nextera | 4,000-20,000 | Single cells | Microfluidics | PCR | PCR barcodes | 100 |
| Takara ICELL8 | Nextera | 16,000 | Single cells | Microfluidics | PCR | PCR barcodes | 1,000 |
| 10x Chromium Single Cell ATAC | Nextera | 800-15,000 | Bulk | Droplets | PCR | PCR barcodes | 10,000 |
| Bio-Rad dscATAC-seq | Nextera | 60,000+ | Bulk | Droplets | PCR | PCR barcodes | 10,000 |
| Bio-Rad dsciATAC-seq | Custom-made | 600,000+ | Bulk | Droplets | PCR | Tn5 + PCR barcodes | 100,000 |

Motivation

I was bombarded with so many single cell methods that I got completely lost. To help myself understand them, and to make future troubleshooting easier, I started performing an on-paper library preparation whenever I saw a new single cell method.

Why bother?

Here I borrow from Feynman:

What I cannot create, I do not understand.


Citation

If you find this repository useful and would like to cite this resource, please consider citing this repo and the seqspec preprint together:

@misc{xi_chen_teichlabscg_lib_structs_2023,
    title = {Teichlab/scg\_lib\_structs: {Release} 26th {Oct} 2023},
    copyright = {Creative Commons Attribution 4.0 International},
    shorttitle = {Teichlab/scg\_lib\_structs},
    url = {https://zenodo.org/doi/10.5281/zenodo.10042390},
    abstract = {This is the first release to get a DOI so that people can cite the repo.},
    urldate = {2023-10-26},
    publisher = {Zenodo},
    author = {Xi Chen and Patrick Roelli and Darío Hereñú and Pontus Höjer and Tim Stuart},
    month = oct,
    year = {2023},
    doi = {10.5281/ZENODO.10042390},
}

@article{booeshaghi.pachter.Bioinformatics2024,
  title = {A Machine-Readable Specification for Genomics Assays},
  author = {Booeshaghi, Ali Sina and Chen, Xi and Pachter, Lior},
  editor = {Kendziorski, Christina},
  year = {2024},
  month = mar,
  journal = {Bioinformatics},
  volume = {40},
  number = {4},
  pages = {btae168},
  issn = {1367-4811},
  doi = {10.1093/bioinformatics/btae168},
  urldate = {2024-05-01},
  abstract = {Motivation: Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries.},
  copyright = {https://creativecommons.org/licenses/by/4.0/},
  langid = {english}
}

Feedback

I would be very happy if you go through them and let me know what you think. If you spot any errors, or if I've missed some key methods, feel free to raise an issue in the GitHub repository, or contact me directly:

Xi Chen
chenx9@sustech.edu.cn