Module with utility functions to process CRISPR-based screens and a method to correct gene-independent copy-number effects.
Crispy uses the scikit-learn implementation of Gaussian Process Regression, fitting each sample independently.
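As a rough illustration of the per-sample fit described above, the sketch below fits scikit-learn's GaussianProcessRegressor to simulated data for a single sample. The kernel (constant times RBF plus white noise), the simulated values and the helper name fit_copy_number_trend are assumptions made for this example and are not necessarily what Crispy uses internally.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel


def fit_copy_number_trend(copy_number, fold_change, seed=0):
    """Fit a GP of fold-change on copy number for one sample (hypothetical helper)."""
    # Assumed kernel: smooth trend (RBF) plus independent noise (WhiteKernel).
    kernel = ConstantKernel(1.0) * RBF(length_scale=2.0) + WhiteKernel(noise_level=0.1)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=seed)
    x = np.asarray(copy_number, dtype=float).reshape(-1, 1)
    y = np.asarray(fold_change, dtype=float)
    gpr.fit(x, y)
    return gpr


# Simulated single sample: fold-changes decrease as copy number increases.
rng = np.random.default_rng(0)
copy_number = rng.integers(1, 9, size=120)
fold_change = -0.3 * copy_number + rng.normal(0.0, 0.2, size=120)

gpr = fit_copy_number_trend(copy_number, fold_change)

# Predicted copy-number effect at a few copy-number values.
predicted = gpr.predict(np.array([[1.0], [4.0], [8.0]]))

# In this sketch, subtracting the fitted trend gives a residual fold-change.
corrected = fold_change - gpr.predict(copy_number.reshape(-1, 1).astype(float))
print(predicted)
```

With several samples, the same function would be called once per sample, so each sample gets its own fitted model.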
Install pybedtools and then install Crispy:
conda install -c bioconda pybedtools
pip install cy
Support for importing CRISPR-Cas9 libraries:
from crispy.CRISPRData import Library
# Master Library, standardised assembly of KosukeYusa V1.1, Avana, Brunello and TKOv3
# CRISPR-Cas9 libraries.
master_lib = Library.load_library("MasterLib_v1.csv.gz")
# Genome-wide minimal CRISPR-Cas9 library.
minimal_lib = Library.load_library("MinLibCas9.csv.gz")
# Some of the most broadly adopted CRISPR-Cas9 libraries:
# 'Avana_v1.csv.gz', 'Brunello_v1.csv.gz', 'GeCKO_v2.csv.gz', 'Manjunath_Wu_v1.csv.gz',
# 'TKOv3.csv.gz', 'Yusa_v1.1.csv.gz'
brunello_lib = Library.load_library("Brunello_v1.csv.gz")
Select sgRNAs (across multiple CRISPR-Cas9 libraries) for a given gene:
from crispy.GuideSelection import GuideSelection
# sgRNA selection class
gselection = GuideSelection()
# Select 5 optimal sgRNAs for MCL1 across multiple libraries
gene_guides = gselection.select_sgrnas(
    "MCL1", n_guides=5, offtarget=[1, 0], jacks_thres=1, ruleset2_thres=0.4
)
# Perform different rounds of sgRNA selection with increasingly relaxed efficiency thresholds
gene_guides = gselection.selection_rounds("TRIM49", n_guides=5, do_amber_round=True, do_red_round=True)
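The idea of selecting guides in rounds with increasingly relaxed thresholds can be sketched in plain Python. This is only an illustration of the general approach: the single "score" field, the threshold values and the function select_in_rounds are hypothetical and do not reproduce the criteria used by GuideSelection.

```python
def select_in_rounds(guides, n_guides, thresholds):
    """Pick up to n_guides, relaxing the minimum score in each round."""
    selected = []
    for threshold in thresholds:  # ordered from strictest to most relaxed
        candidates = [
            g for g in guides if g["score"] >= threshold and g not in selected
        ]
        # Within a round, prefer the highest-scoring guides.
        candidates.sort(key=lambda g: g["score"], reverse=True)
        selected.extend(candidates[: n_guides - len(selected)])
        if len(selected) == n_guides:
            break
    return selected


# Hypothetical candidate guides for one gene.
guides = [
    {"id": "g1", "score": 0.90},
    {"id": "g2", "score": 0.20},
    {"id": "g3", "score": 0.70},
    {"id": "g4", "score": 0.45},
    {"id": "g5", "score": 0.10},
    {"id": "g6", "score": 0.50},
]

# Three rounds: strict, then two progressively relaxed thresholds.
picked = select_in_rounds(guides, n_guides=5, thresholds=[0.6, 0.4, 0.0])
print([g["id"] for g in picked])  # ['g1', 'g3', 'g6', 'g4', 'g2']
```

Guides passing the strictest threshold are always taken first, so relaxed rounds only fill the remaining slots.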
Copy-number correction:
import crispy as cy
import matplotlib.pyplot as plt
from crispy.CRISPRData import ReadCounts, Library
"""
Import sample data
"""
rawcounts, copynumber = cy.Utils.get_example_data()
"""
Import CRISPR-Cas9 library
Important:
Library has to have the following columns: "Chr", "Start", "End", "Approved_Symbol"
Library and segments must use consistent "Chr" formatting: "Chr1", "chr1" or "1"
Ensure that the "Start" and "End" columns are of type int
"""
lib = Library.load_library("Yusa_v1.1.csv.gz")
lib = lib.rename(
    columns=dict(start="Start", end="End", chr="Chr", Gene="Approved_Symbol")
).dropna(subset=["Chr", "Start", "End"])
lib["Chr"] = "chr" + lib["Chr"]
lib["Start"] = lib["Start"].astype(int)
lib["End"] = lib["End"].astype(int)
"""
Calculate fold-change
"""
plasmids = ["ERS717283"]
rawcounts = ReadCounts(rawcounts).remove_low_counts(plasmids)
sgrna_fc = rawcounts.norm_rpm().foldchange(plasmids)
"""
Correct CRISPR-Cas9 sgRNA fold changes
"""
crispy = cy.Crispy(
sgrna_fc=sgrna_fc.mean(1), copy_number=copynumber, library=lib.loc[sgrna_fc.index]
)
# Integrated fold-change and correction function.
# Output is a modified/expanded BED-formatted data-frame with sgRNA and segment information.
# n_sgrna: the minimum number of sgRNAs required per segment for it to be considered in the fit.
# Recommended default values range between 4 and 10.
bed_df = crispy.correct(n_sgrna=10)
print(bed_df.head())
# The fitted Gaussian Process Regression is stored in crispy.gpr
crispy.gpr.plot(x_feature="ratio", y_feature="fold_change")
plt.show()
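The fold-change step in the example above (norm_rpm followed by foldchange against the plasmid) can be approximated with pandas as follows. This is a minimal sketch and not Crispy's implementation: the pseudocount of 1, the log2 scale, the use of the mean of the control columns as reference and the function name rpm_log2_foldchange are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def rpm_log2_foldchange(counts, controls, pseudocount=1.0):
    """Log2 fold-change of RPM-normalised counts against control columns."""
    # Scale each sample (column) to reads per million.
    rpm = counts / counts.sum(axis=0) * 1e6
    # Assumed pseudocount to avoid taking the log of zero.
    log_rpm = np.log2(rpm + pseudocount)
    # Reference is the mean of the control (e.g. plasmid) columns.
    reference = log_rpm[controls].mean(axis=1)
    return log_rpm.drop(columns=controls).sub(reference, axis=0)


# Toy read counts: rows are sgRNAs, columns are samples.
counts = pd.DataFrame(
    {
        "plasmid": [100, 100, 200],
        "rep1": [50, 150, 200],
        "rep2": [100, 100, 200],
    },
    index=["sg1", "sg2", "sg3"],
)

fc = rpm_log2_foldchange(counts, controls=["plasmid"])
print(fc)
```

Here rep2 has the same counts as the plasmid, so its fold-changes are zero, while sg1 is depleted and sg2 is enriched in rep1.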
Developed at the Wellcome Sanger Institute (2017-2020).
For citation please refer to: