DELi (DNA-Encoded-Library informatics) is the software suite used by the CICBDD for their in-house DEL analysis. It covers the whole pipeline post base-calling/sequencing, from read matching and barcode calling through generation of the cube files used for downstream analysis.
DELi is written in Python for two reasons:
It is true that DELi would likely be faster as a compiled C++ or Rust program, but we utilize Numba to keep things fast enough to be usable in most academic settings. DELi can call around 10 million reads in 1-2 hours on a 32-core machine (built in 2022). If you have billions of reads, DELi will need more cores to achieve good runtimes.
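The speed-critical piece is the edit-distance comparison between reads and barcode patterns (the same quantity the -e/--error setting bounds). Below is a minimal sketch of the kind of Numba-JIT'd kernel that makes this viable in pure Python; the function and its inputs are illustrative, not DELi's actual code:

import numpy as np
from numba import njit

@njit(cache=True)
def edit_distance(a, b):
    # Classic two-row Levenshtein distance over byte arrays of DNA bases.
    n, m = len(a), len(b)
    prev = np.arange(m + 1, dtype=np.int64)
    curr = np.empty(m + 1, dtype=np.int64)
    for i in range(1, n + 1):
        curr[0] = i
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev, curr = curr, prev
    return prev[m]

read = np.frombuffer(b"ACGTTGCA", dtype=np.uint8)
tag = np.frombuffer(b"ACGTAGCA", dtype=np.uint8)
print(edit_distance(read, tag))  # 1

The first call pays a one-time compilation cost; every call after that runs at roughly compiled-loop speed, which is what keeps 10 million reads tractable.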
We are interested in rewriting the calling code of DELi in a faster language, probably Rust because it sounds like a fun language to learn. But we (the original authors) are too busy trying to get PhDs to spend enough time to write something good. If you are interested in doing it for us, please reach out.
To install DELi, first clone the repo and then set up a virtual environment (conda or venv, both will work) by installing the requirements.txt file:
git clone git@github.com:Popov-Lab-UNC/CICBDD-DEL-Pipeline.git
cd CICBDD-DEL-Pipeline
conda create -n deli --file ./requirements.txt -y
conda activate deli
NOTE: DELi is built to run on Linux or Windows. It has not been tested on macOS, but in theory it should work there as well.
DELi is built with multiprocessing, as the matching and barcode calling is a CPU-hungry process.
It is recommended that you install DELi on a computer with at least 20 cores.
Currently, the Popov lab uses a machine with a 32-core Threadripper (glassfrog.dhcp.unc.edu).
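To give a sense of how the cores get used, here is a minimal sketch of the fan-out pattern for CPU-bound calling with Python's multiprocessing; call_chunk and the chunking scheme are hypothetical stand-ins for DELi's internals:

import os
from multiprocessing import Pool

def call_chunk(chunk):
    # Stand-in for per-read matching/barcode calling on one chunk of reads.
    return [len(read) for read in chunk]

if __name__ == "__main__":
    reads = ["ACGT" * 10] * 1_000  # toy data
    n_workers = os.cpu_count() or 4
    size = max(1, len(reads) // n_workers)
    chunks = [reads[i:i + size] for i in range(0, len(reads), size)]
    with Pool(n_workers) as pool:
        results = pool.map(call_chunk, chunks)

The if __name__ guard matters because DELi also targets Windows, where multiprocessing spawns fresh interpreters rather than forking.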
While DELi can be run in discrete steps, it is recommended that you run the entire pipeline in a single call to deli_run.py.
This command will run through all steps of converting a .fastq file (post base-calling; see dorado) into the cube file that can be analyzed in DataWarrior or Spotfire.
Details about the settings can be obtained with python3 deli_run.py -h:
positional arguments:
input input fastq file with reads to be searched for matches
options:
-h, --help show this help message and exit
--make the 'make' ID defining the makeup of the library
-i INDEX [INDEX ...], --index INDEX [INDEX ...]
index(s) included in the DEL selection being called; pass "all" for all indexes
-l LIBRARY [LIBRARY ...], --library LIBRARY [LIBRARY ...]
library(s) included in the DEL selection being called; pass "all" for all library(s)
-t CONTROL, --control CONTROL
index of selection that is the negative control
-o OUTDIR, --outdir OUTDIR
directory to write the output files to
-s, --save-match whether to save the matches as a file
-r, --recursive recursively loop through all files in a directory
-m {full,minimum,double,single}, --mode {full,minimum,double,single}
which barcode query to use when searching for matches, see readme.md for more details
-e ERROR, --error ERROR
how many edit distance errors you are willing to allow when attempting a match
-c, --compliment also search for the reverse complement of the DEL
-p PREFIX, --prefix PREFIX
prefix to add to the filename of output called csv file(s)
-u, --umi-cluster conduct a UMI clustering to further denoise results
--strict use a more strict alignment and pattern match that uses the closing primer post UMI
-n, --normalize normalize the DEL_ID sequence counts by the control (requires -t/--control is set)
--monosynthon conduct a monosynthon analysis on all generated cubes
--disynthon conduct a disynthon analysis on all generated cubes
-j N_WORKERS, --n-workers N_WORKERS
number of workers to use for calling
--index_name path to json file containing a dictionary mapping passed index to selection names
--debug enable debug logging
--print print updating status to console
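The --index_name option expects a JSON file containing a dictionary that maps each passed index to a selection name. A hypothetical example, written from Python (the selection names are invented; the indexes match the example run below):

import json

index_names = {
    "index5": "protein_A_selection",
    "index6": "protein_B_selection",
    "index10": "bead_only_control",
}
with open("index_names.json", "w") as f:
    json.dump(index_names, f, indent=2)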
While the settings allow for a fair amount of customization, it is recommended that you use the default settings from the example below to get the best results. This example assumes that we are calling a run that includes libraries 'DEL004' and 'DEL005' (see the library json for a list of libraries) and indexes 'index5', 'index6', and 'index10' (see the index json for a list of indexes), with 'index10' being the control (bead-only) selection:
python3 deli_run.py <PATH/TO/MY/FASTQ> -i index5 index6 index10 -l DEL004 DEL005 -o <PATH/TO/OUTPUT/DIR> -s -m single -e 3 -c -p <MY_PREFIX> -n -t index10 --disynthon --debug --print --index_name <PATH/TO/INDEX/JSON>
This will produce the desired cube files in the output directory.
There will be a single file for each library, each containing all indexes.
There will also be a file for each index, each containing all libraries.
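If you want to sanity-check a cube before loading it into DataWarrior or Spotfire, the called outputs are csv files (see -p/--prefix above), so pandas works for a quick look. The filename here is hypothetical; use whatever DELi wrote to your output directory:

import pandas as pd

cube = pd.read_csv("MY_PREFIX_DEL004_cube.csv")  # hypothetical filename
print(cube.shape)
print(cube.head())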