DeepSelectNet is an improved 1D ResNet-based model that classifies Oxford Nanopore raw electrical signals as target or non-target for Read-Until sequence enrichment or depletion. It aims to improve classification accuracy over existing selective sequencing methods.
Nanopore sequencing allows selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research; one example is classifying species from a pool of species. Existing methods for selective sequencing for species classification are still immature, and their accuracy varies widely across datasets. For the five datasets we tested, the accuracy of existing methods ranged from ∼77% to 97% (average accuracy <89%). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet utilizes novel data preprocessing techniques and an improved neural network architecture for regularization.
Pre-print: https://www.biorxiv.org/content/10.1101/2022.10.24.513498v1
1) Open a terminal in the root directory of the code repository.
2) Create a Python 3 virtual environment named deepselectenv.
python3 -m venv deepselectenv
3) Use the following command to activate the virtual environment created.
source deepselectenv/bin/activate
4) Install the required packages in the virtual environment.
pip install -r requirements.txt
5) [Optional] Leave the environment when it is not in use.
deactivate
Preprocess the slow5 files into numpy dumps so that they can be used for training.
python scripts/preprocessor.py -pos_s5 <pos_slow5> -neg_s5 <neg_slow5> -b 20000 -c 1500 -sco 4 -mad 3 -o <output_dir>
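To give a feel for what a MAD-based step in signal preprocessing can look like, here is a minimal sketch. It assumes that `-mad 3` sets an outlier threshold in units of median absolute deviation, which this README does not confirm. The function name `mad_normalize` and the choice to clip outliers are hypothetical and are not taken from scripts/preprocessor.py.

```python
import numpy as np


def mad_normalize(signal, mad_threshold=3.0):
    """Center a 1D signal on its median, scale by MAD, and clip outliers.

    Hypothetical helper for illustration only; the actual preprocessor
    may treat outliers differently.
    """
    signal = np.asarray(signal, dtype=np.float64)
    median = np.median(signal)
    mad = np.median(np.abs(signal - median))
    if mad == 0:
        # A constant (or nearly constant) signal carries no spread to scale by.
        return np.zeros_like(signal)
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    scaled = (signal - median) / (1.4826 * mad)
    # Assumption: values beyond the threshold are clipped rather than dropped.
    return np.clip(scaled, -mad_threshold, mad_threshold)


if __name__ == "__main__":
    raw = [480, 495, 500, 505, 1200]  # made-up current values with one spike
    print(mad_normalize(raw))
```

Median and MAD are used here instead of mean and standard deviation because a single spike in the raw current barely moves them.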
Note:
Train the model for a given dataset using the dumped numpy arrays.
python scripts/trainer.py -d <npy_dump_dir> -s 0.7 -k 5 -e 200 -o <output_dir>
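As an illustration of a split ratio such as `-s 0.7`, the sketch below shuffles sample indices and divides them into a training part and a held-out part. It assumes `-s` is the fraction of data used for training, which this README does not state. `split_indices` is a hypothetical helper and is not part of scripts/trainer.py.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n_samples-1 and split them by train_fraction.

    Hypothetical helper for illustration only.
    """
    order = list(range(n_samples))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(order)
    n_train = round(n_samples * train_fraction)
    return order[:n_train], order[n_train:]


if __name__ == "__main__":
    train_idx, held_out_idx = split_indices(10)
    print(len(train_idx), len(held_out_idx))
```

Splitting indices rather than the arrays themselves lets the same split be applied to both signals and labels.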
Predict the class of unseen slow5 reads with the trained model.
python scripts/inference.py -model <saved_model_dir> -s5 <slow5_dir> -lb 1 -mad 3 -o <output_dir>