RdRpBin identifies and classifies RNA virus reads in metagenomic data. It uses the RNA-dependent RNA polymerase (RdRp) gene as a marker gene and combines alignment-based strategies with graph-based learning models for viral composition analysis and novel RNA virus discovery in metagenomic data.
The input file contains the sequencing reads and can be in fasta or fastq format. Because the software generates temporary data while running, make sure there is enough free space on your hard disk, about 2 times the size of the input file.
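The disk space guideline above can be checked before starting a run. The following is a minimal sketch, not part of RdRpBin: the helper name `has_enough_space` is hypothetical, and it assumes the temporary data is written to the same filesystem as the input file.

```python
import os
import shutil


def has_enough_space(reads_path, factor=2.0):
    """Return True if the filesystem holding reads_path has at least
    factor * (size of reads_path) bytes free.

    Assumption: temporary files land on the same filesystem as the input.
    """
    input_size = os.path.getsize(reads_path)
    target_dir = os.path.dirname(os.path.abspath(reads_path))
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= factor * input_size
```

Calling `has_enough_space("example/test.fasta")` before the run gives an early warning instead of a failure partway through.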
We recommend using Anaconda to install the required packages. The Anaconda environment has been saved in environment.yml; use the following commands to create and activate it:
conda env create -f environment.yml
conda activate RdRpBin
Download the reference dataset and taxonomy files from OneDrive (or Baidu Netdisk, code: 5gv5) and uncompress them in the same directory as main.py.
Create an empty directory <input_dir> and put the sequencing reads file <input_reads> into this directory.
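The directory setup can also be scripted. This is a small sketch under stated assumptions: `stage_reads` is a hypothetical helper, not part of RdRpBin, and it copies the reads file rather than moving it so the original stays in place.

```python
import shutil
from pathlib import Path


def stage_reads(reads_file, input_dir):
    """Copy reads_file into an empty input_dir and return the new path.

    The directory is created if it does not exist; a non-empty
    directory is rejected so that earlier outputs are not mixed in.
    """
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    if any(input_dir.iterdir()):
        raise ValueError(f"{input_dir} is not empty")
    return Path(shutil.copy2(reads_file, input_dir))
```

The returned path is the one to pass to main.py.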
Run the main.py script:
python main.py <path of input_reads>
Optional arguments:
-f, --format: the format of the input file. Default: fasta.
-t, --thread: the number of threads. Default: 1.
--learning_rate: the learning rate of GCN. Default: 0.01.
--epochs: the number of GCN training epochs. Default: 50.
--hidden: the size of the hidden vector. Default: 64.
--weight_decay: the weight decay parameter. Default: 5e-4.
--no_gcn: run RdRpBin without the GCN step. (This reduces the running time at the cost of a small decrease in recall.) Default: no.
--force_cpu: run RdRpBin on the CPU. Default: False.
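When RdRpBin is launched from another Python script, the options above can be assembled into an argument list. The sketch below is illustrative only: `build_command` and `run_rdrpbin` are hypothetical helpers, only value-taking options from the list are used, and it assumes the script is started from the directory that contains main.py.

```python
import subprocess
import sys


def build_command(reads_path, fmt="fasta", threads=1, epochs=50):
    """Assemble the argument list for main.py from the documented options."""
    return [
        sys.executable, "main.py", str(reads_path),
        "--format", fmt,            # fasta or fastq
        "--thread", str(threads),   # number of threads
        "--epochs", str(epochs),    # GCN training epochs
    ]


def run_rdrpbin(reads_path, **options):
    """Run main.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(reads_path, **options), check=True)
```

For example, `run_rdrpbin("example/test.fasta", threads=6)` corresponds to running main.py on the example file with six threads and the default values for the other options.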
The identified RdRp reads will be saved in <input_dir>/RdRp_reads.
python main.py example/test.fasta -t 6
Tang, X., Shang, J., & Sun, Y. (2022). RdRp-based sensitive taxonomic classification of RNA viruses for metagenomic data. Briefings in Bioinformatics, 23(2), bbac011.