This repository includes the PolyAMiner-Bulk computational tool from 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'. Please cite our paper if you have used our computational tool, any of our machine learning models, or code snippets. Additionally, this repository is actively under development, so please kindly report any issues or feature requests.
In this package, we provide the following resources:
(1) Source code of PolyAMiner-Bulk
(2) Test scripts delineating key usage scenarios
(3) Fine-tuned CPAS-BERT models for both human and mouse model organisms.
If you have used PolyAMiner-Bulk in your research, please kindly cite the following publication:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
We highly recommend using Anaconda to build a Python virtual environment. Also, please make sure you have at least one NVIDIA GPU with Linux x86_64 driver version >= 410.48 (compatible with CUDA 10.0).
conda create -n cpasbert python=3.9
conda activate cpasbert
<!-- conda install pytorch torchvision cudatoolkit=10.0 -c pytorch -->
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=10.0 -c pytorch
git clone https://github.com/venkatajonnakuti/PolyAMiner-Bulk
cd PolyAMiner-Bulk/lib/DNABERT
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
python3 -m pip install --editable .
Rscript installPkgs.R
conda install pandas statsmodels
pip3 install -U scikit-learn
pip install pysam tokenizers
pip install tensorboard
conda install -c bioconda pyfasta
conda install -c bioconda gtfparse
conda install -c bioconda pybedtools
conda install -c bioconda pybigwig
conda install -c bioconda subread
conda install -c bioconda samtools
conda install -c bioconda bedtools
sudo apt install build-essential
conda install -c anaconda seaborn
sudo apt-get install gfortran
pip install deeptools
pip install pygenometracks==3.6
Also, install featureCounts v2.0.0, which ships as part of the Subread package. (Please do NOT install other versions of featureCounts!)
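After installation, a quick check can confirm the two version constraints stated above (driver version >= 410.48 and featureCounts v2.0.0). The following is a minimal sketch and is not part of PolyAMiner-Bulk. The helper names are hypothetical, and it assumes that `nvidia-smi` and `featureCounts` are on the PATH and that `featureCounts -v` prints a string of the form `featureCounts v2.0.0`.

```python
import re
import shutil
import subprocess

MIN_DRIVER = "410.48"              # minimum driver version stated in this README
REQUIRED_FEATURECOUNTS = "2.0.0"   # featureCounts version required by this README


def version_tuple(version):
    """Turn '410.48' into (410, 48) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_ok(driver_version, minimum=MIN_DRIVER):
    """Return True if the driver version is at least the minimum."""
    return version_tuple(driver_version) >= version_tuple(minimum)


def parse_featurecounts_version(text):
    """Extract 'X.Y.Z' from text such as 'featureCounts v2.0.0' (assumed format)."""
    match = re.search(r"featureCounts\s+v(\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


def run(cmd):
    """Return combined stdout/stderr of a command, or None if it is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout


def check_environment():
    """Print a short report; purely informational, nothing is modified."""
    out = run(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"])
    if out is None:
        print("nvidia-smi not found on PATH")
    else:
        # Keep only lines that look like a version number (one line per GPU).
        versions = [
            line.strip()
            for line in out.splitlines()
            if re.fullmatch(r"\d+(\.\d+)+", line.strip())
        ]
        if not versions:
            print("could not read a driver version from nvidia-smi")
        for version in versions:
            status = "ok" if driver_ok(version) else f"older than {MIN_DRIVER}"
            print(f"NVIDIA driver {version}: {status}")

    out = run(["featureCounts", "-v"])
    if out is None:
        print("featureCounts not found on PATH")
    else:
        found = parse_featurecounts_version(out)
        if found is None:
            print("could not determine the featureCounts version")
        elif found == REQUIRED_FEATURECOUNTS:
            print(f"featureCounts {found}: ok")
        else:
            print(f"featureCounts {found}: expected {REQUIRED_FEATURECOUNTS}")


if __name__ == "__main__":
    check_environment()
```

Run the script inside the activated `cpasbert` environment so that the same PATH is checked that PolyAMiner-Bulk will see.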
-mode = Run mode: 'bam' to start from mapped data, 'fastq' to start from raw data (string)
-index = Reference genome bowtie2 index. NOTE: Valid for -mode fastq ONLY! (string)
-d = Base directory of input fastq files. NOTE: Valid for -mode fastq ONLY! (string)
-o = Output directory; default = 'PolyAminer_OUT' (string)
-c1 = Comma-separated list of condition1 files. Full path for BAMs (index files are also expected) or just file names for fastq (string)
-c2 = Comma-separated list of condition2 files. Full path for BAMs (index files are also expected) or just file names for fastq (string)
-s = Strand information. Use 0 for un-stranded, 1 for fwd-stranded, and 2 for rev-stranded (integer)
-fasta = Reference fasta file (string)
-gtf = Reference gtf file (string)
-pa = PolyA annotations file in standard 6-column BED format (string)
-apriori_annotations = Enable pre-loading of a priori PolyASite 2.0 and PolyADB 3.0 annotations (boolean toggle)
Note: When choosing between -pa and -apriori_annotations, prefer -apriori_annotations in general.
-paired = Enable paired analyses where sample files are considered paired (i.e., pre-treatment vs post-treatment) for beta-binomial statistical test (boolean toggle)
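Because -c1 and -c2 take comma-separated lists of full BAM paths and index files are expected alongside them, it can help to validate those lists before starting a run. The following is a minimal sketch with hypothetical helper names. It assumes that each index sits next to its BAM as either `sample.bam.bai` or `sample.bai`, which this README does not specify.

```python
from pathlib import Path


def split_file_list(value):
    """Split a comma-separated -c1/-c2 value into individual paths."""
    return [Path(item.strip()) for item in value.split(",") if item.strip()]


def find_problems(value):
    """Return human-readable problems for a comma-separated list of BAM paths."""
    problems = []
    for bam in split_file_list(value):
        if not bam.is_absolute():
            problems.append(f"{bam}: not a full path")
        if not bam.is_file():
            problems.append(f"{bam}: BAM file not found")
            continue
        # Assumption: the index is stored as sample.bam.bai or sample.bai.
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(candidate.is_file() for candidate in candidates):
            problems.append(f"{bam}: no .bai index found")
    return problems
```

For example, `find_problems(c1_value + "," + c2_value)` returns an empty list when every BAM and its index are in place, and one message per issue otherwise.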
Please refer to the Demo folder for the PolyAMiner-Bulk demo command.
python3 /mnt/belinda_local/venkata/data/PolyAMiner-Bulk/PolyA-miner.py -mode bam \
-fasta /mnt/belinda_local/venkata/data/Index_Files/Human/GenomeFasta_GTF/GRCh38.primary_assembly.genome.fa \
-gtf /mnt/belinda_local/venkata/data/Index_Files/Human/GenomeFasta_GTF/gencode.v33.primary_assembly.annotation.gtf \
-p 20 -a 0.65 -outPrefix 3UTROnly -expNovel 1 -s 2 \
-o /mnt/belinda_local/venkata/data/PolyAMiner-Bulk/Demo/Demo_Results/Demo_3UTROnly_Softclipped+APriori_ReRun \
-c1 /mnt/belinda_local/venkata/data/PolyAMiner-Bulk/Demo/control1_.subset.sorted.bam,\
/mnt/belinda_local/venkata/data/PolyAMiner-Bulk/Demo/control2_.subset.sorted.bam,\
/mnt/belinda_local/venkata/data/PolyAMiner-Bulk/Demo/control3_.subset.sorted.bam \
-c2 /mnt/belinda_local/venkata/data/PolyAMiner-Bulk/Demo/treatment1_.subset.sorted.bam,\
/mnt/belinda_local/venkata/data/PolyAMiner-Bulk/Demo/treatment2_.subset.sorted.bam,\
/mnt/belinda_local/venkata/data/PolyAMiner-Bulk/Demo/treatment3_.subset.sorted.bam \
-ignore UTR5,CDS,Intron,UN -apriori_annotations -modelOrganism human \
-visualizeTopNum 10 -visualizeCondition1Name Control -visualizeCondition2Name Treatment
Important note: Users will have to adjust the file path parameters to match their own file system.
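One way to adapt the paths in a single place is to assemble the demo command from a few base directories. The following is a minimal sketch, assuming the demo BAMs live in the repository's `Demo` folder and the reference files share one directory. The function name `build_demo_command` and its three parameters are hypothetical. The flags and their values are copied from the demo command above.

```python
import shlex
from pathlib import Path


def build_demo_command(repo_dir, index_dir, out_dir):
    """Assemble the demo command as an argument list from three base paths."""
    repo = Path(repo_dir)
    index = Path(index_dir)
    demo = repo / "Demo"
    controls = [demo / f"control{i}_.subset.sorted.bam" for i in (1, 2, 3)]
    treatments = [demo / f"treatment{i}_.subset.sorted.bam" for i in (1, 2, 3)]
    return [
        "python3", str(repo / "PolyA-miner.py"),
        "-mode", "bam",
        "-fasta", str(index / "GRCh38.primary_assembly.genome.fa"),
        "-gtf", str(index / "gencode.v33.primary_assembly.annotation.gtf"),
        "-p", "20", "-a", "0.65",
        "-outPrefix", "3UTROnly", "-expNovel", "1", "-s", "2",
        "-o", str(out_dir),
        # -c1/-c2 expect comma-separated lists of full BAM paths.
        "-c1", ",".join(str(path) for path in controls),
        "-c2", ",".join(str(path) for path in treatments),
        "-ignore", "UTR5,CDS,Intron,UN",
        "-apriori_annotations",
        "-modelOrganism", "human",
        "-visualizeTopNum", "10",
        "-visualizeCondition1Name", "Control",
        "-visualizeCondition2Name", "Treatment",
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with locations on your own system.
    command = build_demo_command("/path/to/PolyAMiner-Bulk",
                                 "/path/to/GenomeFasta_GTF",
                                 "/path/to/Demo_Results")
    print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed directly to `subprocess.run(command, check=True)`.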
Q1) I am unable to install the gplots package and its dependencies when running "Rscript installPkgs.R". A1) This usually occurs when OS-level dependencies are missing. On Linux, try running "apt-get install libblas-dev liblapack-dev" before running "Rscript installPkgs.R".
Q2) I am unable to install the DESeq2 package and its dependencies when running "Rscript installPkgs.R". A2) This usually occurs when OS-level dependencies are missing. Look through the error messages: the console will usually name the specific OS-level dependencies that need to be installed manually (see the answer to Q1).
Please complete the following form if you have any questions or feedback: https://forms.gle/8rF4TZcPoS15PEsB8. We will update this README with answers to the most frequently asked questions.