CRAFT is a computational pipeline that predicts circRNA sequences, their molecular interactions with miRNAs and RBPs, and their coding potential. CRAFT provides a comprehensive graphical visualization of the results, links to several knowledge databases, extensive functional enrichment analysis, and combined predictions for different circRNAs. CRAFT helps the user explore the potential regulatory networks involving the circRNAs of interest and generate hypotheses about how circRNAs cooperate in the regulation of biological processes.
The Docker image spares you the installation effort. A Docker image of CRAFT is available from DockerHub at https://hub.docker.com/r/annadalmolin/craft; pull it with the command:
docker pull annadalmolin/craft:v1.0
Prepare your project directory with the following files:
list_backsplice.txt: file with circRNA coordinates. It is a tab-separated text file with the circRNA backsplice coordinates (chromosome:start-end) in the first column and the circRNA strand in the second. An example of list_backsplice.txt is:
4:143543509-143543972 +
11:33286413-33287511 +
15:64499292-64500166 +
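Because the two columns must be separated by a real TAB (not spaces), a minimal sketch for writing and checking the file may help. It assumes bash and awk, writes the example rows above into the current (project) directory, and checks rules inferred from the description: two columns, "chr:start-end" with start < end, and a "+" or "-" strand. Replace the rows with your own circRNAs.

```shell
#!/usr/bin/env bash
# Run from the project directory. Writes the example list_backsplice.txt with
# real TAB characters, then checks its format.
set -euo pipefail

# printf reuses the format for every pair of arguments: one row per circRNA.
printf '%s\t%s\n' \
  '4:143543509-143543972' '+' \
  '11:33286413-33287511'  '+' \
  '15:64499292-64500166'  '+' > list_backsplice.txt

# Expect 2 TAB-separated columns: "chr:start-end" (start < end) and "+" or "-".
awk -F'\t' '
  NF != 2 || $1 !~ /^[^:]+:[0-9]+-[0-9]+$/ || $2 !~ /^[+-]$/ {
    print "line " NR ": unexpected format: " $0; bad = 1; next
  }
  {
    coords = $1
    sub(/^[^:]+:/, "", coords)   # keep only "start-end"
    split(coords, se, "-")
    if (se[1] + 0 >= se[2] + 0) { print "line " NR ": start >= end"; bad = 1 }
  }
  END { exit bad }               # non-zero exit status if any line was bad
' list_backsplice.txt
```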
path_files.txt: file with the paths, as seen inside the Docker container, to the Ensembl annotation and genome files. It is a text file with one path per row, in the following order:
1. gene annotation file (GTF format);
2. genome sequence file (FASTA format).
An example of path_files.txt is:
/data/input/Homo_sapiens.GRCh38.104.gtf
/data/input/Homo_sapiens.GRCh38.dna.primary_assembly.fa
The gene annotation (GTF format) and genome sequence (FASTA format) files must be downloaded by the user from the Ensembl database and placed in the input/ directory inside the project directory. Annotation and genome files for Homo sapiens (GRCh38) can be downloaded from http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/ and http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/, respectively.
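A minimal sketch for writing path_files.txt in the required order (annotation first, genome second) and checking that the referenced files are actually in input/. It assumes the project directory is mounted at /data, as in the docker run command used to start CRAFT, and uses the Ensembl file names from the example. Ensembl distributes these files gzip-compressed (.gtf.gz, .fa.gz), so the sketch assumes you decompressed them with gunzip after downloading.

```shell
#!/usr/bin/env bash
# Run from the project directory. Writes path_files.txt with container paths
# (project directory mounted at /data) and warns about files not in input/.
set -euo pipefail

gtf=Homo_sapiens.GRCh38.104.gtf                        # annotation (row 1)
fasta=Homo_sapiens.GRCh38.dna.primary_assembly.fa      # genome (row 2)

printf '/data/input/%s\n' "$gtf" "$fasta" > path_files.txt

# Map each container path back to the host (/data -> current directory).
while IFS= read -r container_path; do
  host_path="$PWD${container_path#/data}"
  if [ ! -f "$host_path" ]; then
    echo "warning: $host_path not found" >&2
  fi
done < path_files.txt
```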
params.txt: file with the CRAFT parameters. It is a text file with one or more parameters per row, in the following order:
1. kind of prediction: "M" for miRNA prediction, "R" for RBP prediction, "O" for ORF prediction, or "MR", "MO", "RO" or "MRO" for a combination of these;
2. investigated species: one of the species in the miRBase database, e.g. hsa for Homo sapiens, mmu for Mus musculus;
3. parameters for the miRanda tool (optional): miRanda_score and miRanda_energy, in this order, on a single row and separated by a tab. Set either both parameters or neither; the default values are 80 (score) and -15 (energy);
4. parameters for the beRBP tool (optional): the PWM(s) and the RBP(s) to investigate, in this order, on a single row and separated by a tab. The syntax is "PWM RBP"; multiple PWMs (separated by ", ") and the associated RBPs (separated by ", ") are also allowed. The default is "all all", which searches all PWMs and RBPs included in the beRBP database. Set either both parameters or neither;
5. prefix of the genome and index files downloaded from the UCSC website, e.g. hg38 for Homo sapiens. The human genome file (e.g. hg38.fa.gz) can be downloaded from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/ . Index files can be obtained by following the instructions at https://bioinfo.vanderbilt.edu/beRBP/download/beRBP.standalone.README.txt . The genome (.fa) and index files (.00.idx, .01.idx, .02.idx, .nhr, .nin, .nsq, .shd) must be placed in the input/ directory;
6. parameters for the ORFfinder tool (optional): in this order and separated by tabs, the genetic code, the start codon to use, the minimal ORF length, whether to ignore nested ORFs, and the strand on which putative ORFs are searched. Set either all parameters or none of them. The allowed options for each parameter are:
7. parameters for the graphical output of each single investigated circRNA (optional but recommended); the defaults are: l=50000, QUANTILE1="FALSE", thr1=0.95, score_miRNA=120, energy_miRNA=-22, QUANTILE2="FALSE", thr2=0.95, dGduplex_miRNA=-20, dGopen_miRNA=-11, QUANTILE3="FALSE", thr3=0.9, voteFrac_RBP=0.15, orgdb="org.Hs.eg.db", meshdb="MeSH.Hsa.eg.db", symbol2eg="org.Hs.egSYMBOL2EG", eg2uniprot="org.Hs.egUNIPROT", org="hsapiens". Specify only the parameters that differ from the defaults, as a comma-separated list; the parameter order does not matter. Available parameters:
8. parameters for the summary graphical output of all investigated circRNAs (optional but recommended); the defaults are the same as in point 7. Specify only the parameters that differ from the defaults, as a comma-separated list; the parameter order does not matter. Available parameters: the same as in point 7, except for meshdb and org. Using the same values in points 7 and 8 is recommended.
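Before running CRAFT, you can check that the UCSC genome and index files listed in point 5 are in input/. This is a minimal sketch under two assumptions: each file is named after the prefix followed by the extension (e.g. input/hg38.fa, input/hg38.00.idx), and the downloaded hg38.fa.gz has already been decompressed. The prefix is read from the first script argument, defaulting to hg38.

```shell
#!/usr/bin/env bash
# Run from the project directory. Reports which genome/index files for the
# given prefix (point 5 of params.txt) are missing from input/.
# Assumption: files are named <prefix><extension>, e.g. input/hg38.00.idx.
set -uo pipefail

prefix="${1:-hg38}"
missing=0
for ext in .fa .00.idx .01.idx .02.idx .nhr .nin .nsq .shd; do
  if [ ! -f "input/${prefix}${ext}" ]; then
    echo "missing: input/${prefix}${ext}" >&2
    missing=$((missing + 1))
  fi
done
echo "${missing} of 8 required files missing for prefix '${prefix}'"
```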
An example of a params.txt file is:
M
hsa
hg38
score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10
score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10, voteFrac_RBP=0.3
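The override lines for points 7 and 8 list only the changed keys, in any order. To illustrate that behaviour, this sketch merges the fourth line of the example above with the documented defaults. It is not CRAFT's own parsing code: it assumes bash 4 or later (associative arrays), a ", " separator between pairs, and stores values without their surrounding quotes.

```shell
#!/usr/bin/env bash
# Illustration only, NOT CRAFT's implementation: apply a comma-separated
# "key=value" override line on top of the documented graphical defaults.
set -euo pipefail

declare -A graph_params=(
  [l]=50000 [QUANTILE1]=FALSE [thr1]=0.95
  [score_miRNA]=120 [energy_miRNA]=-22
  [QUANTILE2]=FALSE [thr2]=0.95
  [dGduplex_miRNA]=-20 [dGopen_miRNA]=-11
  [QUANTILE3]=FALSE [thr3]=0.9 [voteFrac_RBP]=0.15
  [orgdb]=org.Hs.eg.db [meshdb]=MeSH.Hsa.eg.db
  [symbol2eg]=org.Hs.egSYMBOL2EG [eg2uniprot]=org.Hs.egUNIPROT
  [org]=hsapiens
)

overrides='score_miRNA=125, energy_miRNA=-25, dGduplex_miRNA=-22, dGopen_miRNA=-10'

IFS=',' read -r -a pairs <<< "$overrides"
for pair in "${pairs[@]}"; do
  pair="${pair# }"                 # drop the space that follows each comma
  key="${pair%%=*}"
  value="${pair#*=}"
  if [[ -z "${graph_params[$key]+set}" ]]; then
    echo "unknown parameter: $key" >&2
    continue
  fi
  graph_params[$key]="$value"      # only listed keys change
done

# Print the merged settings, sorted by key.
for key in "${!graph_params[@]}"; do
  printf '%s=%s\n' "$key" "${graph_params[$key]}"
done | sort
```

Keys that are not listed, such as thr1, keep their default values (0.95 here).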
and the following directory:
input/: directory containing the following files:
genome and annotation files from the Ensembl database, and genome and index files from the UCSC database (see above)
backsplice_gene_name.txt: file with circRNA host gene names, to be created by the user. It is a tab-separated text file with the circRNA backsplice coordinates in the first column and the circRNA host gene name in the second; official gene names must be used. A header line is required. An example of backsplice_gene_name.txt is:
circ_id gene_names
4:143543509-143543972 SMARCA5
11:33286413-33287511 HIPK3
15:64499292-64500166 ZNF609
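A minimal sketch that writes the example backsplice_gene_name.txt into input/ with its header line and TAB separators. If list_backsplice.txt is already in the project directory, it also warns about backsplices that have no gene name. That every circRNA in list_backsplice.txt needs a row here is an assumption, so the check only prints warnings.

```shell
#!/usr/bin/env bash
# Run from the project directory. Writes input/backsplice_gene_name.txt
# (header required, TAB-separated) and cross-checks it against
# list_backsplice.txt when that file exists.
set -euo pipefail
mkdir -p input

printf '%s\t%s\n' \
  circ_id gene_names \
  '4:143543509-143543972' SMARCA5 \
  '11:33286413-33287511'  HIPK3 \
  '15:64499292-64500166'  ZNF609 > input/backsplice_gene_name.txt

if [ -f list_backsplice.txt ]; then
  awk -F'\t' '
    NR == FNR { if (FNR > 1) named[$1] = 1; next }   # gene-name file, skip header
    !($1 in named) { print "warning: no gene name for " $1 }
  ' input/backsplice_gene_name.txt list_backsplice.txt
fi
```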
AGO2_binding_sites.bed (optional): file with validated AGO2 binding sites. This BED6 file must have the following fields: chromosome, start genomic position (0-based), end genomic position, the string "AGO2_binding_site", a dot, and the strand. Make sure to use the same genome reference version as the one in the input/ directory. An example of AGO2_binding_sites.bed is:
4 143543521 143543542 AGO2_binding_site . +
4 143543530 143543559 AGO2_binding_site . +
4 143543562 143543607 AGO2_binding_site . +
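A minimal sketch that writes the example AGO2_binding_sites.bed into input/ and checks the BED6 layout described above: six TAB-separated columns, integer start smaller than end, the name AGO2_binding_site, and a "+" or "-" strand. It does not check chromosome naming; the example uses Ensembl-style names ("4" rather than "chr4").

```shell
#!/usr/bin/env bash
# Run from the project directory. Writes input/AGO2_binding_sites.bed and
# validates its BED6 columns; exits non-zero on the first malformed file.
set -euo pipefail
mkdir -p input

# Columns: chrom, 0-based start, end, name, score ("."), strand.
printf '%s\t%s\t%s\tAGO2_binding_site\t.\t%s\n' \
  4 143543521 143543542 + \
  4 143543530 143543559 + \
  4 143543562 143543607 + > input/AGO2_binding_sites.bed

awk -F'\t' '
  NF != 6 { print "line " NR ": expected 6 columns, got " NF; bad = 1; next }
  $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
    print "line " NR ": bad interval"; bad = 1
  }
  $4 != "AGO2_binding_site" { print "line " NR ": unexpected name " $4; bad = 1 }
  $6 !~ /^[+-]$/ { print "line " NR ": bad strand " $6; bad = 1 }
  END { exit bad }
' input/AGO2_binding_sites.bed
```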
CRAFT writes the number of miRNA binding sites that overlap AGO2 binding sites to the standard output. Check this number to decide whether to keep the AGO2 overlap filter or to re-run the analysis without this information (e.g. when very few sites overlap).
To run CRAFT from the Docker container use:
sudo docker run -it -v $(pwd):/data annadalmolin/craft:v1.0
All paths in path_files.txt must point to locations inside the container, i.e. under the directory where the volume is mounted (e.g. /data/input/filename, as in the example above).
If you want the output files to be owned by your user instead of the container's default user, set the user id with -u `id -u`:
sudo docker run -u `id -u` -it -v $(pwd):/data annadalmolin/craft:v1.0
After CRAFT completes successfully, you will find the following new directories in your project directory:
__sequence_extraction/__
The output files for the sequence reconstruction step are:
All these files are found in the functional_predictions/ directory.
__functional_predictions/__
The output files of the functional prediction step are (the final output of each tool is highlighted in bold):
miRNA_detection/:
RBP_detection/:
ORF_detection/:
__graphical_output/__
The output files for the graphical output step are:
If the circRNA sequences are already available to the user, CRAFT skips the sequence reconstruction step. To let CRAFT use the provided circRNA sequences, the user must follow these steps:
if the user wants to filter for miRNA binding sites that overlap AGO2 binding sites, they must also add the file region_to_extract_1.bed to sequence_extraction/. This BED6 file must have six tab-separated columns: circRNA chromosome, 0-based start position, 1-based end position, backsplice coordinates, score, strand. Each row represents one distinct genomic region that makes up the circRNA (an exon, an intron, part of an exon/intron, or an intergenic region). An example of region_to_extract_1.bed is:
11 33286412 33287511 11:33286412-33287511 . +
15 64499291 64500166 15:64499291-64500166 . +
4 143543508 143543657 4:143543508-143543972 . +
4 143543852 143543972 4:143543508-143543972 . +
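In the example above, the regions of each circRNA start at the backsplice start and end at the backsplice end given in column 4: 4:143543508-143543972 is made of 143543508-143543657 and 143543852-143543972. Column 4 uses the 0-based start (4:143543508-...), whereas list_backsplice.txt lists the same circRNA as 4:143543509-143543972. The minimal sketch below writes the example file and checks this property, assuming it holds for every circRNA.

```shell
#!/usr/bin/env bash
# Run from the project directory. Writes the example region_to_extract_1.bed
# and checks that, for each circRNA (column 4), its regions start at the
# backsplice start and end at the backsplice end on the same chromosome.
set -euo pipefail
mkdir -p sequence_extraction

printf '%s\t%s\t%s\t%s\t.\t%s\n' \
  11 33286412  33287511  '11:33286412-33287511'  + \
  15 64499291  64500166  '15:64499291-64500166'  + \
  4  143543508 143543657 '4:143543508-143543972' + \
  4  143543852 143543972 '4:143543508-143543972' + \
  > sequence_extraction/region_to_extract_1.bed

awk -F'\t' '
  {
    id = $4
    split(id, c, ":"); split(c[2], se, "-")   # c[1]=chrom, se[1]=start, se[2]=end
    if ($1 != c[1]) { print "chromosome mismatch for " id; bad = 1 }
    if (!(id in lo) || $2 + 0 < lo[id]) lo[id] = $2 + 0
    if (!(id in hi) || $3 + 0 > hi[id]) hi[id] = $3 + 0
    start[id] = se[1] + 0; end_[id] = se[2] + 0
  }
  END {
    for (id in lo)
      if (lo[id] != start[id] || hi[id] != end_[id]) {
        print id ": regions span " lo[id] "-" hi[id]; bad = 1
      }
    exit bad
  }
' sequence_extraction/region_to_extract_1.bed
```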
If you use CRAFT for your analysis, please add the following citation to your references:
Dal Molin A, Gaffo E, Difilippo V, Buratin A, Tretti Parenzan C, Bresolin S, Bortoluzzi S, CRAFT: a bioinformatics software for custom prediction of circular RNA functions, Brief Bioinform. 2022 Mar 10;23(2):bbab601.