splicekit: an integrative toolkit for splicing analysis from short-read RNA-seq

splicekit is a modular platform for splicing analysis from short-read RNA-seq datasets. The platform also integrates a JBrowse2 instance, pybio for genomic operations and scanRBP for RNA-protein binding studies. The whole analysis is self-contained (in a single folder), and the platform is written in Python in a modular way.

Watch a short video presentation of the splicekit poster at ECCB 2023 on YouTube:

Quick start

The easiest way to install splicekit is to simply run:

$ pip install splicekit

Note that on some systems pip installs the executable scripts under ~/.local/bin. If this folder is not in your PATH, running $ splicekit on the command line will fail with "command not found". To fix this, run:

export PATH="$PATH:$HOME/.local/bin"
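
The export only affects the current shell session, and running it repeatedly appends the folder again each time. A minimal sketch, assuming a POSIX-compatible shell, that adds the folder only when it is missing:

```shell
# Add ~/.local/bin to PATH only if it is not already listed.
# $HOME is used because ~ is not expanded inside double quotes.
case ":$PATH:" in
  *":$HOME/.local/bin:"*)
    ;;  # already present, nothing to do
  *)
    export PATH="$PATH:$HOME/.local/bin"
    ;;
esac
```

Placing the same lines in your shell startup file (for example ~/.bashrc, which is an assumption about your shell) should make the change persistent across sessions.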

Another suggestion is to install inside a virtual environment (using virtualenv).
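
A minimal sketch of the virtual environment route. It uses Python's built-in venv module rather than the virtualenv package (a substitution, the two behave similarly for this purpose), and the folder name splicekit-env is a hypothetical choice:

```shell
# Create an isolated environment in a folder named splicekit-env (hypothetical name).
python3 -m venv splicekit-env

# Activate it for the current shell session.
. splicekit-env/bin/activate

# Install splicekit into the environment; its scripts are installed
# under splicekit-env/bin, which activation puts on the PATH.
pip install splicekit
```

With the environment active, the ~/.local/bin PATH issue described above should not arise, because the scripts are installed inside the environment folder.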

Installing splicekit directly from the GitHub repository

```
pip install git+https://github.com/bedapub/splicekit.git@main
```

If you already have aligned reads in BAM files

All you need is `samples.tab` (a TAB-delimited file) and `splicekit.config` in one folder (check [datasets](datasets) for examples). You can download and prepare the reference genome with `pybio` (e.g. `$ pybio genome homo_sapiens`). Finally, run `$ splicekit process` inside the folder containing `samples.tab` and `splicekit.config`. The easiest approach is to check the [datasets](datasets) examples to see what these files look like, and to check the scripts if you need to map reads from FASTQ files with `pybio`.

Documentation

Changelog

v0.6: released in April 2024

Past changenotes

v0.4.9: released in November 2023

* added rMATS analysis for splicing events
* added Docker container that can be directly imported to singularity via ghcr.io
* fixed dependencies
* other small fixes

v0.4: released in May 2023

* added singularity container with all dependencies
* added local integrated JBrowse2
* cluster or desktop runs
* scanRBP and bootstrap analysis of RNA-protein binding
* further development and integration with pybio
* extended documentation of concepts, analysis and results

v0.3: released in January 2023

* re-coded junction analysis
* independent junctions parsing from provided bam files
* master table of all junctions in the samples of the analyzed project, including novel junctions (refseq/ensembl non-annotated)
* clustering by logFC of pairwise-comparisons with dendrogram: junction, exon and gene levels (clusterlogfc module)
* added *first_exon* annotation for junctions touching annotated first exons of transcripts
* extended documentation of concepts, analysis and results

v0.2: released in October 2022

* software architecture restructure with python modules
* filtering of lowly expressed features by edgeR
* DonJuan analysis (junction-anchor analysis)
* more advanced motif analysis with DREME
* filtering regulated junctions with regulated donors

v0.1: released in July 2022

* initial version of splicekit
* parsing of junction and exon counts
* computing edgeR analysis from count tables and producing a results file with direct links to JBrowse2
* basic motif analysis

Citing and Contact

If you find splicekit useful in your work and research, please cite:

Rot, G., Wehling, A., Schmucki, R., Berntenis, N., Zhang, J. D., & Ebeling, M. (2024)
splicekit: an integrative toolkit for splicing analysis from short-read RNA-seq
Bioinformatics Advances, 4(1). https://doi.org/10.1093/bioadv/vbae121

For questions, issues, or other ideas, please use GitHub Issues or write directly to Gregor Rot.