BIMIB-DISCo / J-Space.jl

J-SPACE is a Julia package to simulate the spatial growth and the genomic evolution of a cell population and the experiment of sequencing the genome of the sampled cells.

J-SPACE

POSSIBLE PROBLEMS

Since the GLMakie library uses the GPU, the following error may occur on virtual machines:

 LoadError: InitError: Exception[GLFW.GLFWError(GLFW.PLATFORM_ERROR, "X11: Failed to open display localhost:20.0"), ErrorException("glfwInit failed")]

INTRODUCTION

J-SPACE is a Julia package to simulate the spatial growth and the genomic evolution of a cell population and the experiment of sequencing the genome of the sampled cells. First, the software simulates the spatial dynamics of the cells as a continuous-time multi-type birth-death stochastic process on a graph, employing different rules of interaction and an optimised Gillespie algorithm. After mimicking a spatial sampling of the tumour cells, J-SPACE returns the phylogenetic tree of the sample and simulates the molecular evolution of the genome under the infinite-site model or one of a set of substitution models. It is also possible to include indels. Finally, employing ART, J-SPACE generates synthetic single-end, paired-end, or mate-pair reads of next-generation sequencing platforms.

Spatial clonal dynamics

In J-SPACE, the spatio-temporal evolution of a tumour is modelled as a stochastic multi-type birth-death process over an arbitrary graph. J-SPACE can generate a 2D or 3D regular lattice by itself. Alternatively, any graph can be given as input in the form of an adjacency matrix (an example of the required format is given in the path "Example_adj_matrix"; the matrix must be symmetric and contain only the values 0 and 1, separated by spaces).
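The format constraints on the adjacency matrix can be checked before handing a file to J-SPACE. The following is a minimal sketch, assuming the file is a plain text matrix with one row per line; `is_valid_adjacency` and the file name are hypothetical and are not part of J-SPACE.

```julia
using DelimitedFiles   # standard library: readdlm / writedlm
using LinearAlgebra    # standard library: issymmetric

# Hypothetical helper (not part of J-SPACE): true when `A` is a square,
# symmetric matrix whose entries are all 0 or 1.
function is_valid_adjacency(A::AbstractMatrix)
    size(A, 1) == size(A, 2) || return false
    all(x -> x == 0 || x == 1, A) || return false
    return issymmetric(A)
end

# Path graph on four nodes: 1 - 2 - 3 - 4.
A = [0 1 0 0;
     1 0 1 0;
     0 1 0 1;
     0 0 1 0]

# Write the matrix with single spaces between values, then read it back.
path = joinpath(mktempdir(), "adj_matrix.txt")
writedlm(path, A, ' ')
B = readdlm(path, ' ', Int)

println("valid adjacency matrix: ", is_valid_adjacency(B))
```

A matrix that fails the check (for example, a directed edge that makes it asymmetric) should be fixed before it is used as input.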
In this part, the user can tune the birth rate of wild-type cells, the death rate of the cells, the rate of migration of cells (not tested), the rule of contact between cells (to simulate different mechanical interactions), the probability of developing a driver mutation per division, and the average birth rate advantage of a driver mutation. Additionally, it is possible to perform an excision by specifying its timing and the ratio of cells that die because of it. If the user wants a specific clonal dynamics (i.e., Tree_Driver_Configure = 1), it is possible to provide the edge list representing the mutational tree of the drivers as a txt file, whose path is set by the config file parameter edgelist_treedriver. In this case the user should also specify the birth rate of each subpopulation in a txt file, whose path is set by the config file parameter driver_birth_rates. For example, a linear driver tree is described by the following edge list:

Driver_1 Driver_2

Driver_2 Driver_3

Driver_3 Driver_4

The file with the birth rate must have the following format:

Driver_1 0.2

Driver_2 0.4

Driver_3 0.5

Driver_4 0.6

In this case J-SPACE accepts only the events that respect the given mutational tree.
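Since every driver in the edge list needs a birth rate, the two files can be cross-checked before a run. The sketch below parses the two formats shown above from strings; the helper names `parse_edges`, `parse_birth_rates`, and `missing_rates` are hypothetical, and this is not how J-SPACE itself reads the files.

```julia
# Hypothetical helpers (not part of J-SPACE). Blank lines are skipped.

# Edge list: one "parent child" pair of labels per line.
function parse_edges(text::AbstractString)
    edges = Tuple{String,String}[]
    for line in split(text, '\n')
        fields = split(line)            # split on whitespace
        isempty(fields) && continue
        length(fields) == 2 || error("expected two labels, got: $line")
        push!(edges, (String(fields[1]), String(fields[2])))
    end
    return edges
end

# Birth rates: one "label rate" pair per line.
function parse_birth_rates(text::AbstractString)
    rates = Dict{String,Float64}()
    for line in split(text, '\n')
        fields = split(line)
        isempty(fields) && continue
        length(fields) == 2 || error("expected a label and a rate, got: $line")
        rates[String(fields[1])] = parse(Float64, fields[2])
    end
    return rates
end

# Labels that appear in the tree but have no birth rate.
function missing_rates(edges, rates)
    labels = unique(vcat(first.(edges), last.(edges)))
    return [l for l in labels if !haskey(rates, l)]
end

edge_text = """
Driver_1 Driver_2
Driver_2 Driver_3
Driver_3 Driver_4
"""

rate_text = """
Driver_1 0.2
Driver_2 0.4
Driver_3 0.5
Driver_4 0.6
"""

edges = parse_edges(edge_text)
rates = parse_birth_rates(rate_text)
println("drivers without a birth rate: ", missing_rates(edges, rates))
```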

Note that all rates given to J-SPACE must use the same unit of time for both the spatial dynamics and the molecular evolution.
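To illustrate the kind of process described in this section, here is a minimal, non-spatial sketch of a multi-type birth-death process simulated with the Gillespie algorithm. It is an assumption-laden toy: there is no graph, no contact rule, no migration, and no driver mutation, and it is not the optimised algorithm used by J-SPACE. The function name `simulate_birth_death` and all numeric values are hypothetical.

```julia
using Random

# Hypothetical toy model (not J-SPACE's implementation).
# n0[i]    : initial number of cells of clone i
# birth[i] : per-cell birth rate of clone i
# death    : per-cell death rate, shared by all clones
# All rates use the same unit of time as t_end.
function simulate_birth_death(rng::AbstractRNG, n0::Vector{Int},
                              birth::Vector{Float64}, death::Float64,
                              t_end::Float64)
    n = copy(n0)
    m = length(n)
    t = 0.0
    while true
        # Event rates: births of each clone first, then deaths of each clone.
        rates = vcat(birth .* n, death .* n)
        c = cumsum(rates)
        total = c[end]
        total > 0 || break              # extinct population: nothing can happen
        t += randexp(rng) / total       # exponential waiting time to next event
        t > t_end && break              # next event falls after the end time
        # Pick event k with probability rates[k] / total.
        k = searchsortedlast(c, rand(rng) * total) + 1
        if k <= m
            n[k] += 1                   # birth in clone k
        else
            n[k - m] -= 1               # death in clone k - m
        end
    end
    return n
end

rng = MersenneTwister(1)
final = simulate_birth_death(rng, [10, 5], [0.2, 0.4], 0.1, 5.0)
println("cells per clone at the end: ", final)
```

The spatial version differs mainly in that a birth is only possible when the contact rule allows a daughter cell to be placed on the graph.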

Molecular evolution

J-SPACE simulates the evolution of the sequences of the sample after the simulation of the clonal dynamics. The user can sample the whole population or a subset of it, and J-SPACE computes the phylogenetic tree of the samples. This ground-truth (GT) tree is returned as a Newick file in the folder specified by the variable "/path_to_save_files/".

The molecular evolution of an ancestral genome (which can be given by the user as a FASTA file or generated randomly) is simulated along the sampled tree via the Doob-Gillespie algorithm. The user can choose an infinite-site model to obtain fast simulations when the genome is long, the mutation rate is very low (e.g., <10^-8 substitutions per site per unit of time), and the total simulated time is long.
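Under the infinite-site model each mutation hits a previously unmutated site, so a branch of the tree only needs the mutation times and a fresh label per mutation. With a single event type, the Doob-Gillespie algorithm reduces to drawing exponential waiting times, which the following sketch does for one branch. The function `mutate_branch` and its arguments are hypothetical and only illustrate the idea.

```julia
using Random

# Hypothetical sketch (not J-SPACE's implementation): place mutations on a
# single branch under the infinite-site model.
# rate_per_site : mutation rate per site per unit of time
# genome_length : number of sites
# branch_length : duration of the branch, in the same unit of time
# next_site_id  : first unused mutation label
# Returns the mutation times, their labels, and the next unused label.
function mutate_branch(rng::AbstractRNG, rate_per_site::Float64,
                       genome_length::Int, branch_length::Float64,
                       next_site_id::Int)
    total_rate = rate_per_site * genome_length
    times = Float64[]
    sites = Int[]
    t = randexp(rng) / total_rate       # waiting time to the first mutation
    while t < branch_length
        push!(times, t)
        push!(sites, next_site_id)      # every mutation gets a new site label
        next_site_id += 1
        t += randexp(rng) / total_rate  # waiting time to the next mutation
    end
    return times, sites, next_site_id
end

rng = MersenneTwister(7)
times, sites, next_id = mutate_branch(rng, 1.0e-8, 10^6, 500.0, 1)
println(length(times), " mutations placed on the branch")
```

The cost depends on the number of mutations rather than on the genome length, which is why this model stays fast for long genomes.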

In the case of finite-site models, J-SPACE takes as input the matrix of instantaneous rates for one of the following substitution models: JC69, F81, K80, HKY85, TN93, and K81. Indel sizes are assumed to follow a Lavalette distribution.
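As an illustration of what a matrix of instantaneous rates looks like, the sketch below builds the textbook HKY85 matrix from base frequencies and a transition/transversion ratio kappa. The nucleotide order (A, C, G, T), the lack of rate normalisation, and the function name `hky85_rate_matrix` are assumptions made for this example; the layout J-SPACE expects in its parameters file may differ.

```julia
# Assumed nucleotide order: 1 = A, 2 = C, 3 = G, 4 = T.
# Transitions are A<->G and C<->T; all other changes are transversions.
is_transition(i::Int, j::Int) = (i, j) in ((1, 3), (3, 1), (2, 4), (4, 2))

# Textbook HKY85 rate matrix (unnormalised): the rate from base i to base j
# is kappa * pi_freq[j] for transitions and pi_freq[j] for transversions.
function hky85_rate_matrix(pi_freq::Vector{Float64}, kappa::Float64)
    length(pi_freq) == 4 || error("four base frequencies are required")
    Q = zeros(4, 4)
    for i in 1:4, j in 1:4
        i == j && continue
        Q[i, j] = (is_transition(i, j) ? kappa : 1.0) * pi_freq[j]
    end
    for i in 1:4
        Q[i, i] = -sum(Q[i, :])   # diagonal makes each row sum to zero
    end
    return Q
end

Q = hky85_rate_matrix([0.1, 0.2, 0.3, 0.4], 2.0)
println("largest absolute row sum: ", maximum(abs.(sum(Q, dims = 2))))
```

With equal base frequencies and kappa = 1 the same function gives the JC69 matrix up to a scale factor; K80 corresponds to equal frequencies with a free kappa.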

The user can also generate a custom time-dependent substitution model based on a linear combination of the mutational signatures of the COSMIC database. In this case the user should provide the list of labels of the desired SBS signatures in the COSMIC database (https://cancer.sanger.ac.uk/signatures/) (e.g., used_sign = ["SBS1","SBS4","SBS16"]), the list of change points (e.g., one change point at time 50 should be specified as vector_change_points = [0.0, 50.0]), the values of the activities of each signature in each time span defined by the change points (e.g., vector_activities = [[0.7,0.2,0.1], [0.0,0.3,0.7]]), and the ratio of mutations due to the background uniform process versus the mutational signatures (e.g., ratio_background_signature = 0.8). Note that using finite-site models for long genomes comes at the cost of computational performance. After this computation, the sequences of the samples (i.e., the leaves of the phylogenetic tree) are returned as FASTA files in the folder /"path_to_save_files"/Fasta output.
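The four signature parameters have to agree with each other, which can be checked with a short script. The sketch below parses the example values with the TOML standard library and applies consistency rules inferred from the example above: one activity vector per change point, one activity per signature, activities summing to one in each time span, and a ratio between 0 and 1. These rules, the flat layout of the fragment, and the helper names are assumptions, not documented J-SPACE behaviour.

```julia
using TOML   # standard library since Julia 1.6

# Example values taken from the text; the flat layout is an assumption.
config_text = """
used_sign = ["SBS1", "SBS4", "SBS16"]
vector_change_points = [0.0, 50.0]
vector_activities = [[0.7, 0.2, 0.1], [0.0, 0.3, 0.7]]
ratio_background_signature = 0.8
"""

# Hypothetical helper: returns a list of human-readable problems
# (empty when the parameters are consistent under the assumed rules).
function check_signature_config(cfg::AbstractDict)
    signs  = cfg["used_sign"]
    points = cfg["vector_change_points"]
    acts   = cfg["vector_activities"]
    ratio  = cfg["ratio_background_signature"]
    problems = String[]
    issorted(points) || push!(problems, "change points are not sorted")
    length(acts) == length(points) ||
        push!(problems, "need one activity vector per change point")
    for (k, a) in enumerate(acts)
        length(a) == length(signs) ||
            push!(problems, "time span $k: need one activity per signature")
        isapprox(sum(a), 1.0; atol = 1e-8) ||
            push!(problems, "time span $k: activities do not sum to 1")
    end
    0.0 <= ratio <= 1.0 || push!(problems, "ratio is outside [0, 1]")
    return problems
end

# Index of the time span that is active at time t.
active_span(points, t) = searchsortedlast(points, t)

cfg = TOML.parse(config_text)
println("problems found: ", check_signature_config(cfg))
println("active span at t = 75: ", active_span(cfg["vector_change_points"], 75.0))
```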

Sequencing experiment

To simulate the reads of a sequencing experiment, J-SPACE calls ART (https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm). The user can use a configuration file to specify the error model (for Illumina platforms), the number of reads, the length of the reads, and whether the experiment uses single-end, paired-end, or mate-pair reads. In addition, the configuration file offers the option to insert custom "calls" to ART, so that it can be used in any possible configuration. If the user simulates the experiment, J-SPACE returns, for each cell, the simulated reads as a FASTQ file and the alignment map of the reads over the genome of the sampled cells in SAM and/or ALN format. If the infinite-site model is used, it is possible to obtain the VCF file directly without simulating the reads with ART.

REQUIRED SOFTWARE AND PACKAGE

INSTALLATION OF J-SPACE

J-SPACE can be downloaded from GitHub. First, it is necessary to install Julia from https://julialang.org/.
Next, the user needs to copy the project folder into the chosen working directory. To install J-SPACE, follow these steps:

  1. Using REPL or the COMMAND LINE move to the working directory.
  2. If you use the COMMAND LINE, start a Julia session by running the command:

julia

  3. To enter the Pkg REPL, type

]

  4. To activate the J-SPACE project, type

    activate .

  5. To install the dependencies of the project, type

    instantiate

RUN J-SPACE

RUN A SINGLE SIMULATION

The parameters and the configuration of the simulation are managed by the user by modifying the files "Parameters.toml" and "Config.toml" (these file names are not mandatory), which are detailed in the next sections.
To run a J-SPACE simulation using the ".toml" files for the parameters, follow these steps:

  1. Load the J-SPACE package using:

    using J_Space

  2. Start the simulation

    Start_J_Space("Parameters.toml","Config.toml")

NOTE: the simulation does not start if the two .toml files are absent from the working folder.
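The presence of the two files can be verified before starting a run. This is a small sketch using only Julia's built-in file functions; `missing_files` is a hypothetical helper, and the file names are the default ones used above.

```julia
# Hypothetical helper (not part of J-SPACE): names in `names` that do not
# exist as files inside the folder `dir`.
missing_files(dir, names) = [n for n in names if !isfile(joinpath(dir, n))]

required = ["Parameters.toml", "Config.toml"]
absent = missing_files(pwd(), required)

if isempty(absent)
    println("Both files found; the simulation can be started with:")
    println("  Start_J_Space(\"Parameters.toml\",\"Config.toml\")")
else
    println("Missing in ", pwd(), ": ", join(absent, ", "))
end
```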

RUN THE EXAMPLES

To run the examples, move to the main folder of J-SPACE and type one of the following on the command line:

julia --project=. ./Experiments/Experiment_2D/experiment_2D.jl

or

julia --project=. ./Experiments/Experiment_3D/experiment_3D.jl

Run the variant calling pipeline

Necessary package

Change the fields name and prefix in environment_j_space.yml

Create the conda environment

conda env create -f environment_j_space.yml --prefix "path to the environment directory"

Activate the conda environment

conda activate "path to the environment directory"

Then register GATK (it does not need to be in the same working folder)

gatk3-register "path to the gatk directory"

Move into the working folder that contains j_space_pipeline.sh

Run the pipeline

./j_space_pipeline.sh "path/to/reference/" "path/to/FastaQ" "path/working/directory"

NOTE: all paths must be absolute paths.

OUTPUTS OF J-SPACE

J-SPACE provides the following outputs.

THE CONFIGURATION FILE OF J-SPACE

In the file "Config.toml" the user can manage the configuration of J-SPACE. This file is used to choose the path where the files are saved, and which plots and output files are produced. We provide an example in the main folder of J-SPACE.

The following are the parameters of this file:

If Tree_Driver_Configure = 1

THE PARAMETERS FILE OF J-SPACE

In the file "Parameters.toml" the user will find all the parameters of the dynamics, the molecular evolution, and the sequencing experiment.

Parameters of the generation of the lattice

Parameters of the clonal spatial dynamics

If Random_sampling = 0

Parameters of the molecular evolution

if type_isa = 1

if type_isa = 0

Parameters of the sequencing experiment (ART)

if paired_end = 1, the following are required

For all parameters of ART please see: https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm

LICENSE

See the file COPYING for license information.

CONTACTS

If you have problems running our tool, please feel free to contact us at fabrizio.angaroni@unimib.it and a.guidi@campus.unimib.it.