catg-umag / bcell-lymphomas-mutational-signatures

B-Cell Lymphomas Mutational Signatures
MIT License

Mutational Signatures in B-cell lymphomas

Software repository for our article "Integration of mutational signature analysis with 3D chromatin data unveils differential AID-related mutagenesis in indolent lymphomas", provided for reproducibility purposes.

You can also use your own data. Everything is automated, so the pipeline is easy to run if you want a general landscape of the mutational signatures in your samples.

What is included?

How to use it?

Requirements

First, you need to have installed Nextflow (>=20.07) and Singularity.
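As a quick sanity check before running anything, the requirements above can be verified with a small script. This is only a sketch: `version_ge` and `check_tool` are hypothetical helpers written for this example (they are not part of the pipeline), and the version comparison assumes a `sort` that supports `-V`, as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available on the host.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether a tool is on PATH without aborting the script.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool nextflow
check_tool singularity

# Compare the installed Nextflow version against the documented minimum (20.07).
if command -v nextflow >/dev/null 2>&1; then
    installed="$(nextflow -v | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if version_ge "$installed" "20.07"; then
        echo "Nextflow $installed satisfies >=20.07"
    else
        echo "Nextflow $installed is older than 20.07"
    fi
fi
```

The script only prints a report and never exits with an error, so it is safe to run on a machine where the tools are not installed yet.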

Preparation of inputs

You have two options: starting from the VCFs or starting from a list of variants.

Run the pipeline

To run the pipeline, execute:

nextflow run CATG-UMAG/bcell-lymphomas-mutational-signatures -r main <params>

In <params>, you need to provide inputs and other options. These are:

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| --vcf_list | yes* | | Input CSV if you want to start from the VCFs (see the previous section). Ignored if --snv_list is not empty. |
| --snv_list | yes* | | Input CSV if you want to start from the list of variants (see the previous section). |
| --reference | yes | | Reference genome in 2bit format. Must be the same one used for variant calling, for example hg19 or hg38. |
| --ig_list | yes | | BED file containing the ranges for the Ig loci. Check data/iglist_hg38.bed for an example. |
| --nsignatures_min | no | 2 | Minimum number of signatures to test with SigProfiler. |
| --nsignatures_max | no | 5 | Maximum number of signatures to test with SigProfiler. |
| --nsignatures_force | no | | Ignore the recommendation from SigProfiler regarding the optimal number of signatures, and use a fixed number of signatures as the final output. Must be a number between the nsignatures_min and nsignatures_max values (both inclusive). |
| --cosmic_version | no | 3.2 | Version of the COSMIC signatures to use. Check data/cosmic_signatures_urls.csv for possible options. |
| --cosmic_genome | no | GRCh38 | Genome of the COSMIC signatures. Check data/cosmic_signatures_urls.csv for possible options. |
| --fitting_selected_signatures | no | | Use only a subset of the reference signatures for the fitting. The value should be a string of valid signature names from the selected COSMIC version, separated by commas. Example: "SBS1,SBS3,SBS5,SBS6,SBS9,SBS84" |
| --fitting_extra_signatures | no | | Provide additional (local) signatures for the fitting. Must be a CSV file; check data/extra_signatures.csv for the format. |
| --results_dir | no | results | Output directory to store the results. |
| --sigprofiler_cpus | no | 8 | Number of CPUs to use with SigProfiler. |
| --sigprofiler_gpu | no | False | Use a GPU in SigProfiler. It must be a supported CUDA device. |
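The constraint on --nsignatures_force (it has to fall between the minimum and maximum, both inclusive) is easy to get wrong when the range is changed. A minimal sketch of a pre-flight check follows; `check_nsignatures` is a hypothetical helper written for this example and is not something the pipeline provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline: validate the signature-count options.
# Usage: check_nsignatures MIN MAX [FORCE]
check_nsignatures() {
    local min="$1" max="$2" force="${3:-}"
    if [ "$min" -gt "$max" ]; then
        echo "nsignatures_min ($min) is greater than nsignatures_max ($max)" >&2
        return 1
    fi
    # The forced value is optional; when given it must lie in [min, max].
    if [ -n "$force" ] && { [ "$force" -lt "$min" ] || [ "$force" -gt "$max" ]; }; then
        echo "nsignatures_force ($force) is outside the range $min-$max" >&2
        return 1
    fi
    return 0
}

check_nsignatures 2 5 && echo "the defaults (2 to 5) are consistent"
check_nsignatures 2 10 4 && echo "forcing 4 signatures within 2 to 10 is allowed"
check_nsignatures 2 5 7 2>/dev/null || echo "forcing 7 signatures within 2 to 5 would be rejected"
```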

So, for example, a full execution command should look like this:

nextflow run CATG-UMAG/bcell-lymphomas-mutational-signatures -r main \
  --snv_list data/snv_list.csv --reference data/hg38.2bit --ig_list data/iglist_hg38.bed \
  --nsignatures_min 2 --nsignatures_max 10 --fitting_selected_signatures 'SBS1,SBS3,SBS5,SBS6,SBS9,SBS84'

Alternatively, you can provide a YAML file containing all the parameters you want to set, so you don't have to write everything on the command line. Just download params.example.yml and edit it to your needs (you can delete any parameters you don't want to use). Then execute the pipeline like this:

nextflow run CATG-UMAG/bcell-lymphomas-mutational-signatures -r main -params-file params.yml
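For illustration, the snippet below writes a params.yml equivalent to the full command-line example shown earlier. It is a sketch under one assumption: that the keys are the parameter names from the table without the leading dashes, which is how Nextflow params files generally work. The repository's params.example.yml remains the authoritative template.

```shell
#!/usr/bin/env bash
# Write a params file mirroring the command-line example from this README.
# Assumption: keys are the parameter names without the leading "--";
# compare against params.example.yml before relying on this.
cat > params.yml <<'EOF'
snv_list: data/snv_list.csv
reference: data/hg38.2bit
ig_list: data/iglist_hg38.bed
nsignatures_min: 2
nsignatures_max: 10
fitting_selected_signatures: "SBS1,SBS3,SBS5,SBS6,SBS9,SBS84"
EOF

# Show the result before handing it to Nextflow as the params file.
cat params.yml
```

Parameters left out of the file keep the defaults listed in the table.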

You can also use any option available in Nextflow.

It's also very easy to run on a computing cluster (as long as Singularity is available). I included a profile for SLURM (-profile slurm). If your cluster uses a different scheduler, you should look here to find the corresponding configuration.
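Putting the SLURM profile together with the pipeline parameters, a launch could look like the sketch below. `run_on_slurm` and the `DRY_RUN` variable are hypothetical conveniences for this example; `-resume` is a standard Nextflow option that reuses cached results from a previous run and is optional here. By default the function only prints the command, so it can be inspected before anything is submitted.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the pipeline with the SLURM profile.
# DRY_RUN=1 (the default in this sketch) prints the command instead of launching it.
run_on_slurm() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo nextflow run CATG-UMAG/bcell-lymphomas-mutational-signatures -r main \
            -profile slurm -resume "$@"
    else
        nextflow run CATG-UMAG/bcell-lymphomas-mutational-signatures -r main \
            -profile slurm -resume "$@"
    fi
}

# Pipeline parameters are passed through unchanged.
run_on_slurm --snv_list data/snv_list.csv --reference data/hg38.2bit --ig_list data/iglist_hg38.bed
```

Setting `DRY_RUN=0` in the environment makes the same call actually start Nextflow.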

Results

Once the pipeline has finished running, you will find a set of files. These are:

How to cite

If this repository was useful for you, please cite it as below:

Sepulveda-Yanez JH, Alvarez-Saravia D, Fernandez-Goycoolea J, Aldridge J, van Bergen CAM, Posthuma W, Uribe-Paredes R, Veelken H, Navarrete MA. Integration of Mutational Signature Analysis with 3D Chromatin Data Unveils Differential AID-Related Mutagenesis in Indolent Lymphomas. International Journal of Molecular Sciences. 2021; 22(23):13015. https://doi.org/10.3390/ijms222313015

Acknowledgements

In containers/ you can find the recipes used to build the containers for the pipeline (hosted in GitHub Container Registry). These are the ones configured in nextflow.config, alongside others from BioContainers.