caravagnalab / tumourevo

Analysis pipeline to model tumour clonal evolution from WGS data (driver annotation, quality control of copy number calls, subclonal and mutational signature deconvolution)
GNU General Public License v3.0

Driver Analysis #13

Open nicola-calonaci opened 2 months ago

nicola-calonaci commented 2 months ago

Driver Analysis is a subworkflow that:

saves the output of the tools to files and flags mutations in the mutation table accordingly. Possible flags:

tucano commented 1 month ago

STATUS 5 Aug 2024

Adding some info on the current status:

The DRIVER_ANNOTATION workflow contains 2 processes:

  1. BUILD_REFERENCE
     Input: tuple val(meta), path(cds), path(genome)
     Output: tuple val(meta), path("reference.rda"), emit: dnds_reference

  2. DNDSCV
     Input: tuple val(meta), path(snv_rds), path(driver_list), path(reference)
     Output: tuple val(meta), path("*_dnds.rds"), emit: dnds_rds

Both processes are built with inline Rscript and are tested for successful completion and for a matching snapshot.

TESTING

nf-test test tests/modules/local/build_reference/main.nf.test
nf-test test tests/modules/local/dndscv/main.nf.test

TODO

  1. DRIVER_ANNOTATION workflow and test
  2. Params and interface for driver_list, cds and genome. Suggestion needed
  3. Container: I am currently testing locally with --profile test, which should also work on your HPC. I am using cdslab.sif

Reference commit: b3319e302a40b9cdb57b8d739737c9b77d0cf079

tucano commented 1 month ago

STATUS 6 Aug 2024

  1. DRIVER_ANNOTATION subworkflow with basic testing (using prebuilt RefCDS)
  2. Container for dndscv: https://github.com/tucano/dndscv_docker

With the Docker container and a container directive that switches on the container engine, we can run tests both on HPC and on OSX laptops:

container "${workflow.containerEngine == 'singularity' ? 'docker://tucano/dndscv:latest' : 'tucano/dndscv:latest'}"

My current interface for the DNDSCV subworkflow:

  1. Run BUILD_REFERENCE in the workflow when cds and genome are passed
  2. Skip BUILD_REFERENCE and use a prebuilt custom RefCDS, passed as an rda file, when dndscv_refcds_rda is not null
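
The two entry points above can be summarised as a small decision function. This is only a sketch in Python of the branching logic, not the Nextflow code in the repository: the function name plan_driver_annotation and the error raised when nothing is supplied are assumptions made for illustration, while cds, genome and dndscv_refcds_rda are the names used in this thread.

```python
from typing import List, Optional


def plan_driver_annotation(
    cds: Optional[str],
    genome: Optional[str],
    dndscv_refcds_rda: Optional[str],
) -> List[str]:
    """Return the ordered list of processes to run (hypothetical helper)."""
    if dndscv_refcds_rda is not None:
        # A prebuilt RefCDS was supplied, so BUILD_REFERENCE is skipped.
        return ["DNDSCV"]
    if cds is not None and genome is not None:
        # No prebuilt RefCDS: build the reference first, then run dndscv.
        return ["BUILD_REFERENCE", "DNDSCV"]
    # Assumption: with neither input there is nothing to build a reference from.
    raise ValueError("provide dndscv_refcds_rda, or both cds and genome")
```

Giving the prebuilt RefCDS priority means a run that passes all three inputs still skips the rebuild, which matches the "not null" condition in point 2.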

TODO

  1. Add globaldnds to the rds object
  2. Params and interface
  3. More testing
  4. Minidataset for integration testing (BUILD_REFERENCE and then DNDSCV)

tucano commented 1 month ago

STATUS 7 Aug 2024

  1. Added DRIVER_ANNOTATION with basic nf-tests
  2. Added params for the significance limits used to mark mutations as potential drivers
  3. Created a docker container for dndscv: https://hub.docker.com/r/tucano/dndscv

tucano commented 1 week ago

dndscv update: tested with the SCOUT_SPN01 files.

Using the hg19_hg38 covariates (covariates_hg19_hg38_epigenome_pcawg.rda).

The pileup_VCF file SCOUT_SPN01_SS_SPN01_Sample_1_pileup_VCF.rds is still failing with:

11 (18%) mutations have a wrong reference base. Please confirm that you are not running data from a different assembly or species.

This limit (>10%) is hardcoded in dndscv.
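
To see why this file trips the check, the arithmetic can be sketched as below. This is not dndscv's own code: the function names are hypothetical, the 10% limit is the one quoted above, and the total of 61 mutations is an assumption chosen only because 11 of 61 rounds to the reported 18% (the thread does not state the total).

```python
MAX_WRONG_REF_FRACTION = 0.10  # limit quoted in this thread as hardcoded in dndscv


def wrong_reference_fraction(n_wrong: int, n_total: int) -> float:
    """Fraction of mutations whose reference base does not match the genome."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_wrong / n_total


def exceeds_wrong_reference_limit(
    n_wrong: int, n_total: int, limit: float = MAX_WRONG_REF_FRACTION
) -> bool:
    """True when the mismatch fraction is above the limit (the run would fail)."""
    return wrong_reference_fraction(n_wrong, n_total) > limit


# Assumed total of 61 mutations; only "11 (18%)" appears in the error message.
print(exceeds_wrong_reference_limit(11, 61))
```

Under that assumed total, the file would need to drop to 6 or fewer mismatching mutations to pass, which suggests an assembly or coordinate mismatch in the input rather than a borderline case.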

Changes still to do in the dndscv Nextflow process:

  1. param to set the covariates reference file
  2. support for multiple samples in the RDS input file (the new files contain both normal and tumor)

Proposal on how to use dndscv to annotate drivers

We run dndscv on ALL samples in a single step (possibly divided by timepoint/categories). From this we infer the DRIVER mutations and genes with good statistical power. We then use this COHORT dnds to estimate:

Example:

We run dndscv on all samples and find positive selection on gene MSH6, which becomes a potential driver with 6 non-synonymous mutations and 2 synonymous mutations (in the whole cohort). Then, for each mutation in each sample, we add the "potential driver annotation" column using the global dnds information and stats.
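
A minimal sketch of that annotation step, in Python, could look like the following. Everything here is an assumption made for illustration: the GeneStats record, the potential_driver column name, the q-value of 0.01 for MSH6 and the 0.1 threshold are hypothetical (only the 6 non-synonymous and 2 synonymous counts come from the example above), and flagging only non-synonymous mutations is a design choice, not something decided in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GeneStats:
    """Cohort-level dnds summary for one gene (hypothetical structure)."""
    n_nonsyn: int
    n_syn: int
    qvalue: float  # per-gene q-value from the cohort run


def annotate_potential_drivers(
    mutations: List[dict],
    cohort: Dict[str, GeneStats],
    q_threshold: float = 0.1,  # assumed significance limit
) -> List[dict]:
    """Add a 'potential_driver' flag to each mutation of one sample."""
    annotated = []
    for mutation in mutations:
        stats = cohort.get(mutation["gene"])
        is_driver = (
            stats is not None
            and stats.qvalue < q_threshold
            and mutation["nonsynonymous"]
        )
        # Copy the record so the input mutation table is left unchanged.
        annotated.append({**mutation, "potential_driver": is_driver})
    return annotated


# Cohort result from the example: MSH6 with 6 non-synonymous and 2 synonymous
# mutations; the q-value is made up for illustration.
cohort = {"MSH6": GeneStats(n_nonsyn=6, n_syn=2, qvalue=0.01)}
sample = [
    {"gene": "MSH6", "nonsynonymous": True},
    {"gene": "MSH6", "nonsynonymous": False},
    {"gene": "TTN", "nonsynonymous": True},
]
for row in annotate_potential_drivers(sample, cohort):
    print(row["gene"], row["potential_driver"])
```

Because the cohort statistics are computed once and then looked up per mutation, each sample can be annotated independently after the single cohort-wide dndscv run.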