nicola-calonaci commented 2 months ago

Driver Analysis is a subworkflow that performs:

[ ] Global selection analysis (dndscv)
[ ] Identification of genes under positive/negative selection (dndscv)
[ ] Identification of mutations with high immunogenicity (SOPRANO)

It saves the output of the tools to files and flags mutations in the mutation table accordingly. Possible flags:

known_driver (from a user-defined list of drivers, e.g. IntOGen) (boolean)
under_positive_selection (from dndscv with a user-defined significance threshold) (boolean)
under_negative_selection (from dndscv with a user-defined significance threshold) (boolean)
immunogenic (boolean)
quantities (from the various tables) (integer/float/string)
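
A minimal sketch of how the selection and known-driver flags could be attached to the mutation table. It assumes the dndscv gene-level results have been exported to records with a q-value and a dN/dS estimate. The field names (`gene`, `qvalue`, `dnds`), the rule "dN/dS above 1 is positive, below 1 is negative" and the default threshold of 0.1 are illustrative assumptions, not the pipeline's actual interface. The immunogenic flag is left out because it depends on the SOPRANO output.

```python
def flag_mutations(mutations, gene_stats, known_drivers, q_threshold=0.1):
    """Return a copy of each mutation record with driver flags added.

    mutations: list of dicts with at least a "gene" key (hypothetical layout).
    gene_stats: dict gene -> {"qvalue": float, "dnds": float} (hypothetical).
    known_drivers: set of gene names from a user-defined list.
    """
    flagged = []
    for mut in mutations:
        stats = gene_stats.get(mut["gene"])
        # A gene without dndscv statistics is never flagged as selected.
        significant = stats is not None and stats["qvalue"] < q_threshold
        flagged.append({
            **mut,
            "known_driver": mut["gene"] in known_drivers,
            "under_positive_selection": significant and stats["dnds"] > 1,
            "under_negative_selection": significant and stats["dnds"] < 1,
        })
    return flagged


# Toy input: positions and statistics are made up for illustration.
mutations = [{"gene": "MSH6", "pos": 100}, {"gene": "TTN", "pos": 200}]
gene_stats = {"MSH6": {"qvalue": 0.01, "dnds": 3.0}}
flagged = flag_mutations(mutations, gene_stats, known_drivers={"TP53"})
```

Keeping the threshold as a function argument mirrors the idea of a user-defined significance threshold.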

tucano commented 1 month ago

STATUS 5 Aug 2024

Adding some info on the current status:

The DRIVER_ANNOTATION workflow contains 2 processes:

BUILD_REFERENCE. Input: tuple val(meta), path(cds), path(genome). Output: tuple val(meta), path("reference.rda"), emit: dnds_reference
DNDSCV. Input: tuple val(meta), path(snv_rds), path(driver_list), path(reference). Output: tuple val(meta), path("*_dnds.rds"), emit: dnds_rds

Both processes are built with an inline Rscript and tested for success and for a matching snapshot.

TESTING

nf-test test tests/modules/local/build_reference/main.nf.test
nf-test test tests/modules/local/dndscv/main.nf.test

TODO

DRIVER_ANNOTATION workflow and test
Params and interface for driver_list, cds and genome. Suggestions needed.
Container: currently I am testing locally with --profile test. It should work on your HPC; I am using cdslab.sif.

Reference commit: b3319e302a40b9cdb57b8d739737c9b77d0cf079

tucano commented 1 month ago

STATUS 6 Aug 2024

DRIVER_ANNOTATION subworkflow with basic testing (using prebuilt RefCDS)
Container for dndscv: https://github.com/tucano/dndscv_docker

With the Docker container and a container directive that switches on the container engine, we can run tests on both HPC and OSX laptops:

container "${workflow.containerEngine == 'singularity' ? 'docker://tucano/dndscv:latest' : 'tucano/dndscv:latest'}"

My current interface for DNDSCV subworkflow:

Run BUILD_REFERENCE in the workflow when I pass cds and genome
Skip BUILD_REFERENCE and use a prebuilt custom RefCDS passed as rda when dndscv_refcds_rda is not null

TODO

Add globaldnds to the rds object
Params and interface
More testing
Minidataset for integration testing (BUILD_REFERENCE and then DNDSCV)

tucano commented 1 month ago

STATUS 7 Aug 2024

Added DRIVER_ANNOTATION with basic nf-tests
Added params for significance limits to mark mutations as potential drivers
Created a docker container for dndscv: https://hub.docker.com/r/tucano/dndscv

tucano commented 1 week ago

dndscv updates, test with SCOUT_SPN01 files.

Using the hg19_hg38 covariates (covariates_hg19_hg38_epigenome_pcawg.rda).

The pileup_VCF file SCOUT_SPN01_SS_SPN01_Sample_1_pileup_VCF.rds is still failing with:

11 (18%) mutations have a wrong reference base. Please confirm that you are not running data from a different assembly or species.

This limit (>10%) is hardcoded in dndscv.
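
A pre-check along these lines could catch the problem before dndscv is run. It is only a sketch: the mutation layout, the `reference_base` lookup and the toy genome are hypothetical, and the 10% cut-off mirrors the limit quoted above rather than reading it from dndscv.

```python
def wrong_reference_fraction(mutations, reference_base):
    """Fraction of mutations whose stated ref allele differs from the genome.

    mutations: list of (chrom, pos, ref) tuples (hypothetical layout).
    reference_base: callable (chrom, pos) -> base at that position.
    """
    if not mutations:
        return 0.0
    wrong = sum(
        1 for chrom, pos, ref in mutations
        if reference_base(chrom, pos).upper() != ref.upper()
    )
    return wrong / len(mutations)


# Toy genome standing in for a real FASTA lookup (assumption).
toy_genome = {("chr1", 1): "A", ("chr1", 2): "C", ("chr1", 3): "G", ("chr1", 4): "T"}
# The third call states "T" where the toy genome has "G".
calls = [("chr1", 1, "A"), ("chr1", 2, "C"), ("chr1", 3, "T"), ("chr1", 4, "T")]

fraction = wrong_reference_fraction(calls, lambda c, p: toy_genome[(c, p)])
exceeds_limit = fraction > 0.10  # same >10% cut-off as quoted above
```

Listing the mismatching positions the same way would help tell an assembly mismatch from a handful of bad calls.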

Changes TODO in the dndscv nextflow process:

Add a param to set the covariates reference file
Support multiple samples in the RDS input file (the new files have normal and tumor)

Proposal on how to use dndscv to annotate drivers

We run dndscv on ALL samples in a single step (possibly split by timepoint or category). Pooling the cohort gives good statistical power to infer the DRIVER mutations and genes. We then use this COHORT dnds to estimate:

The global dnds of the cohort
The potential driver genes in the cohort

Example:

We run dndscv on all samples and find positive selection for gene MSH6, a potential driver with 6 non-synonymous and 2 synonymous mutations across the whole cohort. Then, for each mutation in each sample, we add the "potential driver annotation" column using the global dnds information and stats.
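
The proposal can be sketched as two steps: stack the per-sample tables into one cohort table, then copy the cohort-level gene result back onto every mutation. The sampleID, chr, pos, ref and mut columns follow the five-column input that dndscv expects; the `gene` key, the `cohort_genes` structure and its field names are hypothetical, and the dndscv run itself is not shown.

```python
def pool_samples(per_sample):
    """Stack per-sample mutation lists into one cohort table with a sampleID."""
    return [
        {"sampleID": sample, **mut}
        for sample, muts in per_sample.items()
        for mut in muts
    ]


def annotate_from_cohort(cohort, cohort_genes):
    """Copy cohort-level gene statistics onto every mutation in that gene."""
    return [
        {
            **mut,
            "potential_driver": mut["gene"] in cohort_genes,
            "cohort_stats": cohort_genes.get(mut["gene"]),
        }
        for mut in cohort
    ]


# Toy per-sample tables: coordinates and alleles are made up.
per_sample = {
    "S1": [{"gene": "MSH6", "chr": "2", "pos": 1, "ref": "C", "mut": "T"}],
    "S2": [{"gene": "TTN", "chr": "2", "pos": 2, "ref": "G", "mut": "A"}],
}
# Cohort-level result from the example above: MSH6 with 6 non-synonymous
# and 2 synonymous mutations (field names are hypothetical).
cohort_genes = {"MSH6": {"n_nonsyn": 6, "n_syn": 2}}

annotated = annotate_from_cohort(pool_samples(per_sample), cohort_genes)
```

Because the sampleID stays on every row, the annotated cohort table can be split back into per-sample tables afterwards.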

caravagnalab / tumourevo

Driver Analysis #13
