BRCAness (the SBS3 signature as defined in the COSMIC reference set by Alexandrov et al.) can be used as a proxy for HRD and is requested as a clinical marker in the DNPM KDS v2. BRCAness will be reported as a float between 0 and 1, indicating the relative contribution of the SBS3 signature to a sample. My proposal for calculating BRCAness is as follows:
Prior to VCF conversion, low-quality mutations and mutations not flagged as PASS should be filtered out of the VCF to reduce false assignment to the "flat" signatures SBS5 or SBS8 and to technical artifact signatures.
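A minimal sketch of such a pre-filter, assuming a plain-text, tab-separated VCF. The function name `filter_vcf_lines` and the `min_qual` threshold of 30 are hypothetical choices for illustration, not values from the KDS. Records with a missing QUAL (".") are kept if they are PASS, since some callers do not populate that column.

```python
def filter_vcf_lines(lines, min_qual=30.0):
    """Yield header lines unchanged and only PASS records with sufficient QUAL.

    min_qual is a hypothetical threshold; adjust it to the variant caller in use.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines pass through
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip malformed records with fewer than the 8 fixed columns
        qual, filt = fields[5], fields[6]
        if filt != "PASS":
            continue  # drop everything that is not flagged PASS
        if qual != "." and float(qual) < min_qual:
            continue  # drop low-quality calls; a missing QUAL cannot be judged
        yield line


# Small made-up example: two header lines and four records.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tC\tT\t50\tPASS\t.\n",
    "chr1\t200\t.\tG\tA\t10\tPASS\t.\n",
    "chr1\t300\t.\tT\tC\t60\tLowQual\t.\n",
    "chr2\t400\t.\tA\tG\t.\tPASS\t.\n",
]
kept = list(filter_vcf_lines(example))
```

In practice an established tool would probably be used for this step; the sketch only makes the intended filter criteria explicit.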
Using the SigProfiler software suite from COSMIC (https://osf.io/t6j7u/wiki/home/), the input VCF data will be converted to a single trinucleotide context matrix as input for signature assignment. Based on this matrix, SigProfilerAssignment assigns the COSMIC reference signatures to the samples. This would be a rather straightforward approach, independent of input data size.
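To illustrate how the 0 to 1 value would follow from the assignment result, here is a small sketch. It assumes the per-sample signature activities (number of mutations attributed to each signature) have already been read into a dict; the helper name `sbs3_fraction` and the numbers are made up for illustration.

```python
def sbs3_fraction(activities, signature="SBS3"):
    """Return the relative contribution of one signature to a sample.

    activities: dict mapping signature name to the number of mutations
    attributed to it (hypothetical input format).
    """
    total = sum(activities.values())
    if total <= 0:
        raise ValueError("sample has no assigned mutations")
    # A signature that was not assigned at all contributes 0.
    return activities.get(signature, 0) / total


# Made-up activities for a single sample.
sample_activities = {"SBS1": 120, "SBS3": 300, "SBS5": 480, "SBS8": 100}
brcaness = sbs3_fraction(sample_activities)
```

With these example numbers, 300 of 1000 assigned mutations are attributed to SBS3, so the reported BRCAness would be 0.3.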
An alternative approach is de novo signature extraction using SigProfilerExtractor, followed by decomposition of the extracted signatures and their assignment to the COSMIC reference set, i.e. a two-step approach.
Two major caveats of the SigProfiler suite are its dependency on multiple samples for NMF and the lack of containers for the suite in public Docker/Singularity registries.
As requested in the KDS v2, a confidence interval (CI) should be generated for the assigned SBS3 values. Several Python packages, e.g. scipy.stats, can compute a CI from the sample size, the sample mean and the sample standard deviation.
I propose to calculate these from the respective VCF input dataset, since the trinucleotide context assignments are merged for NMF either way. The CI would then depend on the size and composition of the input dataset and would only be reproducible if the same dataset is used as input.
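A rough sketch of what such a CI calculation could look like, using only the Python standard library. Note that it uses a normal approximation instead of the Student t interval that scipy.stats would provide, so for small datasets the interval will come out somewhat too narrow. The function name, the clipping to the 0 to 1 range and the example values are my own assumptions.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation CI for the mean of per-sample SBS3 fractions.

    Returns (mean, lower, upper). The bounds are clipped to [0, 1] because
    BRCAness is defined as a relative contribution (an assumption of this sketch).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a standard deviation")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = max(0.0, mean - z * sem)
    upper = min(1.0, mean + z * sem)
    return mean, lower, upper


# Made-up SBS3 fractions for the samples of one input dataset.
sbs3_values = [0.1, 0.2, 0.3, 0.4, 0.5]
mean, lower, upper = mean_confidence_interval(sbs3_values)
```

Because the interval is derived from the dataset-wide mean and standard deviation, adding or removing samples changes the result, which is exactly the reproducibility concern described above.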
Some literature references I used:
While reading up on calculating BRCAness, I found the following publication and the linked GitHub repository, which use the SigProfiler suite for signature assignment.