Nik-Zainal-Group / signature.tools.lib

R package containing useful functions for mutational signature analysis

bedpeToRearrCatalogue clarification (starting from GRIDSS somatic vcf) #31

Closed john-alexander closed 2 years ago

john-alexander commented 3 years ago

I've used GRIDSS to call genomic rearrangements (somatic.vcf) for 3 of our triple-negative WGS samples.

The recommended method for generating a BEDPE from GRIDSS output is to use StructuralVariantAnnotation (breakpointgr2bedpe):

    library(StructuralVariantAnnotation)
    library(rtracklayer)
    vcf = readVcf("gridss.vcf")
    # Export breakpoints to BEDPE
    bpgr = breakpointRanges(vcf)
    write.table(breakpointgr2bedpe(bpgr), file="gridss_breakpoints.bedpe", sep="\t", quote=FALSE, col.names=FALSE)

A BEDPE exported from StructuralVariantAnnotation uses +/- for deletions and -/+ for tandem duplications, whereas bedpeToRearrCatalogue() from signature.tools.lib expects +/+ for deletions and -/- for tandem duplications.

Q: How do we make this BEDPE file compatible with bedpeToRearrCatalogue() from signature.tools.lib?

Ultimately, I'm trying to compare these signatures to Breast560_rearrangement.signatures, and currently we don't see the expected TNBC rearrangement signatures (RS) in our data, possibly owing to how the BEDPE is constructed. Please note that the rearrangement signatures in the package object Breast560_rearrangement.signatures (e.g. RS1) differ from those in the BRCA560 table (Supplementary.Table.21.Signatures.txt).

Below I outline the steps I've used to compare our sample RS catalogue with the existing Breast560 RS signatures:

    reslist <- bedpeToRearrCatalogue(bedpe)
    res <- reslist$rearr_catalogue
    # fit signatures with BRCA560 paper
    res.fit <- SignatureFit(
      sample.mat,  # matrix of sample rearrangement catalogues
      rs.sig,      # taken from Supplementary.Table.21.Signatures.txt
      method = "KLD",
      bf_method = "CosSim",
      alpha = -1,
      doRound = TRUE,
      verbose = TRUE,
      n_sa_iter = 500
    )

Also tried:

    res.fit.bootstrap <- SignatureFit_withBootstrap_Analysis(
      sig.fit.dir,
      sample.mat,
      rs.sig,
      nboot = 100,
      type_of_mutations = "rearr",
      threshold_percent = 5,
      threshold_p.value = 0.05,
      method = "KLD",
      bf_method = "CosSim",
      alpha = -1,
      doRound = TRUE,
      nparallel = 1,
      n_sa_iter = 500
    )

Also of note, I tried to use the SignatureExtraction function on our 3 samples, which resulted in the error below when using nsig > 3:

    SignatureExtraction(
      cat = sample.mat,
      outFilePath = sig.ext.dir,
      nrepeats = 200,
      nboots = 20,
      nparallel = 8,
      nsig = c(2:3),
      mut_thr = 0,
      type_of_extraction = "rearr",
      project = "KCL_PDX_DB",
      plotCatalogue = TRUE,
      parallel = TRUE,
      nmfmethod = "brunet"
    )

    shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
    Error in SignatureExtraction(cat = sample.mat, outFilePath = out.dir, : object 'clusteringlist' not found

andreadega commented 3 years ago

Hi John,

Thanks for using our tools.

As you may have seen in the documentation, in order to make your BEDPE compatible with our bedpeToRearrCatalogue function, you need the following columns: "chrom1", "start1", "end1", "chrom2", "start2", "end2" and "sample", and then you will need one of the following:

  1. "strand1" (+ or -) and "strand2" (+ or -) columns, where, as you have noticed, you will need to invert "strand2" relative to the StructuralVariantAnnotation output
  2. alternatively, you can omit "strand1" and "strand2" and just supply an "svclass" column with the values "translocation", "inversion", "deletion" or "tandem-duplication". In fact, bedpeToRearrCatalogue will search for "svclass" first, and only if it doesn't find it will it try to use "strand1" and "strand2" to generate the "svclass" column.
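
Both options can be sketched in base R on a toy table. This is only an illustration under stated assumptions: the coordinates are made up, `invert_strand()` and `classify_sv()` are hypothetical helpers (not part of signature.tools.lib), and the rule that mixed strands on the same chromosome mean "inversion" and different chromosomes mean "translocation" is an assumption consistent with the +/+ deletion and -/- tandem-duplication convention discussed in this thread. A real BEDPE written with `col.names=FALSE` would also need its columns named after reading it back in.

```r
# Toy BEDPE in the StructuralVariantAnnotation strand convention described above
# (+/- deletion, -/+ tandem duplication). All values are made up for illustration.
bedpe <- data.frame(
  chrom1  = c("1", "1", "2", "3"),
  start1  = c(1000, 5000, 200, 700),
  end1    = c(1001, 5001, 201, 701),
  chrom2  = c("1", "1", "2", "5"),
  start2  = c(9000, 8000, 900, 300),
  end2    = c(9001, 8001, 901, 301),
  strand1 = c("+", "-", "+", "+"),
  strand2 = c("-", "+", "+", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flip "+" to "-" and vice versa.
invert_strand <- function(s) ifelse(s == "+", "-", "+")

# Option 1: invert strand2 so that deletions become +/+ and
# tandem duplications become -/-.
bedpe$strand2 <- invert_strand(bedpe$strand2)
bedpe$sample  <- "sample1"

# Option 2: supply svclass explicitly. The classification rule below is an
# assumption, applied to the already inverted strands.
classify_sv <- function(chrom1, chrom2, strand1, strand2) {
  ifelse(chrom1 != chrom2, "translocation",
    ifelse(strand1 != strand2, "inversion",
      ifelse(strand1 == "+", "deletion", "tandem-duplication")))
}
bedpe$svclass <- classify_sv(bedpe$chrom1, bedpe$chrom2,
                             bedpe$strand1, bedpe$strand2)

print(bedpe[, c("chrom1", "chrom2", "strand1", "strand2", "svclass")])

# With signature.tools.lib installed, the table could then be passed on:
# reslist <- bedpeToRearrCatalogue(bedpe)
```

Tabulating `bedpe$svclass` before building the catalogue is a quick way to spot a strand-convention mismatch, for example an implausibly high share of inversions.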

I am really not sure why there are very minor differences between the Breast560 signatures we provide in our package and those that are in Suppl table 21D. I can only guess it is an approximation error. Feel free to use the one you prefer, they are basically identical.
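
If you want to check how close two versions of a signature are, cosine similarity is one simple measure. The sketch below uses two made-up three-channel vectors in place of the real 32-channel signatures; `cosine_similarity()` is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper: cosine similarity between two non-negative vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Made-up stand-ins for the same signature taken from two sources,
# e.g. a package object and a supplementary table.
sig_package <- c(0.50, 0.30, 0.20)
sig_table   <- c(0.49, 0.31, 0.20)

sim <- cosine_similarity(sig_package, sig_table)
print(sim)
```

A value very close to 1 would support treating the two versions as interchangeable for fitting.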

Given that you are working on only 3 samples, signature extraction is probably not viable in your case. Naturally, the NMF decomposition cannot produce more signatures than there are samples. The procedure should decompose the samples into a sum of recurrent patterns, but you cannot have more patterns than samples.
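
As a rough guard, the requested numbers of signatures can be capped at the number of samples before attempting any extraction. This is a minimal sketch on a made-up catalogue (32 rearrangement channels by 3 samples); the variable names are illustrative, and passing the check does not mean that extraction on so few samples is informative.

```r
# Toy catalogue: 32 rearrangement channels x 3 samples (counts are made up).
set.seed(1)
sample.mat <- matrix(rpois(32 * 3, lambda = 5), nrow = 32, ncol = 3,
                     dimnames = list(NULL, c("S1", "S2", "S3")))

requested_nsig <- 2:5
n_samples <- ncol(sample.mat)

# Keep only the ranks that do not exceed the number of samples.
usable_nsig <- requested_nsig[requested_nsig <= n_samples]

if (length(usable_nsig) < length(requested_nsig)) {
  message("Dropping nsig values above the number of samples (", n_samples, ")")
}
print(usable_nsig)
```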

So going for signature fit seems to be the best choice. You can also use our rearrangement signatures from our Nature Cancer 2020 paper. These are those for breast cancer: https://signal.mutationalsignatures.com/explore/studyTissueType/1-4

Best Wishes, Andrea

john-alexander commented 3 years ago

Apologies for the late response, Andrea. That fixed the issue. Thank you!