Closed tobsecret closed 5 years ago
No, sorry, this won't work: the algorithm assumes the mutation counts it receives are whole numbers. The opportunity matrix is used as weights in the decomposition of signatures x exposures.
However, if I understood this correctly, you can use the opportunity matrix for this. Each row of the opportunity matrix is applied to the sample on the same row of the count matrix, so you can have samples with different opportunities as long as each row is set correctly for its sample.
In the vignette, genOpportunityFromGenome generates a full matrix for a single genome, but you can generate an opp matrix for each species you have (setting nsamples to the number of samples for that species) and then concatenate the rows of all the opp matrices.
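To illustrate why dividing the counts by the opportunities beforehand is not a substitute, here is a minimal base R sketch. The numbers are made up for illustration (2 samples, 3 mutation types instead of 96), and is_whole is a hypothetical helper, not a signeR function.

```r
# Toy data, invented for illustration: rows are samples, columns are mutation types.
mut <- matrix(c(12, 3, 7,
                 5, 9, 2), nrow = 2, byrow = TRUE)
opp <- matrix(c(400, 150, 250,
                100, 600, 300), nrow = 2, byrow = TRUE)

# Element-wise division, i.e. "dividing the rows by opp".
normalized <- mut / opp

# Hypothetical helper: TRUE if every entry is a whole number.
is_whole <- function(x) all(x == round(x))

is_whole(mut)         # the raw counts are whole numbers
is_whole(normalized)  # the normalized values are not, so they no longer look like counts
```

Keeping mut as raw counts and passing opp separately preserves the whole-number input the algorithm expects.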
Aaaaah, yes that should work! Awesome, I'll try that, thanks a ton!
So let's say I have two different VCF files, one for species 1 and one for species 2. Using the same naming scheme as the vignette, should the following work?
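# packages used below, as in the signeR vignette
library(signeR)
library(VariantAnnotation)  # readVcf
library(rtracklayer)        # import
library(Rsamtools)          # FaFile
# first genome and vcf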
target_regions1 <- import(con="/path/to/a/target1.bed", format="bed")
mygenome1 <- FaFile("/path/to/genome1.fasta")
vcfobj1 <- readVcf("/path/to/a/species1.vcf", mygenome1)
mut1 <- genCountMatrixFromVcf(mygenome1, vcfobj1)
opp1 <- genOpportunityFromGenome(mygenome1, target_regions1, nsamples=nrow(mut1))
# second genome and vcf
target_regions2 <- import(con="/path/to/a/target2.bed", format="bed")
mygenome2 <- FaFile("/path/to/genome2.fasta")
vcfobj2 <- readVcf("/path/to/a/species2.vcf", mygenome2)
mut2 <- genCountMatrixFromVcf(mygenome2, vcfobj2)
opp2 <- genOpportunityFromGenome(mygenome2, target_regions2, nsamples=nrow(mut2))
# combine muts and opps
mut <- rbind(mut1, mut2)
opp <- rbind(opp1, opp2)
Yes, that looks right!
This works, at least in my test on a subset of my data! Thanks again for the support!
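For anyone combining more than two species, a quick sanity check before running signeR could look like the sketch below. combine_species is a hypothetical helper written for this example, not part of signeR, and the toy matrices stand in for the real mut1/opp1 and mut2/opp2 (3 mutation types instead of 96).

```r
# Hypothetical helper (not part of signeR): stack per-species matrices and
# verify that counts and opportunities still line up row by row.
combine_species <- function(muts, opps) {
  stopifnot(length(muts) == length(opps))
  for (i in seq_along(muts)) {
    # each species needs exactly one opportunity row per sample
    stopifnot(identical(dim(muts[[i]]), dim(opps[[i]])))
  }
  mut <- do.call(rbind, muts)
  opp <- do.call(rbind, opps)
  # counts must be whole numbers, opportunities must not be negative
  stopifnot(all(mut == round(mut)), all(opp >= 0))
  list(mut = mut, opp = opp)
}

# Toy stand-ins, invented for illustration.
mut1 <- matrix(c(12, 3, 7, 5, 9, 2), nrow = 2, byrow = TRUE)
opp1 <- matrix(rep(c(400, 150, 250), 2), nrow = 2, byrow = TRUE)
mut2 <- matrix(c(1, 8, 4), nrow = 1)
opp2 <- matrix(c(90, 700, 310), nrow = 1)

combined <- combine_species(list(mut1, mut2), list(opp1, opp2))
dim(combined$mut)  # one row per sample across both species

# With real data, the call from the vignette would follow (not run here):
# signatures <- signeR(M = combined$mut, Opport = combined$opp)
```

The per-species dimension check catches the most likely mistake, an nsamples value that does not match the number of rows in that species' count matrix.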
I am asking this because I want to combine data from two closely related species with wildly different AT contents, in order to figure out if there are any shared mutational signatures between them. Is it functionally the same to input mut and opp into signeR vs inputting only mut, but dividing the rows by opp?
This makes me believe it is but I am not sure I am understanding this correctly.