JocelynSP opened 3 years ago
Hi Jocelyn,
Thanks for reaching out.
Astute observation. The general goal of the filtering is to remove lowly expressed genes that are not well detected in the single-cell data for technical reasons, so that we can focus on genes whose lower expression is due to, for example, CNV losses. Initially, when we developed HoneyBADGER, we used it on SmartSeq2 single-cell RNA-seq data, and the default filters were set with SmartSeq2 data in mind. When we expanded to 10X data as well, we decided to use less stringent filters by default and updated the default parameters of `setGexpMats`, but forgot to update `hb$setGexpMats`. You can look at the default parameters for both functions to see the difference.
```r
## for setGexpMats
minMeanBoth = 0,
minMeanTest = mean(gexp.sc.init[gexp.sc.init != 0]),
minMeanRef  = mean(gexp.ref.init[gexp.ref.init != 0]),

## for hb$setGexpMats
minMeanBoth = 4.5, minMeanTest = 6, minMeanRef = 8,
```
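As a sketch, you can inspect both sets of defaults directly with base R's `formals()` rather than reading the source (the `new('HoneyBADGER', name=...)` constructor call follows the package tutorial; whether `formals()` resolves the reference-class method this way is an assumption):

```r
library(HoneyBADGER)  ## assumes the package is installed from GitHub

## defaults of the standalone function
formals(setGexpMats)[c("minMeanBoth", "minMeanTest", "minMeanRef")]

## defaults of the object method (hypothetical object for illustration)
hb <- new("HoneyBADGER", name = "check")
formals(hb$setGexpMats)[c("minMeanBoth", "minMeanTest", "minMeanRef")]
```

Comparing the two printed lists makes the discrepancy described above visible without digging through the code.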
This does need to be fixed. If you would be so kind as to update `hb$setGexpMats` to use the same default parameters as `setGexpMats` and make a pull request, I'd be happy to accept it so you get credit for your contribution.
The example data has already been filtered and normalized relative to the reference data. Note that after normalization to the reference data, we cannot use the same filtering criteria.
Hope that helps, Jean
Hi, I know that this is an old thread, but I'm getting a similar issue, so I thought I'd continue it here instead of creating a duplicate thread. I'm also working with 10X data, and have turned off filtering like this:
```r
hb$setGexpMats(cancer_matrix, ref_matrix, mart_obj,
               filter = FALSE, scale = FALSE, verbose = TRUE)
```
Still, I'm getting an error like this:
```
ERROR NO GENES AFFECTED BY CNVS IDENTIFIED! Run calcGexpCnvBoundaries()?
Error in rowMeans(amp.gexp.prob) :
  'x' must be an array of at least two dimensions
Calls: <Anonymous> -> cbind -> cbind -> data.frame -> rowMeans
```
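The `rowMeans` error itself is generic base-R behavior rather than anything HoneyBADGER-specific: if the set of candidate CNVs collapses to a single entry, subsetting a probability matrix can silently drop it to a plain vector, which `rowMeans` refuses. A minimal base-R reproduction (toy values, not real probabilities):

```r
## toy 2 x 3 matrix of per-cell amplification probabilities
amp.gexp.prob <- matrix(c(0.5, 0.55, 0.6, 0.52, 0.58, 0.51), nrow = 2)
rowMeans(amp.gexp.prob)             ## fine: one mean per row

## subsetting to a single row drops the result to a vector...
one.row <- amp.gexp.prob[1, ]
## rowMeans(one.row)                ## errors: 'x' must be an array of
                                    ## at least two dimensions

## ...unless dimensions are kept with drop = FALSE
rowMeans(amp.gexp.prob[1, , drop = FALSE])  ## fine: 1-row matrix
```

So the error is consistent with almost everything having been filtered out upstream, leaving too few CNV candidates to form a proper matrix.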
Preceding the error, I get a deletion probability of 0 for every CNV, and amplification probabilities between 0.5 and 0.6. When probabilities per barcode are computed, I see the same picture, with probabilities between 0.5 and 0.6. I believe the issue occurs because all significantly differing data points are filtered out earlier, but I can't pinpoint it, since, again, the filtering option of the `setGexpMats` method is turned off. Was this issue fixed previously, or is it still open? Thank you in advance.
Best regards, Dmytro
Dear Dmytro,
Thanks for reaching out.
Does running `setGexpMats` give you any other message when `verbose=TRUE`?
Without seeing a snippet or knowing the dimensions of your `cancer_matrix` and `ref_matrix`, my best guess is that the gene names in these matrices do not match up with the gene names in the `mart.obj` annotations.
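A quick way to check this hypothesis is to count the overlap between the matrix row names and the biomart annotations (a sketch: `cancer_matrix`, `ref_matrix`, and `mart_obj` are the objects from the call above; using `hgnc_symbol` as the identifier type is an assumption and may need changing for your data):

```r
## how many gene names are shared between test and reference matrices?
shared <- intersect(rownames(cancer_matrix), rownames(ref_matrix))
length(shared)

## how many of those also appear in the biomart annotations?
## (assumes mart_obj is a biomaRt Mart and genes are HGNC symbols)
library(biomaRt)
ann <- getBM(attributes = "hgnc_symbol", mart = mart_obj)
length(intersect(shared, ann$hgnc_symbol))
```

If the second count is near zero, the matrices and the annotations are likely using different identifier types (e.g. Ensembl IDs in one and gene symbols in the other).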
If `calcGexpCnvBoundaries` is returning an error that there are too few genes, you may want to check the dimensions of the `gexp.norm` and `genes` slots in your `hb` object to see whether there are too few genes at this stage, or whether certain chromosomes have no genes at all. By default, all chromosomes are tested via the parameter `chrs=paste0('chr', c(1:22))`, though we have seen instances where smaller chromosomes like chromosome 21 have very few genes detected, for example with shallow sequencing.
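The checks above can be sketched as follows (assumptions: the slot names `gexp.norm` and `genes` are as mentioned in this thread, and `hb$genes` is a GRanges object, which may not match your installed version):

```r
## how many genes survived to the normalized expression matrix?
dim(hb$gexp.norm)

## genes per chromosome (assumes hb$genes is a GRanges built from
## the biomart annotations; this is an assumption)
library(GenomicRanges)
table(as.character(seqnames(hb$genes)))
```

Chromosomes with very few (or zero) entries in that table are the likely culprits for the downstream error.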
Hope this helps, Jean
Hi, I have just installed HoneyBADGER and am planning to use it for single-cell 10X analysis. I am seeing radically different filtering of genes between
`gexp.mats <- setGexpMats(...)`
and
`hb$setGexpMats(...)`
For example with my own data:
Or with the example data (which doesn't need filtering, but which I used for a comparison test):
Can you tell me what is wrong with the internal version? Thanks!