Dear @sjfleming,
Thank you for developing this very useful tool and for the clear descriptions in the paper. We are looking forward to using it ourselves, but we have some questions about using CellBender on samples (sc/sn-RNA-seq and CITE-seq) in a multi-sample cohort setting.
According to Supplementary Section 2.3 of the paper, it is recommended to set nFPR = 0 in a cohort setting:
"In a cohort setting, it is important to set nFPR = 0 to avoid over-correction beyond the expected noise budget. Using larger values of nFPR naturally imparts a bias on the output by preferentially keeping only the most certain cell counts, which is unsuitable when aggregating data from many samples."
General questions:
1) Do you have experience yourself with setting nFPR to 0 compared to the recommended default for non-cohort datasets?
2) Will this not retain too much background noise, and thus not entirely solve the issue of spurious DE results due to differences in background noise?
3) Would it really be problematic to use the same low nFPR value (e.g. 0.01) for all samples in a cohort if nFPR = 0 does not provide sufficient denoising?
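To make question 3 concrete, here is a minimal sketch of what we have in mind: building one CellBender command per sample, all with the same nFPR value. It assumes that nFPR corresponds to the `--fpr` option of `cellbender remove-background`; the sample names, file paths, and the helper `build_commands` are hypothetical and only for illustration.

```python
from pathlib import Path


def build_commands(samples, fpr=0.01, out_dir="cellbender_out"):
    """Return one argv list per sample, all sharing the same --fpr value.

    `samples` maps a (hypothetical) sample name to its raw count matrix.
    Assumption: nFPR in the paper maps to the --fpr command-line option.
    """
    commands = []
    for name, raw_h5 in samples.items():
        output = Path(out_dir) / f"{name}_cellbender.h5"
        commands.append([
            "cellbender", "remove-background",
            "--input", str(raw_h5),
            "--output", str(output),
            "--fpr", str(fpr),  # identical for every sample in the cohort
        ])
    return commands


# Hypothetical cohort of two samples
samples = {
    "sampleA": "sampleA/raw_feature_bc_matrix.h5",
    "sampleB": "sampleB/raw_feature_bc_matrix.h5",
}

for cmd in build_commands(samples, fpr=0.01):
    print(" ".join(cmd))
```

Setting `fpr=0` in the same helper would give the cohort setting recommended in the paper, so the two configurations can be compared on identical inputs.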
Questions concerning CITE-seq data:
4) What are your recommendations for CITE-seq samples that are part of a multi-sample cohort? We ask because the recommended nFPR value for CITE-seq data is quite high, which is the opposite of the recommendation for cohort studies. Our CITE-seq data typically include ADT data for > 100 antibody features.
5) How strongly are the cell calling and denoising of the RNA counts affected by whether CellBender is run on the RNA assay alone or on both RNA+ADT? Will the corrected RNA count matrix be the same?
Thanks a lot!