Which `selection.method` to use for `FindVariableFeatures` on ALRA-imputed data

Rohit-Satyam commented 1 year ago

Hi @linqiaozhi @JunZhao1990 @rcannood @inoue0426

I was following this issue where @ChristophH mentions that

Results should not be very different from using the original "count" data. Generally, using "data" slot should work with "vst" method as long as the loess fit can capture the mean-variance relationship.

Also, @linqiaozhi suggests

For example, "The VST selection method uses count data and does not use the ALRA imputed data; please use mean.var.plot instead, if you would like to find the variable genes based on the imputed data."
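For readers comparing the two suggestions quoted above, here is a minimal R sketch of running both selections on one object. It assumes a Seurat object with raw counts in an assay named "RNA" and ALRA-imputed values in an assay named "alra". Both assay names and the helper name `select_hvgs_counts_and_alra` are assumptions for illustration and do not come from this thread.

```r
# Sketch only: `counts_assay` and `alra_assay` are assumed assay names;
# adjust them to match your own object.
select_hvgs_counts_and_alra <- function(obj,
                                        counts_assay = "RNA",
                                        alra_assay = "alra",
                                        nfeatures = 2000) {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("The Seurat package must be installed to use this helper.")
  }

  # "vst" on the assay that holds the raw (non-imputed) counts.
  obj <- Seurat::FindVariableFeatures(
    obj,
    assay = counts_assay,
    selection.method = "vst",
    nfeatures = nfeatures
  )

  # "mean.var.plot" on the assay that holds the ALRA-imputed values.
  # This method selects by mean and dispersion cutoffs (defaults used here).
  obj <- Seurat::FindVariableFeatures(
    obj,
    assay = alra_assay,
    selection.method = "mean.var.plot"
  )

  hvg_counts <- Seurat::VariableFeatures(obj, assay = counts_assay)
  hvg_alra <- Seurat::VariableFeatures(obj, assay = alra_assay)

  list(
    object = obj,
    vst_counts = hvg_counts,
    mvp_alra = hvg_alra,
    shared = intersect(hvg_counts, hvg_alra)
  )
}

# Usage (with your own Seurat object `obj`):
# res <- select_hvgs_counts_and_alra(obj)
# length(res$shared)
```

The size of `shared` gives a quick sense of how much the two selections agree before deciding which feature set to carry into PCA or integration.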

So I decided to check whether the mean-variance relationship is captured better by Seurat's `vst` or `mean.var.plot` method. Unlike MCA (Malaria Cell Atlas), which I wish to use as a reference and did not impute, some cells in my samples (t1, n1) show some deviation from the linear relationship. Is this slight deviation expected?

I also observe that the standardized variance for the imputed data has its baseline at 1, unlike MCA, where the baseline is at zero. Will this be a problem when I integrate these samples with MCA? I am trying to resolve the problem of the JackStraw plot showing all PCs as significant, which I discuss in another issue, and I thought that the nature of the imputed data or the feature selection method might be influencing this.

[Image: mvp_deidentified]

Rohit-Satyam commented 1 year ago

Hi @ChristophH. Do you have thoughts on this?

ChristophH commented 1 year ago

I'd stick to the raw counts (not imputed) and use the vst method. If there is no mean-variance relationship, the data is violating some basic assumptions the method is based on, so proceed with care.
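One way to check whether a mean-variance relationship is present before relying on it is to fit the trend directly. The base R sketch below uses simulated negative binomial counts, not data from this thread, and fits a loess curve of log10 variance on log10 mean. The span of 0.3, the clipping at the square root of the cell count, and the simulation parameters are assumptions chosen for illustration. This is not Seurat's exact implementation.

```r
# Simulated counts: 500 genes x 200 cells (illustrative values only).
set.seed(1)
n_genes <- 500
n_cells <- 200
gene_mu <- 10^runif(n_genes, min = -1, max = 2)
counts <- matrix(
  rnbinom(n_genes * n_cells, size = 5, mu = rep(gene_mu, times = n_cells)),
  nrow = n_genes, ncol = n_cells
)

# Per-gene mean and variance; drop constant genes before taking logs.
gene_stats <- data.frame(
  mean = rowMeans(counts),
  variance = apply(counts, 1, var)
)
keep <- gene_stats$variance > 0
counts <- counts[keep, , drop = FALSE]
gene_stats <- gene_stats[keep, ]

# Fit the mean-variance trend on the log10 scale.
fit <- loess(log10(variance) ~ log10(mean), data = gene_stats, span = 0.3)
gene_stats$expected_sd <- sqrt(10^fitted(fit))

# Standardize each gene by its observed mean and trend-expected SD,
# clip large values, then take the variance of the standardized values.
z <- (counts - gene_stats$mean) / gene_stats$expected_sd
z <- pmin(z, sqrt(n_cells))
gene_stats$std_variance <- rowSums(z^2) / (n_cells - 1)

# A strong correlation suggests the trend exists and can be fitted.
trend_cor <- cor(log10(gene_stats$mean), log10(gene_stats$variance))
print(trend_cor)
print(summary(gene_stats$std_variance))
```

In this sketch, genes that follow the fitted trend end up with a standardized variance close to 1, so a weak correlation or a standardized variance far from 1 for most genes would be a sign that the trend is not being captured.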

KlugerLab / ALRA

Which `selection.method` to use for `FindVariableFeatures` on ALRA-imputed data #25