Why use aligned embeddings for downstream analyses?

Hi, thank you for your responsiveness as I continue to learn more about the PRECAST package. My question is generally about downstream analyses after running PRECAST. For example, I want to run the conditional SVG analysis.

In the manuscript, you pass the aligned embeddings as covariates to the SPARK package. (From my understanding, "aligned embeddings" refers to the PCs obtained from housekeeping gene expression counts across all samples; in the codebase, they are stored as "hZ". Please let me know if I am incorrect.)

(1) Why use the aligned embeddings as covariates rather than the PRECAST-corrected gene expression data? Using the aligned embeddings as covariates while the raw gene counts are the input seems like double dipping, because the embeddings are essentially a low-dimensional representation of those same raw counts. Is that correct? I may be mistaken if the embeddings represent something else here, or if they are computed only from housekeeping genes that are not used when the PRECAST model is run.
(2) Within the PRECAST object, are these embeddings stored in the "reductions" slot under "PRECAST$cell.embeddings"? Is this the correct location?
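
To make question (2) concrete, here is the access pattern I have in mind, sketched on a toy Seurat object rather than a real PRECAST result. The toy counts, the 3-column embedding matrix, and the reduction name "PRECAST" are all my assumptions for illustration; I have not confirmed that the package stores the aligned embeddings this way.

```r
library(Seurat)

set.seed(1)

# Toy counts: 20 genes x 10 spots (hypothetical data, not produced by PRECAST)
counts <- matrix(
  rpois(200, lambda = 5), nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("spot", 1:10))
)
seu <- CreateSeuratObject(counts = counts)

# Stand-in for the aligned embeddings: 10 spots x 3 dimensions.
# The reduction name "PRECAST" is an assumption about how the package labels it.
emb <- matrix(
  rnorm(30), nrow = 10,
  dimnames = list(colnames(seu), paste0("PRECAST_", 1:3))
)
seu[["PRECAST"]] <- CreateDimReducObject(
  embeddings = emb, key = "PRECAST_", assay = "RNA"
)

# Two equivalent ways to read the embeddings back out of the reductions slot
hZ_accessor <- Embeddings(seu, reduction = "PRECAST")
hZ_slot <- seu@reductions$PRECAST@cell.embeddings
```

If the real object follows this layout, either `hZ_accessor` or `hZ_slot` would be the matrix I pass as `covariates` to `spark.vc`.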

For your convenience, here are references from the code base and the manuscript that I was referring to:

Here is the code for running the conditional SVG analysis from the manuscript. (PRECAST_Analysis/blob/main/Real_data_analysis/dorsolateral_prefrontal_cortex.R)
spark_brain <- spark.vc(spark_brain, covariates = hZ, lib_size = spark_brain@lib_size,
                        num_core = num_core, fit.model = 'gaussian', verbose = verbose)
In the manuscript, "hZ" is first introduced in the Methods section under "Recovery of comparable gene expression matrix."

feiyoung / PRECAST