LukasHats closed this issue 1 month ago
Hi @LukasHats, good question! In some quick tests on scRNA-seq, PCA worked, but more complex and informative embeddings improved the results quite a bit.
The original trVAE repository is deprecated, but its implementation inside scArches is not, and we are using that one as you can see here.
Scaling is important when you use PCA or trVAE to make sure that every feature is weighted equally. I would not suggest scaling per sample anymore, so if you scale, do it across all samples.
Remember that if you don't have batch effects and you don't have 100s and 100s of markers, you could also try running CellCharter without dimensionality reduction!
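To make the difference between the two scaling strategies concrete, here is a minimal sketch using only NumPy on a toy marker matrix. The function names (scale_across_samples, scale_per_sample) and the toy values are made up for illustration and are not part of CellCharter or scanpy. Scaling across all samples is in the same spirit as running sc.pp.scale on the complete AnnData, although the exact variance convention may differ.

```python
import numpy as np


def scale_across_samples(X):
    """Z-score each marker (column) using mean and std pooled over all cells."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population std (ddof=0); an illustrative choice
    std[std == 0] = 1.0  # leave constant markers unscaled instead of dividing by 0
    return (X - mean) / std


def scale_per_sample(X, sample_ids):
    """Z-score each marker separately within every sample (no longer suggested)."""
    X = np.asarray(X, dtype=float)
    sample_ids = np.asarray(sample_ids)
    out = np.empty_like(X)
    for sample in np.unique(sample_ids):
        mask = sample_ids == sample
        out[mask] = scale_across_samples(X[mask])
    return out


# Toy data (hypothetical): 4 cells x 2 markers, two samples.
# Sample "B" has a higher overall intensity on marker 0.
X = np.array([[1.0, 10.0],
              [3.0, 12.0],
              [11.0, 10.0],
              [13.0, 12.0]])
sample_ids = np.array(["A", "A", "B", "B"])

X_global = scale_across_samples(X)
X_local = scale_per_sample(X, sample_ids)

print(X_global[:, 0])  # cells from A and B keep different values on marker 0
print(X_local[:, 0])   # the A/B difference on marker 0 is gone
```

In this toy case, per-sample scaling maps the first cell of sample A and the first cell of sample B to the same value on marker 0, while scaling across all samples preserves the difference between the two samples. Whether such a difference is biology or a batch effect depends on the data.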
Hey @marcovarrone,
thanks a lot for the quick answer and the insights, much appreciated! 3/4 follow-up questions:
1) Would scaling be used if no dimensionality reduction is performed?
2) If batch effects are present, we address them only through the dimensionality reduction, right?
3) Would you share your comment on this approach: For IMC images, people often use z-score normalized marker expressions to remove batch effects etc. Would it be an idea to put in Z-score values in CellCharter without dimensionality reduction?
4) If I understood you correctly, scaling across all samples means just running sc.pp.scale on the complete adata, as classically performed in scanpy?
Thanks a lot for help, excited to run it soon!
Hi @LukasHats,
Let me know what you think about it :)
Thanks so much, also for the excursus on batch effect removal. Will close for now and open if I encounter problems.
Description of feature
Dear @marcovarrone,
Thanks for providing CellCharter! Concerning the CODEX tutorial, could you suggest which dimensionality reduction method should be used? Is a standard sc.pp.pca enough, or is trVAE recommended (which seems to be a bit deprecated, at least the repository points towards scArches)? Further, why do you scale per image? Is this necessary, or generally necessary for marker-based neighborhood detection? What would happen if raw values or e.g. z-score normalized values are used? Thanks!