Genentech / sVAE

Apache License 2.0

Norman Dataset split #1

Open fkapl opened 1 year ago

fkapl commented 1 year ago

In Section 5.1 of your paper you explain how to preprocess and split the Norman dataset. My main question is: could you elaborate on which interventions (the top 30 with the most significant effect on gene expression) you selected for the held-out set? For reproducibility and for comparing against your results, it would be very helpful if you could provide either a fixed list of intervention names or a code snippet that reproduces the same split.

Any help on this would be much appreciated. Let me also say that I enjoyed reading your paper very much!

romain-lopez commented 1 year ago

Yes! I used the energy distance function from this package:

https://dcor.readthedocs.io/en/latest/functions/dcor.energy_distance.html

This must be computed in a low-dimensional space (for example, an scVI latent space or PCA), between the control cells and the cells from each guide. The energy distance can also be replaced by the MMD with a linear kernel. Both metrics quantify the change between two distributions, which is a useful quantity for assessing the impact of each perturbation. Thanks!
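For anyone trying to reproduce a split along these lines, here is a minimal sketch of the ranking step described above. It is not the authors' code and will not necessarily reproduce the exact held-out set from the paper. To stay dependency-free it uses a hand-written energy distance estimator in place of `dcor.energy_distance`, so compare it against that function before relying on it. It also uses a plain SVD-based PCA in place of scVI. The function names (`pca_embed`, `rank_perturbations`), the `"control"` label, and the synthetic data are all hypothetical.

```python
import numpy as np


def pca_embed(X, n_components=10):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def mean_pairwise_dist(a, b):
    """Mean Euclidean distance over all (row of a, row of b) pairs."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()


def energy_distance(x, y):
    """Plug-in estimate of 2 E|x - y| - E|x - x'| - E|y - y'|."""
    return (2 * mean_pairwise_dist(x, y)
            - mean_pairwise_dist(x, x)
            - mean_pairwise_dist(y, y))


def linear_mmd(x, y):
    """Squared MMD with a linear kernel: squared distance between means."""
    return float(((x.mean(axis=0) - y.mean(axis=0)) ** 2).sum())


def rank_perturbations(Z, labels, control="control",
                       metric=energy_distance, top_k=30):
    """Score each guide against the control cells and return the top_k."""
    labels = np.asarray(labels)
    ctrl = Z[labels == control]
    scores = {
        str(g): float(metric(ctrl, Z[labels == g]))
        for g in np.unique(labels) if g != control
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Synthetic stand-in for an expression matrix: 50 cells per condition,
# 20 "genes", with guide_b shifted much further from control than guide_a.
rng = np.random.default_rng(0)
shifts = {"control": 0.0, "guide_a": 0.5, "guide_b": 3.0}
X = np.vstack([rng.normal(s, 1.0, size=(50, 20)) for s in shifts.values()])
labels = np.repeat(list(shifts), 50)

Z = pca_embed(X, n_components=5)
top, scores = rank_perturbations(Z, labels, top_k=2)
print(top, scores)
```

Passing `metric=linear_mmd` swaps in the linear-kernel MMD mentioned above. The pairwise computation here allocates an array of size (cells in x) × (cells in y) × (dimensions), so on a real dataset the control population would likely need to be subsampled or the distances computed in chunks.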