KrishnaswamyLab / MAGIC

MAGIC (Markov Affinity-based Graph Imputation of Cells) is a method for imputing missing values and restoring the structure of large biological datasets.
GNU General Public License v2.0

How to reproduce Figure 7A of original paper to estimate optimal values for t and ka #70

Closed fbrundu closed 6 years ago

fbrundu commented 6 years ago

I am trying to understand how to set the t and ka parameters, since I noticed that changing them severely affects how the values are imputed in my case. Does MAGIC have a method to reproduce a plot similar to Figure 7A?

Figure 7: Finding the optimal diffusion time (t) using intrinsic dimensionality estimation. A) Graph shows intrinsic dimensionality (as measured by correlation dimension) computed on EMT data for different amounts of diffusion time (t) for three values of adaptive kernel (𝑘𝑎 = 4, 10, 30). The peak values suggest optimal diffusion times that restore maximal dimensionality (information) to the data.

Thanks, Francesco

dvdijk commented 6 years ago

Hi Francesco,

We're about to publish a method for selecting t on GitHub. This should happen this month. As for ka, you want to set it as small as possible. Start with ka=3.

David

In reply to the original post by Francesco G. Brundu (Fri, Jan 5, 2018 at 2:58 PM), quoted in full above.

dvdijk commented 6 years ago

Yeah, we have developed a new method for finding the optimal t. See compute_optimal_t.m for MATLAB; we also have it for R and Python. We're in the process of adding the optimal t calculation to the tutorials, though the functions should be pretty straightforward to use. Let us know if you have problems running them.

On Fri, Feb 16, 2018 at 3:49 PM adddddn notifications@github.com wrote:

David, do you have any updates on when this will be published? In the meantime, do you have any advice for choosing parameters?

For example, if I vary 't' and 'ka', then use an R package to calculate correlation dimension (e.g. fractal), would this be sufficient to produce Fig 7?
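For readers unfamiliar with the measure in Figure 7A, the following is a minimal sketch of a correlation dimension estimate in the Grassberger-Procaccia style: the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r. It is not taken from the MAGIC code base; the function name, the number of radii, and the percentile range used to pick the radii are all assumptions made for illustration.

```python
import numpy as np

def correlation_dimension(points, n_radii=10):
    """Estimate correlation dimension as the slope of log C(r) vs. log r."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    # Pairwise Euclidean distances, keeping each unordered pair once.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]
    # Radii between the 1st and 25th percentile of distances
    # (an illustrative choice, not a recommendation from the thread).
    radii = np.logspace(np.log10(np.percentile(dists, 1)),
                        np.log10(np.percentile(dists, 25)), n_radii)
    # Correlation sum C(r): fraction of pairs with distance below r.
    corr_sum = np.array([(dists < r).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(corr_sum), 1)
    return float(slope)

# Synthetic check: a line and a flat plane embedded in 3D.
rng = np.random.default_rng(0)
line = rng.random((400, 1)) * np.array([[1.0, 2.0, -1.0]])
plane = np.column_stack([rng.random((400, 2)), np.zeros(400)])
print(correlation_dimension(line), correlation_dimension(plane))
```

The estimate for the line should land near 1 and the estimate for the plane somewhat below 2, because boundary effects bias the slope downward. The dense pairwise distance array makes this practical only for small inputs.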


dvdijk commented 6 years ago

I wouldn't use correlation dimension at all. I would start with ka=4 and, if that doesn't work well, try larger values up to 10; I don't think there's a need to go higher. The optimal t calculation will give you t.
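To make the roles of ka and t concrete, here is a simplified sketch of the diffusion step, assuming the adaptive Gaussian kernel sets each cell's bandwidth to the distance to its ka-th nearest neighbor and that t is the power of the resulting Markov matrix. It is not the code in MAGIC.py: it omits the PCA step, the k-nearest-neighbor truncation, and any rescaling, and the function names are hypothetical.

```python
import numpy as np

def diffusion_operator(data, ka=4):
    """Row-stochastic Markov matrix from an adaptive Gaussian kernel (sketch)."""
    data = np.asarray(data, dtype=float)
    diffs = data[:, None, :] - data[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Adaptive bandwidth: distance from each cell to its ka-th nearest
    # neighbor (column 0 of the sorted distances is the cell itself).
    # Assumes no duplicate cells, so every bandwidth is positive.
    sigma = np.sort(dists, axis=1)[:, ka]
    affinity = np.exp(-(dists / sigma[:, None]) ** 2)
    affinity = affinity + affinity.T  # symmetrize
    return affinity / affinity.sum(axis=1, keepdims=True)

def impute(data, ka=4, t=3):
    """Diffuse the data for t steps: M^t @ data."""
    markov = diffusion_operator(data, ka=ka)
    return np.linalg.matrix_power(markov, t) @ np.asarray(data, dtype=float)

# Toy data: 50 "cells" by 4 "genes" (synthetic, for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
smooth_1 = impute(data, ka=4, t=1)
smooth_5 = impute(data, ka=4, t=5)
print(np.ptp(data, axis=0), np.ptp(smooth_1, axis=0), np.ptp(smooth_5, axis=0))
```

Each diffusion step replaces every cell with a weighted average of cells, so the per-gene range can only shrink as t grows. In this sketch a larger ka widens each cell's kernel, which is one way to see why both parameters change the imputed values so strongly.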

On Sat, Feb 17, 2018 at 6:07 PM adddddn notifications@github.com wrote:

Thanks! To tune the parameters, I am planning on running magic with varying ka (between 4 and 30) and the optimal t calculation. I was then going to plot the correlation dimension against ka. Is this what you would recommend?


dvdijk commented 6 years ago

I would not recommend running tSNE after MAGIC (or in general) since tSNE destroys continuous structure in the data and forces clusters. You can do PCA after MAGIC, or use our other tool called PHATE (on the raw data).

Do you library-size normalize your data prior to running MAGIC?
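As a point of reference for the PCA suggestion above, a linear embedding of an imputed matrix can be sketched with a plain SVD. This assumes rows are cells and columns are genes; the function name and the synthetic input are hypothetical, and any standard PCA implementation would serve equally well.

```python
import numpy as np

def pca_embed(matrix, n_components=2):
    """Project rows onto the top principal components via SVD."""
    matrix = np.asarray(matrix, dtype=float)
    centered = matrix - matrix.mean(axis=0)  # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Scores: left singular vectors scaled by the singular values.
    return u[:, :n_components] * s[:n_components]

# Synthetic stand-in for an imputed cells-by-genes matrix.
rng = np.random.default_rng(0)
imputed = rng.normal(size=(30, 5))
embedding = pca_embed(imputed, n_components=2)
print(embedding.shape)
```

Unlike tSNE, this projection is linear, so continuous trajectories in the imputed data are not broken up into clusters by the embedding itself.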

On Sat, Feb 17, 2018 at 9:15 PM adddddn notifications@github.com wrote:

Quick update: I tried running magic with ka=4 and ka=10 (and optimal t calculation). To evaluate the results, I re-calculated the TSNEs (see attached files). All of the original clusters were well-supported by canonical markers from the literature. However, the imputation seems to have blurred (and in some cases, obliterated) the cluster distinctions. Any ideas what might be going wrong?

Methods: I am running magic on the TPM matrix (not scaled/log-transformed). I am using 20 PCs, which has worked for this dataset previously. Otherwise, I am using the default parameters found in the MAGIC.py script (k=30, t=None, epsilon=1, r=99, etc). The optimal ts seemed reasonable. For ka=4, the optimal t was 5. For ka=10, the optimal t was . After running magic, I treat the imputed data as a TPM matrix and I run a standard pipeline to get the tSNE (i.e. variable gene selection, PCA on scaled log(TPM+1), tSNE). I have also tried using my original set of variable genes.

These new TSNEs are not necessarily incorrect, but they do not align with the established markers as well. So I'm curious if you have encountered similar problems, and whether you have any advice? Thank you.

Original [image: magic old tsne] https://user-images.githubusercontent.com/36550288/36347547-9d2bc2fe-1427-11e8-90d7-04649d10d9c0.png

ka=4 [image: magic ka4 tsne] https://user-images.githubusercontent.com/36550288/36347549-9d4c524e-1427-11e8-8bd2-d4ba42ca49ff.png

ka=10 [image: magic ka10 tsne] https://user-images.githubusercontent.com/36550288/36347548-9d380c12-1427-11e8-91b0-be2d732a76fb.png


dpcook commented 6 years ago

You could plot cluster ID (pre-magic) vs. marker expression (pre/post-MAGIC). Should be tighter relationships post-MAGIC. Could also re-cluster the imputed data and see if it pulls out canonical cell types (based on marker expression). Validating gene-gene relationships is also helpful. These are probably the best ways to validate the data.

For visualization, as David said, tSNE falls apart post-MAGIC. Check simple PCA/Diffusion Map/PHATE embedding. If you're not used to looking at them, just a heads up that they will look different than tSNE plots (less circular patterns, more smears/streak patterns--that's normal).
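One way to put a number on the "tighter relationships" check described above is to compare the within-cluster spread of a marker gene before and after imputation. The sketch below uses made-up toy values in place of real expression data; the function name is hypothetical and the "smoothed" vector is only a stand-in for MAGIC output.

```python
import numpy as np

def within_cluster_spread(expression, cluster_ids):
    """Mean within-cluster standard deviation of one marker gene."""
    expression = np.asarray(expression, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    spreads = [expression[cluster_ids == c].std()
               for c in np.unique(cluster_ids)]
    return float(np.mean(spreads))

# Toy example: three pre-MAGIC clusters with different marker levels.
rng = np.random.default_rng(1)
clusters = np.repeat([0, 1, 2], 50)
true_level = np.array([0.0, 2.0, 5.0])[clusters]
noisy = true_level + rng.normal(scale=1.5, size=clusters.size)
# Stand-in for imputed values (synthetic, not produced by MAGIC).
smoothed = true_level + rng.normal(scale=0.3, size=clusters.size)
print(within_cluster_spread(noisy, clusters),
      within_cluster_spread(smoothed, clusters))
```

A lower spread after imputation is consistent with the marker tracking cluster identity more tightly, though it says nothing about whether distinct cell types have been merged, so it is best read alongside the re-clustering and gene-gene checks.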

dvdijk commented 6 years ago

Also, see https://www.biorxiv.org/content/early/2017/12/01/120378.article-info for some examples where tSNE fails.

I recommend always doing library size normalization.
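For anyone unsure what library size normalization involves, a minimal sketch follows. It assumes rows are cells and columns are genes, and it rescales every cell to the median library size; that rescaling target is an assumption here, not something stated in this thread, and the function name is hypothetical.

```python
import numpy as np

def library_size_normalize(counts):
    """Scale each cell (row) so its total equals the median library size."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    target = np.median(lib_size)
    # Guard against empty cells to avoid division by zero.
    safe_lib = np.where(lib_size == 0, 1.0, lib_size)
    return counts / safe_lib * target

# Three toy cells; the third was sequenced ten times deeper than the first.
counts = np.array([[1, 3],
                   [2, 2],
                   [10, 30]])
normalized = library_size_normalize(counts)
print(normalized)
```

After normalization the first and third cells have identical profiles, which is the point: differences in sequencing depth no longer dominate the distances that the diffusion kernel is built from.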

In reply to David Cook's comment above (Sun, Feb 18, 2018 at 1:53 PM).