DiffusionMapsAcademics / pyDiffMap

Library for diffusion maps
MIT License
46 stars 14 forks source link

Move to using Transition rate matrix rather than Transition matrix #16

Open ehthiede opened 6 years ago

ehthiede commented 6 years ago

I'm wondering if we should move from using the transition matrix in the code to the transition rate matrix $L = (P-I) / \epsilon$ (modulo the appropriate scaling factor due to how we've defined the exponential).

This would have the following advantages.

  1. It would make the code more naturally compatibile with the variable bandwidth schemes
  2. The spectral gap of L will not go to zero as epsilon goes to zero, so I suspect the linear algebra will be better posed (although I've never seen this effect in practice)
  3. L is more naturally interpretable for cases where alpha is not 0.5 or 1.
  4. We don't have to do this weird scaling with the output eigenvalues every time we look at the eigenvalues, as that will automatically be done by the code

I can't really think of any disadvantages, apart from slightly added complexity, and the fact that people seem to be slightly more comfortable with transition matrices than transition rate matrices.

Thoughts?

ralfbanisch commented 6 years ago

Hi Erik,

So as far as I remember, when one rescales the eigenvectors by \lambda^t for the diffusion map embedding, one usually uses the eigenvectors of P (after all, the eigenvectors of L are negative and increasing in magnitude, and don’t behave very well when raised to a power).

Of course there is a simple algebraic relation between the two, but it seems that what is preferable depends on the context: If one is interested in how things converge to the graph laplacian in the limit of infinite data and epsilon going to zero then of course one would want to use the generator L, but if one is interested in diffusion map embeddings then P seems more natural.

-Ralf

On 21. Feb 2018, at 06:57, Erik Henning Thiede notifications@github.com wrote:

I'm wondering if we should move from using the transition matrix in the code to the transition rate matrix $L = (P-I) / \epsilon$ (modulo the appropriate scaling factor due to how we've defined the exponential).

This would have the following advantages.

It would make the code more naturally compatibile with the variable bandwidth schemes The spectral gap of L will not go to zero as epsilon goes to zero, so I suspect the linear algebra will be better posed (although I've never seen this effect in practice) L is more naturally interpretable for cases where alpha is not 0.5 or 1. We don't have to do this weird scaling with the output eigenvalues every time we look at the eigenvalues, as that will automatically be done by the code I can't really think of any disadvantages, apart from slightly added complexity, and the fact that people seem to be slightly more comfortable with transition matrices than transition rate matrices.

Thoughts?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/DiffusionMapsAcademics/pyDiffMap/issues/16, or mute the thread https://github.com/notifications/unsubscribe-auth/ARuMPjBVJrjePhw1vyFyLaiFVFquQMJXks5tW7BbgaJpZM4SNHfg.

ehthiede commented 6 years ago

I think if you were working with the eigenvectors of/value of L, you'd scale by e^(\lambda t), as the dynamics are for a continuous time, not discrete time Markov process.

More generally, I would say this is a point for using the Laplacian picture (and I realize I'm sort of standing against Coifman / Lafon by saying this). I think that if you would want to compare two different diffusion mappings at any given $t$ (e.g. for two different molecules), if you don't look at the eigenfunctions of the $L$ you aren't getting commensurate mappings.

ehthiede commented 5 years ago

Issue closed as discussion is moved into a more general comment regarding variable bandwidth diffusion maps.

ehthiede commented 5 years ago

Reopened because I realized a workaround for the VBW case, so we can indeed discuss these issues separately.