DiffusionMapsAcademics / pyDiffMap

Library for diffusion maps
MIT License

Help understanding kernel matrix normalization + e-vals' scaling #22

Closed. dyballa closed this issue 2 years ago.

dyballa commented 2 years ago

Hi All,

First off, thanks for making this nice package available. I tried to understand your code by following the BGH papers ("Nonparametric forecasting..." and "Variable bandwidth diffusion kernels"), and I have two questions regarding the operations performed by the DiffusionMaps object:

1) After you create right_norm_vec, why do you only "right-normalize" the kernel matrix by it? In the papers, the kernel matrix seems to be both left- and right-normalized (i.e., scaling both rows and columns by 1/q_eps^alpha).

2) I noticed the eigenvectors are scaled by sqrt(-1. / evals) to yield the diffusion map. In the Lafon/Coifman diffusion maps, eigenvectors are scaled by evals^t. But there, the eigendecomposition is performed on a transition matrix, so the eigenvalues are <= 1. In BGH, I understand that, after subtracting the identity from L, the eigenvalues become <= 0. So I see the need for scaling them by -1 (equivalent to taking the absolute value), but I couldn't figure out why you also take the inverse (-1/evals). Finally, you take the square root: is that like setting the diffusion time t to 0.5? I found no discussion of this step in the BGH papers, so I was wondering whether there is a technical reason for it or whether it was empirically found to be a good scaling procedure -- any clarification would be greatly appreciated!

Thanks for your time!

ehthiede commented 2 years ago

Hi dyballa. Glad our package can be of help.

  1. We don't left-normalize because the left normalization cancels out once we divide by the row sums (line 189 in https://github.com/DiffusionMapsAcademics/pyDiffMap/blob/master/src/pydiffmap/diffusion_map.py); see the sketch after this list.
  2. Ultimately, how you scale the diffusion maps by the eigenvalues is somewhat arbitrary: there are many good choices. Granted, it's been a while since we implemented this, so I'm hazy on the precise details, but I believe we used the scaling described in "Commute Maps: Separating Slowly Mixing Molecular Configurations for Kinetic Modeling" (Noe et al., JCTC, 2016); see the second sketch below. So the reason it doesn't look like the normalization in the Lafon/Coifman paper is that it's a different normalization :-).
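
On point 1, here is a minimal toy sketch (random data, not pyDiffMap's actual code) of why the left factor is redundant: a kernel normalized on both sides and a kernel normalized only on the right give the same transition matrix after dividing by the row sums.

```python
import numpy as np

# Toy demonstration that the left normalization by 1/q^alpha cancels
# once the kernel is row-normalized into a transition matrix.
rng = np.random.default_rng(0)
K = rng.random((5, 5))
K = K + K.T                          # a symmetric toy kernel matrix
q = K.sum(axis=1)                    # density estimate q_eps
alpha = 0.5

# Left- and right-normalize, then divide by the row sums.
K_both = K / np.outer(q**alpha, q**alpha)
P_both = K_both / K_both.sum(axis=1, keepdims=True)

# Right-normalize only, then divide by the row sums.
K_right = K / q[np.newaxis, :]**alpha
P_right = K_right / K_right.sum(axis=1, keepdims=True)

print(np.allclose(P_both, P_right))  # True: the left factor cancels
```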
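On point 2, a minimal sketch (toy eigenvalues and eigenvectors, assumed shapes) of the commute-map-style scaling: each eigenvector is multiplied by the square root of the relaxation time -1/lambda_i associated with its (negative) generator eigenvalue.

```python
import numpy as np

# Toy illustration of scaling eigenvectors by sqrt(-1/evals).
evals = np.array([-0.4, -1.1, -2.7])  # generator eigenvalues (< 0)
evecs = np.random.default_rng(1).standard_normal((100, 3))  # toy eigenvectors

# Column i of the diffusion map is evecs[:, i] * sqrt(-1/evals[i]).
dmap = evecs * np.sqrt(-1.0 / evals)
```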

Hope that helps; please let me know if I can be of further help.

dyballa commented 2 years ago

Thanks, ehthiede, I appreciate your prompt response.

  1. That makes sense, thanks.
  2. Thanks for the reference. If I understood correctly, that paper uses a similar scaling, sqrt((-1/eval)/2), taking -1/eval as the relaxation time scale (section 2.3) as a way of integrating over the time tau; as a result, they obtain commute-time distances instead of diffusion distances. What got me slightly confused is that in section 2.4 they suggest estimating the relaxation time scales from the data as -tau/(ln |eval|) instead (which agrees with their eq. 8); the two agree when the transition eigenvalue is the exponential of the generator eigenvalue, as the check below shows. I've also found the same sqrt(1/lambda) scaling used to obtain commute times in "The principal components analysis of a graph, and its relationships to spectral clustering" (Saerens et al., ECML, 2004), there with the un-normalized Laplacian (end of section 5.3). So thanks for pushing me in the right direction.
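
A quick numeric check of that last point (toy values, not data from either paper): if the transition-matrix eigenvalue at lag tau is mu = exp(lambda * tau) for a generator eigenvalue lambda < 0, then the implied time scale -tau/ln|mu| reduces exactly to the relaxation time -1/lambda, for any tau.

```python
import numpy as np

# Check that -tau / ln|mu| equals -1/lambda when mu = exp(lambda * tau).
lam = -0.37                              # a toy generator eigenvalue (< 0)
for tau in (0.1, 1.0, 5.0):
    mu = np.exp(lam * tau)               # transition eigenvalue at lag tau
    t_relax = -1.0 / lam                 # relaxation time scale
    t_implied = -tau / np.log(abs(mu))   # eq.-8-style estimate from mu
    print(tau, t_relax, t_implied)       # identical for every tau
```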
ehthiede commented 2 years ago

Glad to hear I could be of help. I'm closing the issue for now, but happy to re-open it if you have further questions.