DiffusionMapsAcademics / pyDiffMap

Library for diffusion maps
MIT License

Review data that gets saved into DiffusionMap object #14

Open ralfbanisch opened 6 years ago

ralfbanisch commented 6 years ago

Currently, we are saving a lot of attributes when the fit(..) function is called (line 120ff):

        # Save constructed data.
        self.local_kernel = my_kernel
        self.epsilon_fitted = my_kernel.epsilon_fitted
        self.d = my_kernel.d
        self.data = X
        self.weights = weights
        self.kernel_matrix = kernel_matrix
        self.P = P
        self.q = q
        self.right_norm_vec = right_norm_vec
        self.evals = evals
        self.evecs = evecs
        self.dmap = dmap

We should review exactly what needs to be saved here; otherwise we are too heavy on system memory. For example, saving the kernel object instance, the kernel matrix and P (the normalised kernel matrix) all separately creates overhead that is not needed.
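To make the redundancy concrete, here is a minimal sketch. It assumes P is the standard alpha-normalized Markov matrix: the kernel matrix is right-multiplied by diag(q**-alpha) and each row is then divided by its sum (the matching left factor cancels in the row normalization). Under that assumption P is a pure function of kernel_matrix and right_norm_vec, so it could be rebuilt on demand instead of stored. The helpers `build_kernel`, `rebuild_P` and `sparse_nbytes` are hypothetical and not part of pyDiffMap; the Gaussian kernel is built densely only to keep the example short.

```python
import numpy as np
import scipy.sparse as sps


def build_kernel(X, epsilon, cutoff=1e-8):
    """Gaussian kernel stored sparsely (built densely only for brevity)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (4.0 * epsilon))
    K[K < cutoff] = 0.0  # drop negligible entries so the matrix is actually sparse
    return sps.csr_matrix(K)


def rebuild_P(kernel_matrix, right_norm_vec):
    """Hypothetical helper: recompute P = D^{-1} K diag(right_norm_vec) on demand."""
    K_alpha = kernel_matrix @ sps.diags(right_norm_vec)
    rowsums = np.asarray(K_alpha.sum(axis=1)).ravel()
    return sps.diags(1.0 / rowsums) @ K_alpha  # each row now sums to 1


def sparse_nbytes(A):
    """Bytes held by the data/indices/indptr arrays of a CSR matrix."""
    A = A.tocsr()
    return A.data.nbytes + A.indices.nbytes + A.indptr.nbytes


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
alpha = 0.5

K = build_kernel(X, epsilon=0.05)
q = np.asarray(K.sum(axis=1)).ravel()  # kernel density estimate
right_norm_vec = q ** (-alpha)
P = rebuild_P(K, right_norm_vec)

print("kernel_matrix bytes:", sparse_nbytes(K))
print("P bytes:            ", sparse_nbytes(P))
```

Because P has the same sparsity pattern as the kernel matrix, storing both roughly doubles the memory spent on the kernel, while keeping only the kernel matrix and the normalization vector is enough to reconstruct P.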

ehthiede commented 6 years ago

So I think we could definitely condense "evecs" and "dmap" into one, since each is just a scaled version of the other. I'm fine with getting rid of the kernel_matrix object. The only reason I can think of for keeping it is that, when calculating the eigenvectors, you would in principle get slightly better eigenvectors by taking the kernel matrix, applying the symmetrized normalization, and then solving that eigenvector problem. But in practice I've never actually noticed a difference in the eigenvectors from doing this.
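A minimal sketch of both points, assuming the standard alpha-normalized construction K_sym = diag(r) K diag(r) with r = q**-alpha, D = diag(row sums of K_sym) and P = D^{-1} K_sym, and assuming the diffusion coordinates are the eigenvectors scaled column-wise by their eigenvalues. Solving the symmetric problem D^{-1/2} K_sym D^{-1/2} phi = lambda phi and mapping back with psi = D^{-1/2} phi yields right eigenvectors of P, and dmap is then recomputed from evecs and evals rather than stored. The names `symmetric_eigs`, `diffusion_coordinates` and the diffusion-time parameter `t` are hypothetical, not pyDiffMap's API.

```python
import numpy as np
import scipy.sparse as sps
import scipy.sparse.linalg as spsl


def symmetric_eigs(kernel_matrix, right_norm_vec, n_evecs):
    """Top eigenpairs of P = D^{-1} K_sym, computed from a symmetric matrix.

    Assumes kernel_matrix is symmetric and K_sym = diag(r) K diag(r),
    with r = right_norm_vec and D = diag(row sums of K_sym).
    """
    Dr = sps.diags(right_norm_vec)
    K_sym = Dr @ kernel_matrix @ Dr
    d = np.asarray(K_sym.sum(axis=1)).ravel()
    D_inv_sqrt = sps.diags(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ K_sym @ D_inv_sqrt  # symmetric, same eigenvalues as P
    evals, phi = spsl.eigsh(S, k=n_evecs, which="LA")
    order = np.argsort(evals)[::-1]  # largest eigenvalue first
    evals, phi = evals[order], phi[:, order]
    evecs = phi / np.sqrt(d)[:, None]  # right eigenvectors of P: psi = D^{-1/2} phi
    return evals, evecs


def diffusion_coordinates(evals, evecs, t=1):
    """Hypothetical: rebuild dmap on demand as evecs scaled column-wise by evals**t."""
    return evecs * evals[None, :] ** t


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = sps.csr_matrix(np.exp(-sq_dists / 0.5))
q = np.asarray(K.sum(axis=1)).ravel()
r = q ** -0.5  # alpha = 0.5

evals, evecs = symmetric_eigs(K, r, n_evecs=4)
dmap = diffusion_coordinates(evals, evecs)  # derived, so it need not be stored

# Cross-check against the non-symmetric Markov matrix P = D^{-1} K_sym.
Dr = sps.diags(r)
K_sym = Dr @ K @ Dr
P = sps.diags(1.0 / np.asarray(K_sym.sum(axis=1)).ravel()) @ K_sym
residual = np.abs(P @ evecs - evecs * evals).max()
print("leading eigenvalue:", evals[0])  # a Markov matrix has top eigenvalue 1
print("max residual of P @ evecs - evecs * evals:", residual)
```

Under these assumptions, only the eigenvalues and one set of eigenvectors need to be kept on the object; the symmetric route needs the kernel matrix only transiently during fit.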