Using different basis with 2 dimensions or more

diegoeduardok commented 2 years ago

When I try to change to certain types of basis, I get errors if I am using two dimensions. Does the package support these use cases? What about more than two dimensions?

Code follows

# Generate dummy data
t = np.tile(np.arange(10), 5)
dim_1 = np.random.normal(size=50, loc=1)
dim_2 = np.random.normal(size=50, loc=3)
X = np.vstack([t, dim_1, dim_2]).T

# Fit model
dim_x = 2
dim_h = 2
basis = GLE_BasisTransform(basis_type="linear") # Runs: linear, free_energy. Throws errors: bsplines, free_energy_kde.
estimator = GLE_Estimator(dim_x=dim_x, dim_h=dim_h, basis=basis)
estimator.fit(X)

Tracebacks:

With basis_type=bsplines


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [123], in <cell line: 1>()
----> 1 estimator.fit(X)

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/GLE_analysisEM/_gle_estimator.py:393, in GLE_Estimator.fit(self, X, y, idx_trajs) 390 self.dt = X[1, 0] - X[0, 0] 391 self._check_initial_parameters() --> 393 Xproc, idx_trajs = self.model_class.preprocessingTraj(self.basis, X, idx_trajs=idx_trajs) 394 traj_list = np.split(Xproc, idx_trajs) 395 _min_traj_len = np.min([trj.shape[0] for trj in traj_list])

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/GLE_analysisEM/_euler_model.py:35, in EulerModel.preprocessingTraj(self, basis, X, idx_trajs) 33 dt = X[1, 0] - X[0, 0] 34 v = (np.roll(X[:, 1 : 1 + self.dim_x], -1, axis=0) - X[:, 1 : 1 + self.dim_x]) / dt ---> 35 bk = basis.fit_transform(X[:, 1 : 1 + self.dim_x]) 36 v_plus = np.roll(v, -1, axis=0) 37 Xtraj = np.hstack((v_plus, v, v, bk))

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/sklearn/base.py:867, in TransformerMixin.fit_transform(self, X, y, fit_params) 863 # non-optimized default implementation; override when a better 864 # method is possible for a given clustering algorithm 865 if y is None: 866 # fit method of arity 1 (unsupervised transformation) --> 867 return self.fit(X, fit_params).transform(X) 868 else: 869 # fit method of arity 2 (supervised transformation) 870 return self.fit(X, y, **fit_params).transform(X)

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/GLE_analysisEM/_gle_basis_projection.py:130, in GLE_BasisTransform.fit(self, X, y) 127 self.nb_basiselt = self.dim_x 129 if self.tocombine: # If it is needed to combine the features --> 130 self.combinations = _combinations(self.nb_basis_elt_per_dim, self.dimx, False, False) 131 self.ncomb = sum(1 for _ in combinations) 132 self.nb_basiselt = self.nb_basis_elt_per_dim ** self.dim_x

AttributeError: 'GLE_BasisTransform' object has no attribute 'nb_basis_elt_per_dim'


- With `basis_type=free_energy_kde`
```python
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [121], in <cell line: 1>()
----> 1 estimator.fit(X)

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/GLE_analysisEM/_gle_estimator.py:393, in GLE_Estimator.fit(self, X, y, idx_trajs)
    390 self.dt = X[1, 0] - X[0, 0]
    391 self._check_initial_parameters()
--> 393 Xproc, idx_trajs = self.model_class.preprocessingTraj(self.basis, X, idx_trajs=idx_trajs)
    394 traj_list = np.split(Xproc, idx_trajs)
    395 _min_traj_len = np.min([trj.shape[0] for trj in traj_list])

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/GLE_analysisEM/_euler_model.py:35, in EulerModel.preprocessingTraj(self, basis, X, idx_trajs)
     33 dt = X[1, 0] - X[0, 0]
     34 v = (np.roll(X[:, 1 : 1 + self.dim_x], -1, axis=0) - X[:, 1 : 1 + self.dim_x]) / dt
---> 35 bk = basis.fit_transform(X[:, 1 : 1 + self.dim_x])
     36 v_plus = np.roll(v, -1, axis=0)
     37 Xtraj = np.hstack((v_plus, v, v, bk))

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/sklearn/base.py:867, in TransformerMixin.fit_transform(self, X, y, **fit_params)
    863 # non-optimized default implementation; override when a better
    864 # method is possible for a given clustering algorithm
    865 if y is None:
    866     # fit method of arity 1 (unsupervised transformation)
--> 867     return self.fit(X, **fit_params).transform(X)
    868 else:
    869     # fit method of arity 2 (supervised transformation)
    870     return self.fit(X, y, **fit_params).transform(X)

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/GLE_analysisEM/_gle_basis_projection.py:123, in GLE_BasisTransform.fit(self, X, y)
    119 self.dim_x = X.shape[1]
    121 self._initialize()
--> 123 self.featuresTransformer = self.featuresTransformer.fit(X)
    124 if hasattr(self.featuresTransformer, "n_output_features_"):
    125     self.nb_basis_elt_ = self.featuresTransformer.n_output_features_

File ~/anaconda3/envs/pytorch/lib/python3.9/site-packages/GLE_analysisEM/_gle_potential_projection.py:117, in GLE_PotentialTransform.fit(self, X, y)
    115     self.fe_spline_ = interpolate.splrep(xf, self.fe_, per=self.per)  # A bit of smoothing
    116 elif self.dim_x == 2:
--> 117     xfa = [(edge[1:] + edge[:-1]) / 2.0 for edge in self.edges_hist_]
    118     # x, y = np.meshgrid(xfa[0], xfa[1])
    119     # fe_flat = pf.flatten()
    120     # x_coords = x.flatten()[np.nonzero(fe_flat)]
    121     # y_coords = y.flatten()[np.nonzero(fe_flat)]
    122     self.fe_spline_ = interpolate.RectBivariateSpline(xfa[0], xfa[1], self.fe_)

AttributeError: 'GLE_PotentialTransform' object has no attribute 'edges_hist_'

HadrienNU commented 2 years ago

For now only the linear basis support any dimension. Free_energy support 2D, and that should be the case of free_energy_kde, this is a bug that I am going to correct Bsplines are not supported in 2D for now, but if you required it, I can implement it quite easily, it is only half implemented. However, beyond 2D, the number of basis element start to grow exponentially and other solutions have to be find. Note that the package support a custom basis type, so if you have a particular usecase and a basis that fit your need, it can be used.

HadrienNU commented 2 years ago

With the last commit, 2D free_energy_kde should be supported. bsplines (as well as other basis_type such as polynomial but that is untested) are now supported for arbitrary dimensions. However, the basis function in that case are tensor product of 1D basis function, so the dimensionality of the basis will grow quickly with the number of dimension.

diegoeduardok commented 2 years ago

free_energy_kde and bsplines are working for the 2D case now (thank you!).

I will close this issue now, but I also wanted to know about the random initialization of the parameters. It seems that the minimum eigenvalue for the random matrices is dependent on the length of the trajectory. What is the reason behind this?

HadrienNU commented 2 years ago

So, it is better to avoid zero eigenvalues in the initial value as this can cause numerical issue. To choose a minimal value: the eigenvalues are related to the inverse of timescale of the memory kernel, so I force the largest timescale to be equivalent of the size of the trajectories. The assumption behind is that trajectories are longer than the memory kernel largest timescale.

HadrienNU / GLE_AnalysisEM

Using different basis with 2 dimensions or more #12