Closed papachristoumarios closed 2 years ago
Thank you for the input! Indeed we had discussed compatibility with sklearn when designing the package. One of the core ideas of GGLasso was to have one entry point (i.e. glasso_problem
) for all problem formulations. We derive the problem formulation from the shape of the input data S
(and additional arguments such as latent
and reg
) which is why we need the input already when creating the problem instance right now. Moreover, as regularization parameters are part of the problem and depend on the input (e.g. via scaling), we thought that it also makes sense to have the input data as part of the problem (even though this differs from sklearn).
Regarding the second point, do in understand you correctly that you propose to rename the .solve
method to .fit
? This would be an easy change (even though I personally think that problems are solved rather than fitted).
I think I'm inclined to agree with @papachristoumarios that this would be a nice change and could even buy you some functionality for free from scikit-learn. Here I hope is a skeleton example!
class FabianGraphicalLasso(EmpiricalCovariance):
def __init__(self, score_type='AIC'):
super().__init__()
self.score_type = score_type
def fit(self, X: Union[np.ndarray, Iterable[np.ndarray]], y=None):
self.N = X.shape[0]
self.S = empirical_covariance(X, assume_centered=self.assume_centered)
self._derive_problem_formulation()
self._solve() # sklearn has conventions for fitted parameters beginning ._fitted_parameter so this would set these
# so that
return self
def score(self, X_test, y=None):
S_test = empirical_covariance(X_test - self.location_, assume_centered=True)
if self.score_type == 'AIC':
res = self._aic(S_test)
else:
res = self._ebic(S_test)
return res
def _derive_problem_formulation(self):
pass
def _solve(self):
pass
def _aic(self, S):
pass
def _ebic(self, S):
pass
Actually thinking about it X could be your data and y could be labels k for the multiple case!
So the really cool thing about this route is you could substitute most of your gridsearch functionality for the scikit-learn functionality (i.e. could use random search, gridsearch, various kinds of pipelines)
I have noticed that in the project's source code and documentation methods that are present in similar sklearn classes such as GraphicalLasso have methods (e.g. the
fit
method) which can be found in the current project but under different names and specifications.For example, here the GGL class can be created separately and then the data can be provided via a
fit
method (like they do in sklearn) instead of defining agglasso_problem
with the data and then calling thesolve
method without any arguments.Thus, I think that the software can be improved by having method naming and functionality that is similar to sklearn (since sklearn is widely used by the data science / ML community).