Closed: @bef-18 closed this 1 week ago
TODO:
- [x] Document to summarize the integration strategy
To use the pyLOM training pipeline with a new model, the model must first expose a specific interface. With it, the pipeline can call the methods needed to train the model without knowing its implementation. The interface defined for the models is the following:
```python
from typing import Dict, Optional, Tuple

# Dataset and OptunaOptimizer are the pyLOM types; their imports are omitted here.

class Model():
    def fit(self, train_dataset: Dataset, eval_set: Optional[Dataset] = None, **kwargs):
        """
        Fit the model to the training data.

        Args:
            train_dataset: The training dataset.
            eval_set (Optional): The evaluation dataset.
            **kwargs: Additional parameters for the fit method.
        """
        pass

    def predict(self, X: Dataset, **kwargs):
        """
        Predict the target values for the input data.

        Args:
            X: The input data. This dataset should have the same type as
                the ones used on fit.
            **kwargs: Additional parameters for the predict method.

        Returns:
            np.array: The predicted target values.
        """
        pass

    @classmethod
    def create_optimized_model(
        cls,
        train_dataset: Dataset,
        eval_dataset: Optional[Dataset],
        optuna_optimizer: OptunaOptimizer,
    ) -> Tuple["Model", Dict]:
        """
        Create an optimized model using Optuna.

        Args:
            train_dataset (Dataset): The training dataset.
            eval_dataset (Optional[Dataset]): The evaluation dataset.
            optuna_optimizer (OptunaOptimizer): The optimizer to use for optimization.

        Returns:
            Tuple[Model, Dict]: The optimized model and the best parameters
                found by the optimizer.
        """
        pass

    def save(self, path: str):
        """
        Save the model to a file.

        Args:
            path (str): The path to save the model.
        """
        pass

    @classmethod
    def load(cls, path: str):
        """
        Load a model from a file.

        Args:
            path (str): The path to load the model from.

        Returns:
            Model: The loaded model.
        """
        pass
```
As you might expect, the `fit` and `predict` methods may need more parameters to work properly. You can always add extra parameters, but they must have default values. To integrate the model into the pipeline only for training, without optimization, implementing `fit` is enough; however, such a model will lack the saving, loading, and optimization functionalities.
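For illustration only (this toy class is not part of pyLOM; its name and behavior are hypothetical), a model that integrates with the training part of the pipeline needs just `fit` and `predict`, with any extra parameters carrying default values:

```python
class MeanBaselineModel:
    """Hypothetical toy model: predicts the mean of the training targets.

    It implements only fit and predict, which is enough for training
    with the pipeline (no saving, loading, or optimization support).
    """

    def __init__(self):
        self._mean = None

    def fit(self, train_dataset, eval_set=None, epochs=1, **kwargs):
        # Extra parameters such as epochs must have default values so the
        # pipeline can call fit without knowing about them.
        # Here the dataset is assumed to yield (input, target) pairs.
        targets = [target for _, target in train_dataset]
        self._mean = sum(targets) / len(targets)

    def predict(self, X, **kwargs):
        # Returns a plain list for simplicity; a real model would return
        # an np.array as the interface documents.
        return [self._mean for _ in range(len(X))]
```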
The new model must be placed in a separate file under pyLOM/NN/architectures and added to pyLOM/NN's __init__.py.
Note: the model does not have to inherit from this class; it is shown for explanation purposes only.
The class `OptunaOptimizer` is already implemented in pyLOM and ready to be used. The constructor takes the optimization parameters, as in this example:
```python
optimizer = pyLOM.NN.OptunaOptimizer(
    optimization_params={
        "lr": 0.01,                # fixed parameter
        "n_layers": (1, 4),        # optimizable parameter
        "batch_size": (128, 512),  # optimizable parameter
        "hidden_size": 256,
        "epochs": 30,
    },
    n_trials=10,
    direction="minimize",
)
```
The idea here is that if the value of a parameter is a tuple, the parameter is optimized over the range given by the tuple, while if the value is a number, it remains fixed and is not optimized. This convention must be handled by each model. Then, to optimize the parameters of a model, define `create_optimized_model` so that, given an `OptunaOptimizer`, it returns an untrained model together with a dictionary containing the best parameters with which to train the model later. To obtain the best values of the optimizable parameters, create a function that receives an `optuna.Trial` as input and returns a metric to optimize, such as the error of the model. This function can be passed to `OptunaOptimizer.optimize`, which returns the best parameters Optuna found. Finally, the `optimization_params` dictionary should be updated with these best parameters and returned, as described for `create_optimized_model`.
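A sketch of how a model could apply the tuple-vs-scalar convention inside its Optuna objective function. The helper name `resolve_params` is hypothetical; the two callables stand in for `trial.suggest_int` and `trial.suggest_float`, passed in so the logic stays testable without Optuna:

```python
def resolve_params(optimization_params, suggest_int, suggest_float):
    """Return concrete parameters for one trial.

    Tuple values are sampled from their range; scalar values stay fixed.
    In a real objective, pass trial.suggest_int and trial.suggest_float
    for the two callables.
    """
    resolved = {}
    for name, value in optimization_params.items():
        if isinstance(value, tuple):
            low, high = value
            if isinstance(low, int) and isinstance(high, int):
                resolved[name] = suggest_int(name, low, high)    # integer range
            else:
                resolved[name] = suggest_float(name, low, high)  # float range
        else:
            resolved[name] = value  # fixed parameter, not optimized
    return resolved
```

Inside the objective passed to `OptunaOptimizer.optimize`, one would then call `params = resolve_params(optimization_params, trial.suggest_int, trial.suggest_float)`, train with `params`, and return the validation error.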
Regarding the datasets, there is a dataset implemented in `pyLOM.NN` that converts a pyLOM h5 dataset into a PyTorch dataset. At the moment, this is the dataset used by the MLP and the Autoencoders, and where possible, it should also be used by the new models.
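As background on why one dataset class can serve several models: a PyTorch map-style dataset only needs `__len__` and `__getitem__`. A pure-Python sketch of that protocol (hypothetical and for illustration only; the actual pyLOM.NN dataset additionally handles loading from the h5 file):

```python
class ArrayDataset:
    """Minimal map-style dataset sketch.

    Anything implementing __len__ and __getitem__ can be consumed by
    torch.utils.data.DataLoader, which is what makes a single dataset
    class reusable across the MLP, the Autoencoders, and new models.
    """

    def __init__(self, inputs, targets):
        assert len(inputs) == len(targets), "inputs/targets must align"
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # One (input, target) sample, as expected by the training loop.
        return self.inputs[idx], self.targets[idx]
```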
With the merging of #55 and #56, @bef-18 and @DavidRamosArchilla, we can start working on closing this development.
@DavidRamosArchilla I've finished fixing the examples and testsuite. Can you add an example for the MLP with synthetic data? I'll convert it into a test and we'll have this ready to merge.
@bef-18 please check that the VAE generates a time-dependent field.
Ok, I've added the example. It doesn't use Optuna, but I'll add it if necessary.
On a separate topic, I wanted to tell you that we already have a new model ready to be integrated into pyLOM. I understand it will be better to add it once we close this pull request; this model should be easier to integrate.
Perfect thanks!
Yes, Benet and I are still working on integrating the VAE; we are still encountering issues. You can start a new branch from 50-develop_mlp and add the model there if you want. We won't merge it until we clear this one. As before, please open a separate issue and an associated pull request.
Add a module for MLP neural network for pressure interpolation at different angles of attack