Closed: @bef-18 closed this 1 week ago
TODO:
- [x] Document to summarize the integration strategy
To use the pyLOM training pipeline with a new model, the model must first expose a specific interface. With it, the pipeline can call the methods needed to train the model without knowing its implementation. The interface defined for the models is the following:
```python
from typing import Dict, Optional, Tuple

# Dataset and OptunaOptimizer are the pyLOM types; their imports are omitted here.

class Model():
    def fit(self, train_dataset: Dataset, eval_set: Optional[Dataset] = None, **kwargs):
        """
        Fit the model to the training data.

        Args:
            train_dataset: The training dataset.
            eval_set (Optional): The evaluation dataset.
            **kwargs: Additional parameters for the fit method.
        """
        pass

    def predict(self, X: Dataset, **kwargs):
        """
        Predict the target values for the input data.

        Args:
            X: The input data. This dataset should have the same type as
                the ones used on fit.
            **kwargs: Additional parameters for the predict method.

        Returns:
            np.array: The predicted target values.
        """
        pass

    @classmethod
    def create_optimized_model(
        cls,
        train_dataset: Dataset,
        eval_dataset: Optional[Dataset],
        optuna_optimizer: OptunaOptimizer,
    ) -> Tuple["Model", Dict]:
        """
        Create an optimized model using Optuna.

        Args:
            train_dataset (Dataset): The training dataset.
            eval_dataset (Optional[Dataset]): The evaluation dataset.
            optuna_optimizer (OptunaOptimizer): The optimizer to use for optimization.

        Returns:
            Tuple[Model, Dict]: The optimized model and the best parameters
                found by the optimizer.
        """
        pass

    def save(self, path: str):
        """
        Save the model to a file.

        Args:
            path (str): The path to save the model.
        """
        pass

    @classmethod
    def load(cls, path: str):
        """
        Load a model from a file.

        Args:
            path (str): The path to load the model from.

        Returns:
            Model: The loaded model.
        """
        pass
```
As you might expect, the `fit` and `predict` methods may need more parameters to work properly. You can always add extra parameters, but they must have default values. To integrate the model into the pipeline only for training, without optimization, implementing `fit` is enough; however, such a model will lack the saving, loading, and optimization functionalities.
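For illustration only (this toy class is not part of pyLOM; its name and behavior are hypothetical), a model that integrates with the training part of the pipeline needs just `fit` and `predict`, with any extra parameters carrying default values:

```python
class MeanBaselineModel:
    """Hypothetical toy model: predicts the mean of the training targets.

    It implements only fit and predict, which is enough for training
    with the pipeline (no saving, loading, or optimization support).
    """

    def __init__(self):
        self._mean = None

    def fit(self, train_dataset, eval_set=None, epochs=1, **kwargs):
        # Extra parameters such as epochs must have default values so the
        # pipeline can call fit without knowing about them.
        # Here the dataset is assumed to yield (input, target) pairs.
        targets = [target for _, target in train_dataset]
        self._mean = sum(targets) / len(targets)

    def predict(self, X, **kwargs):
        # Returns a plain list for simplicity; a real model would return
        # an np.array as the interface documents.
        return [self._mean for _ in range(len(X))]
```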
The new model must be placed in a separate file under pyLOM/NN/architectures and added to pyLOM/NN's __init__.py.
Note: the model does not have to inherit from this class; it is shown for explanation purposes only.
The class `OptunaOptimizer` is already implemented in pyLOM and ready to be used. The constructor takes the optimization parameters, as in this example:
```python
optimizer = pyLOM.NN.OptunaOptimizer(
    optimization_params={
        "lr": 0.01,                # fixed parameter
        "n_layers": (1, 4),        # optimizable parameter
        "batch_size": (128, 512),  # optimizable parameter
        "hidden_size": 256,
        "epochs": 30,
    },
    n_trials=10,
    direction="minimize",
)
```
The idea here is that if the value of a parameter is a tuple, the parameter is optimized over the range given by the tuple, while if the value is a number, it remains fixed and is not optimized. This convention must be handled by each model. Then, to optimize the parameters of a model, define `create_optimized_model` so that, given an `OptunaOptimizer`, it returns an untrained model together with a dictionary containing the best parameters with which to train the model later. To obtain the best values of the optimizable parameters, create a function that receives an `optuna.Trial` as input and returns a metric to optimize, such as the error of the model. This function can be passed to `OptunaOptimizer.optimize`, which returns the best parameters Optuna found. Finally, the `optimization_params` dictionary should be updated with these best parameters and returned, as described for `create_optimized_model`.
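A sketch of how a model could apply the tuple-vs-scalar convention inside its Optuna objective function. The helper name `resolve_params` is hypothetical; the two callables stand in for `trial.suggest_int` and `trial.suggest_float`, passed in so the logic stays testable without Optuna:

```python
def resolve_params(optimization_params, suggest_int, suggest_float):
    """Return concrete parameters for one trial.

    Tuple values are sampled from their range; scalar values stay fixed.
    In a real objective, pass trial.suggest_int and trial.suggest_float
    for the two callables.
    """
    resolved = {}
    for name, value in optimization_params.items():
        if isinstance(value, tuple):
            low, high = value
            if isinstance(low, int) and isinstance(high, int):
                resolved[name] = suggest_int(name, low, high)    # integer range
            else:
                resolved[name] = suggest_float(name, low, high)  # float range
        else:
            resolved[name] = value  # fixed parameter, not optimized
    return resolved
```

Inside the objective passed to `OptunaOptimizer.optimize`, one would then call `params = resolve_params(optimization_params, trial.suggest_int, trial.suggest_float)`, train with `params`, and return the validation error.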
Regarding the datasets, there is a dataset implemented in `pyLOM.NN` that converts a pyLOM h5 dataset into a PyTorch dataset. At the moment, this is the dataset used by the MLP and the Autoencoders, and where possible, it should also be used by the new models.
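As background on why one dataset class can serve several models: a PyTorch map-style dataset only needs `__len__` and `__getitem__`. A pure-Python sketch of that protocol (hypothetical and for illustration only; the actual pyLOM.NN dataset additionally handles loading from the h5 file):

```python
class ArrayDataset:
    """Minimal map-style dataset sketch.

    Anything implementing __len__ and __getitem__ can be consumed by
    torch.utils.data.DataLoader, which is what makes a single dataset
    class reusable across the MLP, the Autoencoders, and new models.
    """

    def __init__(self, inputs, targets):
        assert len(inputs) == len(targets), "inputs/targets must align"
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # One (input, target) sample, as expected by the training loop.
        return self.inputs[idx], self.targets[idx]
```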
With the merging of #55 and #56, @bef-18 and @DavidRamosArchilla, we can start working on closing this development.
@DavidRamosArchilla I've finished fixing the examples and testsuite. Can you add an example for the MLP with synthetic data? I'll convert it into a test and we'll have this ready to merge.
@bef-18 please check that the VAE generates a time-dependent field.
Ok, I've added the example. It doesn't use Optuna, but I'll add it if necessary.
On a separate topic, I wanted to tell you that we already have a new model ready to be integrated into pyLOM. I understand it will be better to add it once we close this pull request; this model should be easier to integrate.
Perfect thanks!
Yes, Benet and I are still working on integrating the VAE; we are still encountering issues. You can start a new branch from 50-develop_mlp and add the model there if you want. We won't merge it until we clear this one. As before, please open a separate issue and an associated pull request.
Add a module for MLP neural network for pressure interpolation at different angles of attack