ERMETE-Lab / ROSE-pyforce

Python Framework for data-driven model Order Reduction of multi-physiCs problEms
https://ermete-lab.github.io/ROSE-pyforce/intro.html
MIT License

JOSS Submission Review (Reviewer 2): Functionality #9

damar-wicaksono closed this issue 1 month ago

damar-wicaksono commented 2 months ago

This issue is related to the JOSS submission review.

The tutorials provided serve as a basis for reviewing the functionality of the software. Overall, the tutorials execute successfully; my additional remarks are detailed below.

Installation

The requirements listed in the environment.yml file allow the package to be installed successfully via conda on macOS 14.5 (M2). However, when creating the series of plots in the tutorials, the following packages are missing:

Performance

As I don't see any performance claims made for the package, I believe this criterion does not apply.

Functionality

computeLebesgue(geim_data[var_names[field_i]][jj].magic_fun, geim_data[var_names[field_i]][jj].magic_sens)

Both magic_fun and magic_sens are attributes of GEIM instances. Is there a particular reason not to implement this function as a method of the class (e.g., GEIM.compute_lebesgue() or GEIM.lebesgue())?
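A minimal sketch of the suggestion, assuming magic functions and sensor representers are stored column-wise in NumPy arrays and using the Euclidean inner product in place of the L2/H1 inner products pyforce works with. The names compute_lebesgue and GEIMSketch are hypothetical, not pyforce API. The Lebesgue constant is taken here as the operator norm of the interpolation operator Phi B^-1 S^T, which for this oblique projector equals the reciprocal of the inf-sup constant between the two spaces.

```python
from typing import Optional

import numpy as np


def compute_lebesgue(magic_fun: np.ndarray, magic_sens: np.ndarray) -> float:
    """Lebesgue constant of a GEIM-like interpolant (Euclidean inner product).

    magic_fun  : (N, M) array, one magic function per column.
    magic_sens : (N, M) array, one sensor representer per column, so that the
                 i-th measurement of a field u is magic_sens[:, i] @ u.
    """
    # Interpolation matrix B_ij = <sensor_i, magic_fun_j>
    B = magic_sens.T @ magic_fun
    # Interpolation operator I_M = Phi B^{-1} S^T, an (N, N) oblique projector
    interpolant = magic_fun @ np.linalg.solve(B, magic_sens.T)
    # Lebesgue constant = operator (spectral) norm of I_M
    return float(np.linalg.norm(interpolant, ord=2))


class GEIMSketch:
    """Hypothetical stand-in for a trained GEIM object."""

    def __init__(self, magic_fun: np.ndarray, magic_sens: np.ndarray):
        self.magic_fun = magic_fun
        self.magic_sens = magic_sens

    def lebesgue(self, m: Optional[int] = None) -> float:
        # Use only the first m magic functions/sensors (all of them by default)
        m = self.magic_fun.shape[1] if m is None else m
        return compute_lebesgue(self.magic_fun[:, :m], self.magic_sens[:, :m])
```

Having the method delegate to a module-level function keeps both usages available: geim.lebesgue() on a live object, or compute_lebesgue(...) on magic functions and sensors loaded from disk, without re-training.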

In my opinion, once a returned tuple has more than two entries (beyond the absolute and relative errors), it becomes difficult to tell which index holds which quantity. It may be better to return the output of the method as a dictionary or a named tuple, so that the relevant quantities can be accessed by name.

PS: What does synt_ stand for?

Other remarks

For further discussion...

Perhaps this comment is better suited to the documentation review, but I'll put it here anyway...

The package provides the following methods/techniques:

Following the offline/online workflow, the package consists of two main sub-packages: pyforce.offline and pyforce.online. Looking at the features provided above, the following classes represent each technique:

It seems to me that POD, GEIM, and PBDW share many commonalities (though the details differ), and I assume that these classes serve the same overall purpose (again, each with its own particularities):

Steriva commented 1 month ago

Response Letter: Functionality

Thank you very much for the useful comments and suggestions. The responses and changes made to the code are listed here.

The changes have been made in pull request #9.

Installation

The modifications these comments require will be addressed in the pull request following issue #8. Since they relate more to the documentation, we decided to move the changes to the other branch so that the tutorials can be reviewed as a whole.

Functionality

  1. The function computeLebesgue is implemented separately so that it can be executed after the GEIM algorithm has been trained, even in a different script, without re-training GEIM. This is useful when different Lebesgue constants have to be compared.
  2. About the plotting in the tutorials: the modifications required by these comments will be made in #8.
  3. Regarding synt_test_error in the online phase:
    • [x] As suggested, the output of each method (errors and computational time) has been changed to a namedtuple to ease access to the relevant quantities. It is structured as follows:
       Results = namedtuple('Results', ['mean_abs_err', 'mean_rel_err', 'computational_time'])

      The only exception is the PE class, whose output also includes the parameters estimated by the optimization process. synt_ stands for synthetic, since the measurements come from simulation data (not collected on an experimental facility). For POD-related algorithms, the prefix likewise highlights that the comparison is made against simulation data. GEIM, TR-GEIM and PBDW also include a real_reconstruct method that reconstructs the field from real measurements (given as input arrays).

    • Yes, there is a GEIM class (and likewise a PBDW class) in both the offline and the online subpackages. I am aware that this can create confusion; however, as a consequence of the offline/online paradigm, they are not meant to be used in the same script in real applications. We would prefer to keep this naming to avoid imports like from pyforce.online.geim import geimOnline, which we think are a bit repetitive. As you suggested, I am adding some figures to the docs to show users how the classes are connected.
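The namedtuple structure quoted above can be sketched as follows. The Results field names come from the response; synt_test_error_sketch, the scalar error values, and the estimated_params field of the PE-like variant are hypothetical illustrations, not pyforce's actual implementation.

```python
import time
from collections import namedtuple

import numpy as np

# Field names exactly as given in the response
Results = namedtuple('Results', ['mean_abs_err', 'mean_rel_err', 'computational_time'])

# Hypothetical extension for a PE-like class; 'estimated_params' is an assumed name
PEResults = namedtuple('PEResults', Results._fields + ('estimated_params',))


def synt_test_error_sketch(true_snaps, reconstruct):
    """Hypothetical synt_test_error-like routine returning a Results tuple.

    true_snaps  : list of 1-D arrays (simulated fields, i.e. the 'synthetic' truth)
    reconstruct : callable mapping a true field to its reconstruction
    """
    start = time.perf_counter()
    abs_err = [np.linalg.norm(u - reconstruct(u)) for u in true_snaps]
    rel_err = [e / np.linalg.norm(u) for e, u in zip(abs_err, true_snaps)]
    elapsed = time.perf_counter() - start
    return Results(mean_abs_err=float(np.mean(abs_err)),
                   mean_rel_err=float(np.mean(rel_err)),
                   computational_time=elapsed)


res = synt_test_error_sketch([np.ones(4), 2 * np.ones(4)], lambda u: 0.9 * u)
print(res.mean_rel_err)   # named access instead of res[1]
print(res._asdict())      # plain dict view when one is needed
```

Because a namedtuple is still a tuple, existing code that unpacks the result positionally keeps working while new code can use the field names.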

Other remarks

  1. Sorry about the typo in GramSchmidt: it has been fixed.
  2. All classes have been checked for consistency with the CamelCase naming convention.
    • [x] The classes gaussian_sensors, import_H5 and POD_project have been renamed to GaussianSensors, ImportH5 and PODproject, respectively.
  3. It is true that this package collects several classes that can be mixed: the general DDROM framework wraps techniques that represent high-dimensional data in a reduced state and merge the model with measurements (i.e., local evaluations of quantities of interest) to obtain quick and reliable state estimates for monitoring and control applications.
    • [x] Schemes of the different classes and algorithms for the offline and online phases have been added to the documentation (link to images and development branch) and to the README; the modifications required by these comments will be made in #8.
  4. To what extent are these methods similar or dissimilar (in terms of the classes)? Looking at their attributes and methods, they share several similarities, but because there is no parent or abstract class (perhaps to mimic an interface), it is hard to get the picture quickly. For example, all instances share V and norm as attributes (note that in PBDW it is called norms, with an s). Furthermore, the online counterparts all have the method synt_test_error():
    • The POD, GEIM and PBDW classes were implemented at different times and, to keep things simple, no parent/abstract class was coded, even though it is true that they share many similarities, especially in the online phase. After discussing with @Neko-tan, we think this will certainly be one of the main enhancements in future releases of the package.
  5. How do the classes inside pyforce.offline.sensors fit into the context of POD, GEIM, and PBDW? Currently, the documentation lacks an intermediate-level picture of the data flow between the phases that can be translated directly into code (the figure on DDROM is too high-level to map readily onto the usage workflow and code structure). In the tutorials, numerous quantities are saved and loaded between notebooks, which makes it a bit hard to get the bigger picture of the package capabilities; that is, the functionality provided by the package's classes gets somewhat lost among the things being saved and plotted. Can one mix and match the results of the offline phase of one method with the online phase of another? Are there any restrictions?
    • pyforce.offline.sensors provides the classes required to generate the Riesz representation of a set of sensors, in either L2 or H1, as needed by GEIM and PBDW. As already mentioned, diagrams of the connections between the different classes are being added to the documentation and should provide a clear overview of the package functionalities (for instance, basis functions from either POD or GEIM can be combined with GEIM sensors in the PBDW class; other combinations are not recommended). As for the tutorials, they are being updated in issue #8 to help external users understand the package capabilities.

Let us know if any changes or responses are unclear. Thank you again :)

damar-wicaksono commented 1 month ago

Thank you for going through my comments!

Here are my replies:

The modifications these comments require will be addressed in the pull request, following issue https://github.com/ERMETE-Lab/ROSE-pyforce/issues/8.

Thank you!

The function computeLebesgue has been split in order to execute it even after the training of the GEIM algorithm on separate codes, avoiding re-training the GEIM algorithm. This can be useful when different Lebesgue constants have to be compared.

Okay!

The output of each method has been changed to namedtuple, as suggested, for errors and computational time to help the access to relevant quantities

Thanks!

synt_ stands for synthetic, since the measurements are coming from simulation data (not actually collected on an experimental facility).

Thank you for the explanation. I think it would be nice to include it (or a version of it) in the codebase as in-code documentation wherever these terms appear prominently, or in the documentation itself, explaining the difference between the two terms ("synthetic measurements" vs. "real measurements").
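One possible wording for such in-code documentation, attached to a minimal GEIM-style reconstruction. The functions synthetic_measurements and geim_reconstruct are hypothetical, assume Euclidean inner products and column-wise NumPy arrays, and are not the pyforce synt_test_error or real_reconstruct methods.

```python
import numpy as np


def synthetic_measurements(snapshot, sensors, noise_std=0.0, rng=None):
    """Synthetic measurements: sensors applied to a *simulated* field.

    This is the case the 'synt_' prefix refers to: the true field is known from
    simulation data, so reconstruction errors can be computed against it.
    Optional Gaussian noise mimics measurement uncertainty.
    """
    y = sensors.T @ snapshot
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + rng.normal(0.0, noise_std, size=y.shape)
    return y


def geim_reconstruct(measures, magic_fun, magic_sens):
    """Reconstruct a field from measurements, synthetic or real.

    Real measurements are collected on an experimental facility and passed in
    as an array; no true field is available, so only the estimate is returned
    and no error can be computed.
    """
    B = magic_sens.T @ magic_fun       # B_ij = <sensor_i, magic_fun_j>
    beta = np.linalg.solve(B, measures)
    return magic_fun @ beta
```

In this sketch the same reconstruction serves both cases; the only difference is where the measurement vector comes from and whether a reference field exists to compare against.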

We would prefer to keep this terminology to avoid importing like from pyforce.online.geim import geimOnline which we think can be a bit repetitive. I am adding some figures in the docs to help the user show how the classes are connected, as you suggested.

Understood!

Sorry about the typo about GramSchmidt: it has been fixed.

Great!

An overall check on the classes has been performed to be consistent with the camel case convention.

Thank you!

Schemes of the different classes and algorithms for the offline and online phase have been added to the documentation (link to images and development branch) and the read me:

I think diagrams of the classes, the data they require, and how they relate to each other would be a nice addition to the documentation. Thanks!

The POD, GEIM and PBDW classes have been implemented in different time periods and to stay simple no parent/abstract class was coded; even though it is true they do share a lot of similarities, especially in the online phase. Discussing with @Neko-tan, we think it will be for sure one of the main enhancements of future releases of the package and it is planned in the future.

I understand. The current structure is not a breaking point but it would be worthwhile to revisit it in the future. Thank you.
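A sketch of the parent/abstract class discussed here, assuming NumPy arrays with the basis stored column-wise. OnlineReconstructor, measure, reconstruct and PODProjection are hypothetical names; only V, norm, synt_test_error and the Results fields come from the thread.

```python
import time
from abc import ABC, abstractmethod
from collections import namedtuple

import numpy as np

Results = namedtuple('Results', ['mean_abs_err', 'mean_rel_err', 'computational_time'])


class OnlineReconstructor(ABC):
    """Hypothetical common base class for online POD/GEIM/PBDW-like classes."""

    def __init__(self, V, norm=np.linalg.norm):
        self.V = V        # basis, one function per column (same name everywhere)
        self.norm = norm  # one attribute name for all subclasses (no 'norms')

    @abstractmethod
    def measure(self, u):
        """Return (synthetic) measurements of a field u."""

    @abstractmethod
    def reconstruct(self, y):
        """Return the state estimate given measurements y."""

    def synt_test_error(self, snapshots):
        # Shared implementation: every subclass inherits the same error logic
        start = time.perf_counter()
        abs_err = [self.norm(u - self.reconstruct(self.measure(u))) for u in snapshots]
        rel_err = [e / self.norm(u) for e, u in zip(abs_err, snapshots)]
        elapsed = time.perf_counter() - start
        return Results(float(np.mean(abs_err)), float(np.mean(rel_err)), elapsed)


class PODProjection(OnlineReconstructor):
    """Example subclass: projection onto the orthonormal columns of V."""

    def measure(self, u):
        return self.V.T @ u   # here the 'measurements' are the POD coefficients

    def reconstruct(self, y):
        return self.V @ y
```

The base class fixes the attribute names and implements synt_test_error once; each technique only supplies how it measures and how it reconstructs, which is where POD, GEIM and PBDW actually differ.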

pyforce.offline.sensors is a class required to generate the Riesz representation of a set of sensors, either in L2 or H1, necessary by GEIM and PBDW. As already mentioned, the connection between the different classes is being added to the documentation and they should provide a clear overview of the package functionalities (for instance, the basis functions from either POD or GEIM can be mixed together with GEIM sensors in the PBDW class; other combinations are not suggested).

Yes, I think the documentation update on this would certainly be helpful.


I think you've addressed all my comments, and I have no further points to raise in this area, except for the minor point above about updating the documentation.

As the remaining points are deferred to the documentation, I will review it separately.

I'm closing this issue now...