damar-wicaksono commented 2 months ago

This issue is related to the JOSS submission review.

The tutorials provided serve as a basis for reviewing the functionality of the software. Overall, the tutorials can be executed successfully with additional remarks that I detail below.

Installation

The requirements based on the environment.yml file allows the package to be installed successfully via conda in MacOS 14.5 (M2). However, when creating series of plots in the tutorials the following packages are missing:

trame
ipywidgets
openpyxl
trame-vtk
[x] Please double check if these packages are indeed necessary (at least to follow the tutorials) and include them in the requirements (perhaps optionally).
[x] In MacOS 14.5, the statement pv.start_xvfb() raises exception about libgl1-mesa-glx xvfb not available. However, removing this statement will allow the plots to be generated.
[x] It appears (also in MacOS 14.5), some plotting causes a complain that vue must be vue2 instead of vue3; this may be related to the installed version of trame.
[x] The dependencies version listed in the environment.yml are specified exactly (i.e., specified up to the PATCH version) which I think may lock the versions required and can be restrictive. Is this intentional?

Performance

As I don't see any performance claim regarding the package, I believe this criterion is irrelevant.

Functionality

[x] In one of the tutorials, the Lebesgue constant is computed by calling the function from pyforce.offline.geim import computeLebesgue as follows:

computeLebesgue(geim_data[var_names[field_i]][jj].magic_fun, geim_data[var_names[field_i]][jj].magic_sens)

Both magic_fun and magic_sens are attributes of GEIM instances. Is there a particular reason not to implement this function as a method of the class (GEIM.compute_lebesgue() or GEIM.lebesgue()?

[x] I find that the tutorials are a bit cluttered with complicated plotting routines and they often get in the way of presenting the (expected) results. One straightforward way to declutter the tutorials is by extracting these routines into local modules that are only relevant for the tutorials. Alternatively, if some of the plotting routines are general in nature, they can be implemented as a high-level plotting functions directly in the code base (but only if they are general).
[x] According to the tutorials, the method synt_test_error() is often called with respect to instances of classes in the online phase. The return value is a tuple but each implementation of the method have different length and for quantity that means the same the position may not be consistent (e.g., computational_time often appears as the last element of the tuple except for instances of the PE() class).

In my opinion tuple of length beyond 2 (absolute and relative errors) is already difficult to intuit which index signifies which quantity. Perhaps it would be better to store the output of the method as a dictionary or a named tuple whose relevant quantities can semantically be better accessed.

PS: What does synt_ stand for?

[x] High-level classes GEIM and PBDW have exactly the same name in two different modules offline.geim (resp. offline.pbdw) and online.geim (resp. online.pbdw). While this is definitely not an issue in Python, I'd argue that they can be confusing for high-level classes. We must know from where which is imported and in case we need both on, say, the same script, we have to rename one of them on the fly (via as). I'd propose renaming the classes such that the name indicates whether the class is related to the offline or online phase (like POD and PODI)

Other remarks

For further discussion...

[x] The class POD has a method called GrahmSchmidt, isn't this supposed to be GramSchmidt? (referring to Gram-Schmidt orthogonalization procedure?)
[x] The convention in Python is to name a class using camel case, so name like POD_project() is not exactly following the convention.
[x] I'm not an expert in DDROM, so while the tutorials are helpful in assessing the functionality of the package, I'm a bit lost at mapping the available classes both in offline and online phase and how the classes should relate to each other. Currently, it feels more like a collection of classes that are available for people to mix and match (this impression may or may not be true) and not yet as a coherent framework. It's also a bit hard thinking of these classes in a higher level (which is helpful in understanding the package more quickly) because the commonalities between them are not explicit.

Perhaps, this comment is more suitable for the documentation part but I'll put it here anyways...

The package provides the following methods/techniques:

Proper Orthogonal Decomposition with Projection and Interpolation for the Online Phase
Generalised Empirical Interpolation Method, either regularised with Tikhonov or not
Parameterised-Background Data-Weak formulation
an Indirect Reconstruction algorithm to reconstruct non-observable fields

Following the offline/online workflow, the package consists of two main sub-packages: pyforce.offline and pyforce.online. Looking at the features provided above, the following classes represent each technique:

POD:
- pyforce.offline.pod.POD implements POD in the offline phase (pyforce.offline.pod.DiscretePOD does not appear in the tutorial, I believe)
- pyforce.online.pod_interpolation.PODI implements POD in the online phase via interpolation
- pyforce.online.pod_projection.POD_project implements POD in the online phase via projection
GEIM:
- pyforce.offline.geim.GEIM implements GEIM in the offline phase
- pyforce.online.geim_synthetic.GEIM implements GEIM in the online phase (without Tikhonov's regularization)
- pyforce.online.tr_geim_synthetic.TRGEIM implements GEIM with Tikhonov's regularization in the online phase
PBDW:
- pyforce.offline.pbdw.PBDW implements PBDW in the offline phase
- pyforce.online.pbdw_synthetic.PBDW implements PBDW in the online phase
Sensor placement (only relevant for the offline phase):
- pyforce.offline.sensors.SGREEDY (exemplified in the tutorial)
- pyforce.offline.sensors.gaussian_sensors (perhaps not a main feature)
Indirect reconstruction (only relevant for the online phase):
- pyforce.online.indirect_recon.PE (exemplified in the tutorial)

It seems to me POD, GEIM, PBDW share many commonalities (the details differ, though) and I would assume that these classes serve the same overall purpose (again with their own particularities):

[x] To what extend these methods are similar or dissimilar (in terms of the class), looking at the attributes and methods they share several similarities but because there is no parent class or abstract class (to, perhaps, mimics interface) it is hard to quickly get the picture. For example, all instances share V and norm (note that in PBDW it is called norms with an s) as attributes. Furthermore, the online counterparts all have the method synt_test_error().
[x] How to put the classes inside pyforce.offline.sensors in the context of POD, GEIM, and PBDW. Currently the documentation lacks the intermediate-level picture of data flow between the phases that can be translated directly to the code (the Figure on DDROM is too high-level and not readily translatable to the usage workflow and code structure). In the tutorials, numerous quantities are saved and loaded in between notebooks which makes it a bit hard to get the bigger picture of the package capabilities; that is, the functionalities provided by the package via its classes get a bit lost in things being saved and plotted.
[x] Can one mix match the results of offline phase from one method with the online phase of another? Is there any restrictions?

Steriva commented 1 month ago

Response Letter: Functionality

Thank you very much for the useful comments and suggestions. The responses and changes made to the code are listed here.

The changes have been provided with pull request #9.

Installation

The modifications these comments require will be addressed in the pull request, following issue #8. Since it is more related to the documentation, we decided to move the changes to the other branch in order to have an overall review of the tutorials.

Functionality

The function computeLebesgue has been split in order to execute it even after the training of the GEIM algorithm on separate codes, avoiding re-training the GEIM algorithm. This can be useful when different Lebesgue constants have to be compared.
About the plotting in the tutorials: the modifications required by these comments will be made in #8
Regarding synt_test_error in the online phase
- [x] The output of each method has been changed to namedtuple, as suggested, for errors and computational time to help the access to relevant quantities, structured as follows:
```
 Results = namedtuple('Results', ['mean_abs_err', 'mean_rel_err', 'computational_time'])
```
  The only exception is the PE class which includes also the estimated parameters, resulting from the optimization process. synt_ stands for synthetic, since the measurements are coming from simulation data (not actually collected on an experimental facility). For POD-related algorithms, it's used to highlight the fact that there is a comparison with simulation data, as well. GEIM, TR-GEIM and PBDW also include a real_reconstruct method to reconstruct the field from real data (given as input as arrays).
- Yes, there is a geim class (same for pbdw) both in the subpackage offline and online. I am aware that this can create confusion, however they are not meant to be executed in the same script for real applications, as a consequence of the offline/online paradigm. We would prefer to keep this terminology to avoid importing like from pyforce.online.geim import geimOnline which we think can be a bit repetitive. I am adding some figures in the docs to help the user show how the classes are connected, as you suggested.

Other remarks

Sorry about the typo about GramSchmidt: it has been fixed.
An overall check on the classes has been performed to be consistent with the camel case convention.
- [x] Classes gaussian_sensors to GaussianSensors, import_H5 to ImportH5, POD_project to PODproject are modified.
It is true that this package collects some classes, available to mix: the general framework of DDROM wraps techniques able to represent high-dimensional data in a reduced state and to merge model with measures (i.e., local evaluations of quantities of interest) for quick and reliable state estimations for monitoring and control applications.
- [x] Schemes of the different classes and algorithms for the offline and online phase have been added to the documentation (link to images and development branch) and the read me: the modifications required by these comments are to be made in #8.
To what extent these methods are similar or dissimilar (in terms of the class), looking at the attributes and methods they share several similarities but because there is no parent class or abstract class (to, perhaps, mimics interface) it is hard to quickly get the picture. For example, all instances share V and norm (note that in PBDW it is called norms with an s) as attributes. Furthermore, the online counterparts all have the method synt_test_error():
- The POD, GEIM and PBDW classes have been implemented in different time periods and to stay simple no parent/abstract class was coded; even though it is true they do share a lot of similarities, especially in the online phase. Discussing with @Neko-tan, we think it will be for sure one of the main enhancements of future releases of the package and it is planned in the future.
How to put the classes inside pyforce.offline.sensors in the context of POD, GEIM, and PBDW. Currently, the documentation lacks the intermediate-level picture of data flow between the phases that can be translated directly to the code (the Figure on DDROM is too high-level and not readily translatable to the usage workflow and code structure). In the tutorials, numerous quantities are saved and loaded in between notebooks which makes it a bit hard to get the bigger picture of the package capabilities; that is, the functionalities provided by the package via its classes get a bit lost in things being saved and plotted. Can one mix and match the results of offline phase from one method with the online phase of another? Is there any restrictions?
- pyforce.offline.sensors is a class required to generate the Riesz representation of a set of sensors, either in L2 or H1, necessary by GEIM and PBDW. As already mentioned, the connection between the different classes is being added to the documentation and they should provide a clear overview of the package functionalities (for instance, the basis functions from either POD or GEIM can be mixed together with GEIM sensors in the PBDW class; other combinations are not suggested). For what concerns the tutorials, they are under update in the issue #8 to help external users understand the package capabilities.

Let us know if some changes or responses are not clear, thank you again :)

damar-wicaksono commented 1 month ago

Thank you for going through my comments!

Here are my replies:

The modifications these comments require will be addressed in the pull request, following issue https://github.com/ERMETE-Lab/ROSE-pyforce/issues/8.

Thank you!

The function computeLebesgue has been split in order to execute it even after the training of the GEIM algorithm on separate codes, avoiding re-training the GEIM algorithm. This can be useful when different Lebesgue constants have to be compared.

Okay!

The output of each method has been changed to namedtuple, as suggested, for errors and computational time to help the access to relevant quantities

Thanks!

synt_ stands for synthetic, since the measurements are coming from simulation data (not actually collected on an experimental facility).

Thank you for the explanation. I think it would be nice to put your explanation (or a version of it) in the codebase as in-code documentation at a point(s) these terms appear prominently or in the documentation itself regarding the difference between these two terminologies ("synthetic measurements" vs "real measurements").

We would prefer to keep this terminology to avoid importing like from pyforce.online.geim import geimOnline which we think can be a bit repetitive. I am adding some figures in the docs to help the user show how the classes are connected, as you suggested.

Understood!

Sorry about the typo about GramSchmidt: it has been fixed.

Great!

An overall check on the classes has been performed to be consistent with the camel case convention.

Thank you!

Schemes of the different classes and algorithms for the offline and online phase have been added to the documentation (link to images and development branch) and the read me:

I think the diagrams for classes and the required data and how they relate to each other would be a nice addition to the documentation. Thanks!

The POD, GEIM and PBDW classes have been implemented in different time periods and to stay simple no parent/abstract class was coded; even though it is true they do share a lot of similarities, especially in the online phase. Discussing with @Neko-tan, we think it will be for sure one of the main enhancements of future releases of the package and it is planned in the future.

I understand. The current structure is not a breaking point but it would be worthwhile to revisit it in the future. Thank you.

pyforce.offline.sensors is a class required to generate the Riesz representation of a set of sensors, either in L2 or H1, necessary by GEIM and PBDW. As already mentioned, the connection between the different classes is being added to the documentation and they should provide a clear overview of the package functionalities (for instance, the basis functions from either POD or GEIM can be mixed together with GEIM sensors in the PBDW class; other combinations are not suggested).

Yes, I think the documentation update on this would certainly be helpful.

I think you've addressed all my comments and I have no further points to raise in this area, except for a minor point above regarding updating the documentation a bit.

As the remaining points are deferred to the documentation, I will have a look into it separately.

I'm closing this issue now...

ERMETE-Lab / ROSE-pyforce

JOSS Submission Review (Reviewer 2): Functionality #9

Installation

Performance

Functionality

Other remarks

Response Letter: Functionality

Installation

Functionality

Other remarks