ERMETE-Lab / ROSE-pyforce

Python Framework for data-driven model Order Reduction of multi-physiCs problEms
https://ermete-lab.github.io/ROSE-pyforce/intro.html
MIT License
9 stars 5 forks source link

JOSS Submission Review (Reviewer 2): Documentation #8

Closed damar-wicaksono closed 2 weeks ago

damar-wicaksono commented 2 months ago

This issue is related to the JOSS submission review.

I'm currently reviewing the documentation and below are some remarks, questions, and suggestions that the authors may want to consider regarding the documentation (and testing); please let me know if clarifications are needed. What listed below are partial as I haven't gone through to the whole documentation. I will put additional remarks/suggestions if/when they are needed.

Landing page

PS: Clicking the docs badge in the GitHub repository actually brings us to the intro page anyway and not index.html.

Welcome to pyforce's documentation

Description

What is actually "ROSE", is it a project? a software framework? or something else? Perhaps there is a link to the project site?

How to cite pyforce

What is ROM

Reduced Basis Methods

Data-Driven ROM techniques

Installation notes

Tutorials

The tutorials are provided in great detail and appreciated. I would propose to pick one of the tutorials to serve as a "Getting Started" part of the documentation that serves as an entry point in using the package. This "Getting Started" would provide more details regarding the features of the package gradually with enough detail. I assume the DFG2 benchmark may serve as that entry point and my remarks below reflect that.

Unsteady Laminar Navier Stokes - DFG2 benchmark — pyforce 0.1.0 documentation

Generation of Snapshots

Assembling variational forms

Setting up the tools to compute drag and lift coefficient:

params['rhoLU2'] = 0.1
get_drag_lift = drag_lift(mesh, ft, params, bound_markers['obstacle'])

Solving the time-dependent problem

Comparison with benchmark data:

Offline Phase

Importing Snapshots:

...
AssertionError: The input function dofs_fun has 2385, instead of 2356

The snapshots I generated locally work just fine, though.

POD algorithm:

[POD(train_snaps[field_i], var_names[field_i], use_scipy=True, verbose = True) for field_i in range(len(var_names))]

Definition of the modes:

Computing the training error:

Generation of the maps:

from scipy.interpolate import interp1d

for field_i in range(len(var_names)):
        coeff_maps[field_i]['LinInt'] = [interp1d(train_pod_coeff[field_i]['X'].flatten(),
                                        train_pod_coeff[field_i]['Y'][:, rank].flatten(),
                                        kind = 'cubic', fill_value = 'extrapolate')
                                        for rank in range(Nmax)]

Post Process: plotting the POD modes:

Online Phase: POD-I

Importing Snapshots and offline results:

Test error:

Consider the following:

test_errors[key][field_i][2]['CoeffEstimation']

Without looking deep into the documentation, for an introductory tutorial it should be explained what are the contents of test_errors[key][field_i], what does the index 2 signify, and what does the key "CoeffEstimation" mean?

Post Process

There might be more comments to follow once I go through the whole documentation

damar-wicaksono commented 1 month ago

Update (10.09.2024)

Currently, there is no section in the documentation regarding the contribution guidelines. As this is one of the review criteria, I would ask the authors to consider adding them following the common best-practices. Per the reviewer guideline, it should include:

Steriva commented 1 month ago

Response Letter: documentation

Thank you very much for the useful comments and suggestions. Here are listed the response and the changes made to the docs, refer to #15.

For the landing page, it always brings to the intro page, instead of the index to directly connect the user to the main contents of the docs.

Introduction and installation

  1. About ROSE
    • [x] Some comments on what ROSE is have been provided
  2. Minor fixes to the text have been provided as suggested.
    • [x] As a suggestion, I think there is no need to repeat the preposition "at" multiple times in: "the package...is aimed at reducing..., at searching..., and at integrating" because they are already parallel: fixed.
    • [x] Perhaps revise "either regularised with Tikhonov or not" to "either with or without Tikhonov's regularisation": changed.
    • [x] "not only restricted in the Nuclear Engineering world.": perhaps replace it with "not restricted to the Nuclear Engineering world.": fixed.
    • [x] "the API documentation and some examples on how to use the various modules of the package.": perhaps replace on with of: fixed.
  3. How to cite:
    • [x] The sentence has been rephrased as suggested.

What is ROM

A new page related to theorical aspects of ROM/DDROM techniques has been added.

Minor fixes to the text have been provided as suggested.

Reduced Basis Methods

Minor fixes to the text have been provided as suggested.

Data-Driven ROM techniques

This section has been revised following the suggestion in #10, being it very similar to the paper description of the figure.

Other minor fixes to the text have been provided as suggested.

A new section has been added, presenting the package structure: how the classes are connected.

Installation Notes

As reported in issue #9, the versions of the different python packages are the one required by dolfinx v0.6.0: the least recent version used have been listed in pyforce/requirements.txt; moreover, the pyforce/environment.yml has been modified listing the only necessary version for dolfinx and setuptools.

  1. Instead of showing how to set up a conda environment step-by-step in this section of the documentation, the authors should directly refer to the available yaml file used to set up a conda environment just like the instruction in the README.
    • [x] Both the procedures have been provided.
  2. Also provide a note stating that currently pyforce can only be obtained by directly cloning the repository (not in PyPI or conda repository)
    • [x] Fixed.
  3. "docs/ folder can be generate with": Why such a folder need to be generated?
    • [x] This was thought for internal use, it has been removed in the new version.
  4. Regarding the documentation, perhaps list the dependencies first before giving the instruction on how to build the docs
    • [x] fixed.

Community Guidelines

The community guidelines have been added in the README.md file.


Tutorials: general comments

  1. "in the tutorials, alternatevely the data can be download from Zenodo.": typo "alternatevely": fixed.

  2. In the tutorial, the authors repeatedly use the idiom "at first" which may be inappropriate as it is used more in the context of contrasting something later on ("at first it's A, but in the end it's B"). They appear numerous times, so please double check them.: double checked in each tutorial.

This section includes also some comments reported in issue #9.

Installation comments from #9

  1. trame, ipywidgets, openpyxl and trame-vtk are actually required to execute the tutorial, but are not necessary for pyforce, they are used to to produce contour plots and import benchmark data.
    • [x] The introduction to the tutorials section in the docs has been modified to highlight that these packages are just required by the tutorials.
  2. pv.start_xvfb() is necessary when using noteboks on ssh connection
    • [x] It has been removed in the novel version of tutorial 01.
  3. It appears (also in MacOS 14.5), some plotting causes a complain that vue must be vue2 instead of vue3; this may be related to the installed version of trame.
    • [x] I think the issue is related to the version of trame and trame-vuetify, I tried on my MacOS 15.0 and no issue arises. See here for the updated section on optional packages for tutorial.
  4. The only dependency that must be satisfy should be the one related to dolfinx, the others comes are a consequence of this.
    • [x] The environment.yml and the requirements.txt files have been updated to leave as constraint only the dolfinx version.

Plotting issues on the tutorials from #9

I find that the tutorials are a bit cluttered with complicated plotting routines and they often get in the way of presenting the (expected) results. One straightforward way to declutter the tutorials is by extracting these routines into local modules that are only relevant for the tutorials. Alternatively, if some of the plotting routines are general in nature, they can be implemented as a high-level plotting functions directly in the code base (but only if they are general).

Following this comment, the code to generate the contour plots using pyvista has been moved in a separate file called contour_plotting.py in the tutorial folder, composed by different functions; on the other hand, for simpler plots using matplotlib, the code has been left in the tutorial files (making it a bit more essential). Moreover, a generic function plot_function has been added to pyforce.tools.plotting to simplify the plotting of the results for a single snapshot: this function serves as template for more complex plots regarding the comparison with DDROM and FOM.

Tutorial 1: Unsteady NS

I think this page should include explicitly the purpose of this tutorial, what is the expected outcome of carrying out this exercise. Currently it contains only the description: thank you very much for the comment, a paragraph Aim of this tutorial has been added to this page and in each notebook.


Tutorial 1: Generation of the snapshots

  1. Title fixed to Generation of the snapshots.
  2. In general, what level of familiarity the users need to have with FEniCS to carry out this tutorial? When applicable perhaps provide links to the relevant sections of documentation of FEniCS/DOLFIN: the knowledge of dolfinx to carry out this tutorial is not so advanced, but a basic knowledge of the library is required. Some additional comments related to the dolfinx documentation have been added.
  3. Typo on kinematic viscosity fixed.

Assembling variational forms

  1. Typo on EI fixed.
  2. Please add a short description on what is the module ns.py provided by the authors (I may guess that it's to solve NS equation but because it's coming in a single file it should be explained): a brief description has been added.
  3. I would avoid using to import everything from ns as it obscures what functions coming from which package/module.*: fixed.
  4. BDF2 is selected via time_adv = 'BDF2' but the series of plots below show the results of the three methods: yes, the BDF2 method is selected, the plot has been updated to avoid confusion.

Setting up the tools to compute drag and lift coefficient

  1. "The FOM will be validation against...": Perhaps "will be validated against...": fixed.
  2. "...the FEATFLOW dataset...": The dataset may need a reference or at least a link: added.
  3. "We use UFL to create the relevant integrals...": What is "UFL" in this context? I assume it's a Fenics-related term so a link would be useful.: the implementation of the integrals is done the ns.py file, comment added.
  4. Maybe I missed it, but was it explained somewhere about params['rhoLU2'] = 0.1?: added.
  5. I assume drag_lift() is coming from ns.py (which upon inspection is actually a class not a function; In Python, the standard practice is to name a class with a camel case instead of with underscores). It needs to be explained briefly what is this class doing as well as the important parameters for construction.: the CamelCase convention has been adopted and a brief description of the inputs is provided.

Solving the time-dependent problem

  1. "The velocity and the pressure are stored to be later saved in appropriate data structures.": I find "...stored to be later saved..." is a rather awkward construction here.: rephrased.
  2. store_snap = False which must be True if the users want to follow this tutorial and move on to the next steps (offline phase): sorry about this, fixed.
  3. Please state clearly in the paragraph (not in the codeblock) the assumption of the directory structure (where things should be saved): fixed.
  4. "Store QoI data": I assume QoI is quantities of interest, but this has never been referred to before: added.

Comparison with benchmark data:

  1. The line # time_adv_schemes = ['BDF2', 'CN', 'EI'] is commented out but the plot actually shows the results from the three methods.: this is related to a previous version of the notebook, adopted for comparison of the different numerical schemes. Now, the plot shows only the results of the BDF2 method.
  2. Is there a purpose of saving the figure: fig.savefig('./../BenchmarkData/LaminarNS_DFG2/FOM_bench_comparison.pdf', format='pdf', dpi=300, bbox_inches='tight') and As a suggestion: plotting specifications can get in the way of the tutorial, maybe create a helper function for plotting so users can focus on what should be plotted?: a helper function has been added and compressed to help readbility and understanding (comments on the results are provided).

Tutorial 1: Offline Phase

  1. There is no need for a list if there is only one item (POD): the notebook has been modified accordingly.
  2. Is it necessary in this part to create the mesh again with gmsh module? domain is needed in the subsequent codes, but would it be possible to hide the repetition of this process so the users can just pickup from where before?: an additional function has been implemented in ns.py to avoid the repetition of the mesh generation.
  3. State briefly that to continue with the next tutorial steps later on all important files must be saved in a specific location: added.

Importing snapshots

The code has been splitted and the in-code comments removed.

  1. Why u in var_names is different from U in stored_var_names?: it was due to a bad saving, the Zenodo snapshots have been updated to accomodate this issue.

  2. I might have done something wrong, but if I used the snapshots stored in Zenodo I obtained the following exception for this tutorial: I have found the same bug, the issue may be related to the different operating system. A disclaimer and some additional comments have been added to the notebook, addressing this bug, when creating the mesh.

POD algorithm

  1. A list will be created for each POD: 0 = p, 1 = u": What do the numbers signify? Index?: yes, they are the index which are consistent with var_names.
  2. $N_s$ has been defined.
  3. For the first time, briefly introduce the class POD(); explain it's interface especially the important inputs and expected outcome output as it is currently used here. Establish a bridge between the "theoretical" ROM and the specific constructs provided by pyforce to carry out the tasks.: a brief description has been added.

Definition of the modes

  1. The code has been divided.
  2. Programmatically, what does it mean to "define" the POD modes, explain a bit what does the method compute_basis actually do, which important inputs it takes and what outcome is expected once executed. Explain why particular arguments are used, i.e., why the particular Nmax or why normalise=True?: instead of define, compute is now used (which is more correct). More comments have been added.
  3. Explain why saving the modes are necessary in the context of the tutorial: added.
  4. State briefly the assumption of the path and filenames path+'/BasisFunctions' and path+'/BasisFunctions/basisPOD_; and why they might be important to be able to follow the tutorial: the basis functions are stored in a proper directory containing the output of the offline phase, called OfflinePhase.

Computing the training error

  1. "let us compute the training errors and the POD basis coefficients.": Are the POD basis coefficients being computed by the code block below?: the POD coefficients are computed by $L^2$ projection, a comment has been added.
  2. Briefly explain the method "train_error()of an instance ofPOD` including expected inputs and expected output.: added.
  3. "The max absolute and relative reconstruction error is compared for the different algorithms,...": Which different algorithms?: sorry for the "typo", it should be fields, fixed.
  4. I think P_N[u] is not the reconstruction operator with N basis functions but the reconstructed object; it should be P_N without the object being operated: yes, you are right, fixed.

 Generation of the maps

  1. Maybe state explicitly that these methods are coming SciPy: added.
  2. Perhaps explain or say it out loud how the snapshots supposed to be sorted (by value I assume)? One can look at the codes below and guess, but it would be nice to know in advance what should happen.: added.
  3. Is it a linear interpolation or a cubic one?: it should be linear, sorry about that, fixed.
  4. "Let us store the maps using pickle": Please state the purpose of saving these as pickle files (perhaps they are going to be used later on): added.

Post Process: plotting the POD modes

  1. There are three sets of plots; the third one is not explained. One needs to understand the local plotting function plotModes() to understand what is going on.: the code to produce the plots has been moved to an external file contour_plotting.py and imported with the function plot_modes. More comments about the plots have been added.
  2. I'm having problem initially plotting the MODs in MacOS; it complains about "libgl1-mesa-glx xvfb not available" but if I remove the line pv.start_xvfb() everything seems to be fine after installing additional packages: pip install trame ipywidgets (are these listed in the dependencies?): pv.start_xvfb() is required when executing the code on a remote server. I have removed it to avoid confusion.

Tutorial 1: Online Phase

  1. There is no need for a list if there is only one item (POD): fixed.
  2. Is it necessary in this part to create the mesh again with gmsh module? domain is needed in the subsequent codes, but would it be possible to hide the repetition of this process so the users can just pickup from where before?: as suggested in the previous tutorial, a function has been implemented in ns.py to avoid the repetition of the mesh generation.
  3. State briefly that to continue with the next tutorial steps later on all important files must be saved in a specific location: included.

Importing Snapshots and offline results:

  1. Remove commented out lines: fixed.

Test error:

  1. Nmax is now 25 instead of 30, is there any reason for that?: sorry about this, fixed.
  2. Introduce briefly the class POD_project and its role within pyforce, how can it be constructed and what the instances can do; briefly explain the corresponding method synt_test_error() (these are package features that should be highlighted): a brief description has been added.
  3. Similar comment as above but for the class PODI: a brief description has been added.
  4. Where is this computational time coming from? I think it's not very transparent especially in relation what the output of synt_test_error is?: a more in-depth description of the output of synt_test_error has been added.
  5. Without looking deep into the documentation, for an introductory tutorial it should be explained what are the contents of test_errors[key][field_i], what does the index 2 signify, and what does the key "CoeffEstimation" mean?: more details have been provided.

The matplotlib plots have been moved to the helper functions within the cells.

Post Process

  1. Nmax changes again here: this value can be changed by the user to see the effect of adding/removing basis functions.
  2. Consider splitting the large block of code with markdown cells to hold description/explanation: thank you for the comment.
  3. Consider creating a set of common plotting helper functions and put them in a local module (and documented) so plotting settings do not get in the way but interested users can look at the codes in detail if they want: according to the complexity of the plot, helper functions have been created.
  4. As before, I can only execute the last part only if I remove the line: pv.start_xvfb() (is this an important line?): as mentioned above, this is related to the execution on a remote server, I have removed it to avoid confusion.
  5. State the purpose of the directory path='./Online_results/' and Explain the flag make_video = False and what would happen if it's True; it may be obvious, but it's better be explicit (e.g., "To create a video of...change the flag below to True"): in order to have a lighter code, the creation of the video has been removed. Since, in addition to what it was implemented above, the creation of the video would require more lines and additional packages.

Thank you again for the comments. In the same development branch I plan to revise the other tutorials, trying to be consistent with the comments provided to tutorial 01.

damar-wicaksono commented 1 month ago

Hi!

Thank you very much for going through my comments and made the necessary and applicable revisions.

I went through the revised documentation and confirmed that most major points I raised above have been resolved. I really appreciate the addition of the new section on the "Theory and package structure" and with the extra figures, it provides a helpful bridge between the theory of DDROM and how things are implemented in pyforce along with its convention (adopted terms, etc.). Having this new section and the revisions made, I find following the tutorial again becomes easier and feels more consistent.

I would raise a couple of additional minor points that you might want to double check:

Landing page

Reduced Basis Methods

Data-Driven ROM Techniques

Package structure

Tutorials

Unsteady Laminar Navier Stokes - DFG2 benchmark

Generation of time-dependent snapshots

Offline Phase

Decay of the POD eigenvalues

Definition of the modes

Online Phase


Thank you very much!

Steriva commented 1 month ago

The documentation has been reviewed following your comments in #16. Here's the response letter.

Landing Page

Fixed the minor issues.

Theory

RB methods

Fixed the minor issues. Only $\psi$ is now used to indicate the basis functions for consistency.

Data-Driven ROM methods

Fixed the minor issues.

Package Structure

  1. "In the following, some figures sketching how the different classes are connected to each other during the offline and online phases.": This may not be a complete sentence because it is missing a verb, either some figures "sketch", "are sketching", or "are given" at the end of the sentence.: added
  2. I'm not so sure, but perhaps you can double check the use of repeated use of "measures" in this section, I've got the feeling they should have been "measurement(s)". If it's "measurement", then double check the term used in the class figures again.: yes, you are right, in some phrases "measurement" is more appropriate than "measures". I have changed it accordingly.
  3. "therefore, supposing to have two coupled fields to reconstruct $(\phi, \mathbf{u})$ and only local evaluations of $\phi$ are available, two different problems arise:": For direct state estimation, how does information of u is necessary to reconstruct the spatial distribution of ϕ? Because reading the current description it seems that if you want to know about ϕ, just do measurement on ϕ.: yes, this is true. When measures of certain quantity are available, the reconstruction follows algorithms with the direct state estimation approach; problems arise when measures of certain quantities are not available. I have changed the sentence accordingly. I add a small comment, hope it is more clear.
  4. "...supposing to know the characteristic (unseen) parameter...": Perhaps it could be better with "Assuming that the characteristic (unseen) parameter is known,..."

Tutorials

Fixed the minor issue.

Unsteady Laminar Navier Stokes - DFG2 benchmark

Fixed the minor issue.

01 - Generate snaps

Offline phases

  1. ...have been download from Zenodo": "downloaded" instead of "download".: fixed.
  2. "indices" instead of "indeces": fixed.
  3. "At first, the correlation matrix...": "at first" may not be appropriate here.: fixed.
  4. "Here there is the first use of the offline POD class...": It sounds rather awkward to me different than the rest of the tutorial... maybe "Below we use offline POD class..." or something like that.: fixed.
  5. "The option verbose can be actived to either show or not the progress bar.`: "activated" instead of "actived".: fixed.
  6. There's a dangling "ù" after explaining verbose.: fixed.
  7. "...using the formula of point 3.": Maybe "step 3 above" instead of "point 3".: fixed.
  8. Is it compute_modes (in the text) or compute_basis (in the codes). and Is it normalize in the textornormalise` (in the codes).: sorry for the errors, fixed.

Online phases

  1. "...acting as best-case scenario...": "as the best-case scenario".: fixed.
  2. "This produdure...": Typo.: fixed.
  3. "...this is a dictornary...": Typo.: fixed.

Please make sure that the other two tutorials are consistent with the revisions.: updates are expected to be pushed in the coming days.

Steriva commented 1 month ago

Hello, Tutorials 2 and 3 have been updated in #17, following the suggestions for Tutorial 1.

damar-wicaksono commented 2 weeks ago

Hi @Steriva,

I went through the latest changes and can confirm that the updated tutorials are consistent with the previous comments. Thank you!

Out of curiosity, what's the relation between pyforce and ROM4FOAM (I see there are many similar implemented methods albeit in a different programming language)?

Steriva commented 2 weeks ago

Hi @damar-wicaksono, thank you for all the valuable comments and feedbacks about the package :)

ROM4FOAM was the "original" code, developed during my master's thesis and before by other colleagues. It was not easy to implement everything in C++ (OpenFOAM-like) and make it able to communicate with ML libraries available for Python. Then, we switched to a Python version able to read OpenFOAM cases and interact with scikit-learn or torch. Nowadays, we use pyforce most of the time for our works.

damar-wicaksono commented 2 weeks ago

Understood and thank you for the explanation!

I consider this issue resolved.