LLNL / libROM

Data-driven model reduction library with an emphasis on large scale parallelism and linear subspace methods
https://www.librom.net
Other
208 stars 36 forks source link

Removal of time interval and closing files within BasisWriter::writeBasis #261

Closed dreamer2368 closed 9 months ago

dreamer2368 commented 10 months ago

Closing files at the end of BasisWriter::writeBasis

This PR revamps #150 . Now closing the database file is not associated with the destruction of BasisWriter. This enables writing both the basis and the snapshot at a given moment. As far as I believe, this should keep the backward compatibility as well, but it needs other people's review as we do not have a regression test to check this.

Removal of time interval

The concept of time interval is obsolete, and it hasn't been used in practice. This is removed, addressing #267 . While the inside of all functions do not use the concept of time intervals, some function signatures remain in order to keep the backward compatibility:

The concept of time interval is obsolete, and it hasn't been used in practice. This will be removed in future (#267 ), and for now we enforce the number of interval not to be more than 1.

dreamer2368 commented 9 months ago

@dylan-copeland , After looking throughout the code, I'd like to suggest that we work on a separate PR for removing the concept of time intervals.

Even though the multiple time intervals are never used in practice, num_time_intervals or d_num_intervals_written are widely used in many classes and dataset names in hdf5 format. Since this concept of time interval has been so engrained into the code, it will take quite some time to remove it while keeping the consistency of the code. We will also have to discuss how we change the format of dataset name, which will have quite a big impact in backward compatibility.

dreamer2368 commented 9 months ago

@dylan-copeland , per our discussion, I added a commit that spits an error message whenever a user attempts to increase time interval, directly or indirectly through BasisGenerator or SVD class. Technically it still returns an error without this message, but this message will guide the user to increase the maximum number of samples.

Now unit_tests/test_SVD.cpp is removed from ci workflow. This test checks nothing but the concept of time interval, which is obsolete from now on.

I thought about the data format of basis classes, and ended up keeping the current implementation, allowing both csv and hdf5 format. Currently both data formats are compatible, and no reason to remove a functionality that we can support without any complication.

With that, I think this PR is ready for review and merge.

dreamer2368 commented 9 months ago

@dylan-copeland ,

In my work for PR #274 , I ended up removing the concept of time interval. I moved those changes to here since it fits more to this PR.

While the inside of all functions do not use the concept of time intervals, some function signatures remain in order to keep the backward compatibility:

Not exactly sure if we'd want to change these function signatures right away, which will impact all applications using libROM. Need more discussion about this @chldkdtn @ckendrick

dreamer2368 commented 9 months ago

Dataset names for basis are now updated, removing the trailing 6 digits of integer that used to indicate time interval.

Any out-dated snapshot/basis hdf5 files can be updated using scripts/data/update_basis_formats.py. The syntax for using this python script is:

python3 update_basis_format.py filename

This basis/snapshot format update is tested with unit_tests/test_basis_conversion.cpp. In .github/workflows/run_tests/action.yml, outdated basis/snapshot files are first converted with the python script, and checked if the BasisReader class can read the updated files.

dreamer2368 commented 9 months ago

@ebchin ,

After some discussion throughout this PR, we decided to change the dataset names within snapshot/basis hdf5 files. I heard from @dylan-copeland that you might have data files for your production and your simulation can be impacted by this update. I prepared a python script scripts/data/update_basis_formats.py, which can update the dataset names of the out-dated hdf5 snapshot/basis files. This script is also tested in the ci workflow. Please feel free to bring up any suggestion/concerns.