Add HDF5 support for trajs and model_devis

zjgemi commented 2 months ago

Summary by CodeRabbit

New Features
- Introduced new optional arguments for improved data handling and multitasking capabilities.
- Added support for HDF5 formatted data in various modules.
- Enhanced flexibility in input handling for multiple data formats.
Bug Fixes
- Improved robustness in handling validation data structures.
Documentation
- Updated documentation to clarify new parameters and their intended use.

coderabbitai[bot] commented 2 months ago

Walkthrough

## Walkthrough

The changes enhance argument handling, data processing capabilities, and flexibility across various modules of the `dpgen2` package. New optional parameters are introduced to functions, enabling better configuration and support for HDF5 datasets. The logic for handling valid data and model freezing is refined, and new methods are implemented to improve data writing processes.

## Changes

| Files | Change Summary |
|-------|----------------|
| `dpgen2/entrypoint/args.py` | Added `use_hdf5` argument to `run_diffcsp_args`. |
| `dpgen2/entrypoint/submit.py` | Introduced `RunRelaxHDF5`; updated `make_concurrent_learning_op` to include `explore_config`; restructured `workflow_concurrent_learning` for multitasking data handling. |
| `dpgen2/exploration/render/traj_render.py`, `dpgen2/exploration/render/traj_render_lammps.py` | Updated `get_model_devi` and `get_confs` methods to accept `Union[List[Path], List[HDF5Dataset]]` as parameters. |
| `dpgen2/exploration/selector/conf_selector.py`, `dpgen2/exploration/selector/conf_selector_frame.py` | Modified `select` method to accept `Union[List[Path], List[HDF5Dataset]]` for `trajs` and `model_devis`. |
| `dpgen2/op/select_confs.py` | Updated `get_input_sign` method to accept `Artifact(Union[List[Path], HDF5Datasets])` for `trajs` and `model_devis`. |
| `dpgen2/exploration/scheduler/convergence_check_stage_scheduler.py`, `dpgen2/exploration/scheduler/scheduler.py`, `dpgen2/exploration/scheduler/stage_scheduler.py` | Updated `plan_next_iteration` method to accept `Union[List[Path], List[HDF5Dataset]]` for `trajs`. |
| `pyproject.toml` | Updated `pydflow` version from `>=1.6.57` to `>=1.8.88`. |
| `tests/op/test_run_relax.py` | Added empty dictionary under `"expl_config"` in the `OPIO` constructor within `testRunRelax`. |
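To make the `Union[List[Path], List[HDF5Dataset]]` signatures in the table concrete, here is a minimal sketch of how a reader function might branch on the input type. It is not the dpgen2 implementation: `FakeHDF5Dataset`, its `get_data` method, `read_model_devi`, and `load_model_devis` are hypothetical names introduced for illustration only, and the stand-in class merely mimics "an object that already holds the data" so the example runs without `dflow` installed.

```python
from pathlib import Path
from typing import List, Union


class FakeHDF5Dataset:
    """Hypothetical stand-in for an HDF5-backed dataset handle.

    Assumption: the real type exposes some way to fetch its content;
    here that is modelled as a `get_data()` method returning text.
    """

    def __init__(self, text: str) -> None:
        self._text = text

    def get_data(self) -> str:
        return self._text


def read_model_devi(src: Union[Path, FakeHDF5Dataset]) -> List[List[float]]:
    """Parse one model-deviation table from a file path or a dataset handle."""
    if isinstance(src, FakeHDF5Dataset):
        text = src.get_data()  # data comes from the dataset object
    else:
        text = Path(src).read_text()  # data comes from a plain file
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rows.append([float(x) for x in line.split()])
    return rows


def load_model_devis(
    model_devis: Union[List[Path], List[FakeHDF5Dataset]],
) -> List[List[List[float]]]:
    """Accept either kind of list and return one parsed table per entry."""
    return [read_model_devi(m) for m in model_devis]
```

The point of the pattern is that callers such as a selector or scheduler do not need to know which storage mode produced the inputs; only the lowest-level reader branches on the type.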

Recent review details

**Configuration used: CodeRabbit UI** **Review profile: CHILL**

Commits

Files that changed from the base of the PR and between 687b9c574fd45840f64fc3bb22b17f32dd31948e and 0499be9f2043e732493c4d2e30e173f5892cc1d6.

Files selected for processing (4)

* dpgen2/exploration/scheduler/convergence_check_stage_scheduler.py (2 hunks)
* dpgen2/exploration/scheduler/scheduler.py (3 hunks)
* dpgen2/exploration/scheduler/stage_scheduler.py (3 hunks)
* dpgen2/flow/dpgen_loop.py (3 hunks)

Additional comments not posted (7)

dpgen2/exploration/scheduler/stage_scheduler.py (2)
`11-12`: **Approved import changes.**

The addition of `Union` and `HDF5Dataset` is necessary for the new functionality to handle both paths and HDF5 datasets in the `trajs` parameter.

Also applies to: 14-15

---

Line range hint `95-106`: **Approved method changes with a suggestion to verify integration.**

The update to the `trajs` parameter type in `plan_next_iteration` enhances the method's flexibility to handle different data sources. The documentation is updated accordingly, which is good for clarity.

Please ensure that the integration of `HDF5Dataset` is tested thoroughly to confirm that the system handles these datasets correctly across different scenarios.

dpgen2/exploration/scheduler/convergence_check_stage_scheduler.py (2)
`8-8`: **Approved import changes.**

The addition of `Union` and `HDF5Dataset` is necessary for the new functionality to handle both paths and HDF5 datasets in the `trajs` parameter.

Also applies to: 14-16

---

`74-74`: **Approved method changes with a suggestion to verify integration.**

The update to the `trajs` parameter type in `plan_next_iteration` enhances the method's flexibility to handle different data sources. The documentation is updated accordingly, which is good for clarity.

Please ensure that the integration of `HDF5Dataset` is tested thoroughly to confirm that the system handles these datasets correctly across different scenarios.

dpgen2/exploration/scheduler/scheduler.py (2)
`8-8`: **Approved import changes.**

The addition of `Union` and `HDF5Dataset` is necessary for the new functionality to handle both paths and HDF5 datasets in the `trajs` parameter.

Also applies to: 15-17

---

`117-117`: **Approved method changes with a suggestion to verify integration.**

The update to the `trajs` parameter type in `plan_next_iteration` enhances the method's flexibility to handle different data sources. The documentation is updated accordingly, which is good for clarity.

Please ensure that the integration of `HDF5Dataset` is tested thoroughly to confirm that the system handles these datasets correctly across different scenarios.

Also applies to: 126-126

dpgen2/flow/dpgen_loop.py (1)
`96-96`: **Type annotation change approved, verify handling of new type.** The change in type annotation for the `trajs` key in the `get_input_sign` method is approved as it enhances flexibility. However, it's crucial to verify that the rest of the codebase correctly handles the new `HDF5Datasets` type to ensure compatibility. Run the following script to verify the handling of the new type:

Verification successful
---

**Handling of `HDF5Datasets` type is consistent across the codebase.**

The codebase shows consistent handling of the `trajs` variable with the new type annotation `Union[List[Path], HDF5Datasets]`. The changes are integrated correctly, and the code is prepared to handle both types effectively.

- Instances in `dpgen2/exploration/selector/conf_selector_frame.py` and other files confirm the correct handling of the new type.

Scripts executed
The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Description: Verify handling of the new HDF5Datasets type in the codebase.

# Test: Search for usage of `trajs` and check if it handles both types correctly. Expect: Proper handling of both List[Path] and HDF5Datasets.
rg --type python -A 5 $'trajs'
```

Length of output: 60

---

Script:

```shell
#!/bin/bash
# Description: Verify handling of the new HDF5Datasets type in the codebase.

# Test: Search for usage of `trajs` and check if it handles both types correctly. Expect: Proper handling of both List[Path] and HDF5Datasets.
rg --type py -A 5 $'trajs'
```

Length of output: 47826


codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 78.37838% with 8 lines in your changes missing coverage. Please review.

Project coverage is 83.65%. Comparing base (ce4ab3e) to head (0499be9). Report is 1 commit behind head on master.

Files with missing lines	Patch %	Lines
dpgen2/op/run_relax.py	71.42%	6 Missing :warning:
dpgen2/exploration/render/traj_render_lammps.py	77.77%	2 Missing :warning:

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #259      +/-   ##
==========================================
- Coverage   83.70%   83.65%   -0.05%
==========================================
  Files         104      104
  Lines        5958     5990      +32
==========================================
+ Hits         4987     5011      +24
- Misses        971      979       +8
```
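The percentages in the report follow from the hit and line counts above. The sketch below only rechecks that arithmetic. It assumes the two-decimal figures are truncated rather than rounded (an observation from these numbers, not a statement about how Codecov computes them), and the patch and per-file line totals (37, 21, 9) are inferred from the reported percentages and missing-line counts rather than stated in the report.

```python
import math


def truncated_pct(hits: int, lines: int, digits: int = 2) -> float:
    """Coverage percentage truncated (not rounded) to `digits` decimals."""
    scale = 10 ** digits
    return math.floor(hits / lines * 100 * scale) / scale


# Project coverage, from the Hits and Lines rows of the diff.
base = truncated_pct(4987, 5958)  # master
head = truncated_pct(5011, 5990)  # this PR

# Patch coverage: 8 lines missing out of an inferred 37 changed lines.
patch_lines = 37  # assumption: inferred from 78.37838% and 8 missing lines
patch = (patch_lines - 8) / patch_lines * 100

# Per-file figures; the line totals are inferred the same way.
run_relax = truncated_pct(21 - 6, 21)    # dpgen2/op/run_relax.py
traj_render = truncated_pct(9 - 2, 9)    # traj_render_lammps.py

print(f"base={base:.2f}% head={head:.2f}% delta={head - base:+.2f}%")
print(f"patch={patch:.5f}% run_relax={run_relax}% traj_render={traj_render}%")
```

Under these assumptions the script reproduces every figure in the report, including the -0.05% change in project coverage.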


zjgemi commented 2 months ago

Could you please also support run_lmp, which seems to be straightforward.

Sure.

zjgemi commented 2 months ago

Could you please also support run_lmp, which seems to be straightforward.

I realized that for run_lmp, each task outputs only a single trajectory and a single model_devi file. Since the outputs of each task must be stored in a separate file, merging the outputs of each task into an HDF5 file would bring little benefit.

On the other hand, in HDF5 mode users cannot conveniently preview file contents in the UI. That is why HDF5 mode is not used by default unless a performance bottleneck is encountered.

deepmodeling / dpgen2

Add HDF5 support for trajs and model_devis #259
