deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

feat(pt): allow PT OP CXXABI different from TF #3891

Closed njzjz closed 1 week ago

njzjz commented 2 weeks ago

Summary by CodeRabbit

coderabbitai[bot] commented 2 weeks ago
## Walkthrough

This update enhances the build process for managing PyTorch and TensorFlow compatibility by introducing checks and appropriate settings for the CXX11 ABI flag, refactoring GPU library creation, and adding conditional linking. These modifications support customized C++ operations without requiring explicit environment variables, ensuring seamless integration and compatibility across build configurations.

## Changes

| File | Change Summary |
| --- | --- |
| `doc/install/install-from-source.md` | Removed instructions and explanations related to the `DP_ENABLE_PYTORCH` environment variable and CXX11 ABI flags. |
| `source/CMakeLists.txt` | Handled PyTorch CXX11 ABI compatibility checks with TensorFlow, setting compatibility flags accordingly. |
| `source/api_cc/CMakeLists.txt` | Linked libraries conditionally based on ABI flag comparisons to adjust build configurations for libraries. |
| `source/lib/CMakeLists.txt` | Introduced a `create_library` function for dynamic library creation with suffixes based on CUDA/ROCm toolkits. |
| `source/lib/src/gpu/CMakeLists.txt` | Refactored GPU library creation and linking, removed unused compile definitions, and handled ABI compatibility. |
| `source/op/pt/CMakeLists.txt` | Implemented conditional linking and adjusted compile definitions based on PyTorch ABI settings. |
| `.github/workflows/test_cuda.yml` | Added environment variable exports and assignments to streamline testing configurations. |
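The ABI compatibility check summarized above could be sketched in CMake roughly as follows. This is an illustrative reconstruction, not the PR's actual code; it assumes `Python_EXECUTABLE` resolves to an interpreter with both frameworks installed, and uses the flags each framework exposes (`tf.sysconfig.CXX11_ABI_FLAG` and `torch._C._GLIBCXX_USE_CXX11_ABI`):

```cmake
# Hypothetical sketch: query the CXX11 ABI flag of each framework at
# configure time and compare them, rather than requiring the user to
# set an environment variable by hand.
execute_process(
  COMMAND ${Python_EXECUTABLE} -c
          "import tensorflow as tf; print(tf.sysconfig.CXX11_ABI_FLAG)"
  OUTPUT_VARIABLE TF_CXX11_ABI
  OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(
  COMMAND ${Python_EXECUTABLE} -c
          "import torch; print(int(torch._C._GLIBCXX_USE_CXX11_ABI))"
  OUTPUT_VARIABLE PT_CXX11_ABI
  OUTPUT_STRIP_TRAILING_WHITESPACE)
if(NOT TF_CXX11_ABI STREQUAL PT_CXX11_ABI)
  # The two frameworks were built with different _GLIBCXX_USE_CXX11_ABI
  # values; build a second, suffixed copy of the core library with the
  # PyTorch ABI instead of failing outright.
  message(STATUS "TF/PT CXX11 ABI mismatch: ${TF_CXX11_ABI} vs ${PT_CXX11_ABI}")
endif()
```

When the flags match, a single core library can serve both frameworks; the mismatch branch is what motivates the suffixed `create_library` refactor listed in the change summary.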

Recent review details

**Configuration used:** CodeRabbit UI
**Review profile:** CHILL

Commits: files that changed from the base of the PR and between c85b559b5bf047aa97f9b0fba613eed1e55b7a5c and ee0f409f13e03b96adc98500431b4cbf00243ad8.

Files selected for processing (2):

* `.github/workflows/test_cuda.yml` (1 hunks)
* `doc/install/install-from-source.md` (1 hunks)
Additional context used

LanguageTool — `doc/install/install-from-source.md`:

- [style] ~89: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing. (REP_WANT_TO_VB) Context: "...e $deepmd_venv/bin/activate ... if one wants to skip out of the virtual environment, he..."
- [uncategorized] ~146: Did you mean: "By default,"? (BY_DEFAULT_COMMA) Context: "...The path to TensorFlow Python library. By default the installer only finds TensorFlow und..."
- [uncategorized] ~190: "your" (belonging to you) seems less likely than "you". (AI_HYDRA_LEO_CP_YOUR_YOU) Context: "...nccl pip install horovod mpi4py ... If your work in a CPU environment, please prepa..."
- [style] ~295: Consider using "incompatible" to avoid wordiness. (NOT_ABLE_PREMIUM) Context: "...rch/pytorch/issues/51039), which may be not compatible with other libraries..."
Markdownlint — `doc/install/install-from-source.md`:

- 335, 336: MD014 (commands-show-output) — dollar signs used before commands without showing output
- 107, 153: MD040 (fenced-code-language) — fenced code blocks should have a language specified
Additional comments not posted (1)

`.github/workflows/test_cuda.yml`, lines 52–58: **Ensure correct environment variable assignment and usage.** The environment variables `PYTORCH_ROOT` and `TENSORFLOW_ROOT` are being set using Python commands, which is a robust way to dynamically determine their paths. However, the `DP_ENABLE_PYTORCH` variable is set to `1` without conditional checks, which might not be suitable for all build configurations or might conflict with other settings. Consider adding conditions or configurations to enable/disable this based on specific needs.
Verification successful

The search output confirms that `DP_ENABLE_PYTORCH` is set to `1` without any conditional logic in the `.github/workflows/test_cuda.yml` file. This suggests that the variable is always enabled in this workflow. To ensure that this is appropriate for all configurations, we should inspect the workflow file for any other relevant configurations or steps that might influence the necessity of `DP_ENABLE_PYTORCH`.

Let's search for other environment variables or steps in the workflow that might provide context on whether `DP_ENABLE_PYTORCH` should always be enabled or if there are scenarios where it should be conditional.

To understand the context and necessity of `DP_ENABLE_PYTORCH`, we need to extract and analyze the sections of the workflow file containing environment variables (`env:`) and steps (`steps:`).

The extracted sections from the workflow file show that `DP_ENABLE_PYTORCH` is set to `1` without any conditional logic. Additionally, the presence of other environment variables and steps does not indicate any scenarios where `DP_ENABLE_PYTORCH` should be disabled or conditionally set. Given this, it appears that `DP_ENABLE_PYTORCH` is intended to be enabled for all configurations within this workflow. Therefore, the current setup seems appropriate, and no conditional logic is necessary for `DP_ENABLE_PYTORCH`.

- `.github/workflows/test_cuda.yml`: `DP_ENABLE_PYTORCH` is set to `1` unconditionally.
Scripts executed

The following scripts were executed for the analysis:

```shell
#!/bin/bash
# Description: Ensure that DP_ENABLE_PYTORCH is set correctly based on the build configurations.
# Test: Search for conditional logic related to DP_ENABLE_PYTORCH in the workflow file.
# Expect: Conditional logic that sets DP_ENABLE_PYTORCH based on certain parameters.
rg --type yaml 'DP_ENABLE_PYTORCH' .github/workflows/test_cuda.yml
```

Length of output: 95

```shell
#!/bin/bash
# Description: Search for environment variables and steps in the workflow file to
# understand the context of DP_ENABLE_PYTORCH.
rg --type yaml 'env:' .github/workflows/test_cuda.yml
rg --type yaml 'steps:' .github/workflows/test_cuda.yml
```

Length of output: 175

```shell
#!/bin/bash
# Description: Extract sections containing environment variables and steps for
# detailed inspection.
rg --type yaml -A 5 'env:' .github/workflows/test_cuda.yml
rg --type yaml -A 5 'steps:' .github/workflows/test_cuda.yml
```

Length of output: 1340
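The workflow sets `PYTORCH_ROOT` and `TENSORFLOW_ROOT` through Python one-liners. A build script could consume the same information with a sketch like this (hypothetical, not the PR's code; `Python_EXECUTABLE` is assumed to come from CMake's `find_package(Python)`):

```cmake
# Illustrative only: prefer an explicitly exported PYTORCH_ROOT, otherwise
# ask the active Python interpreter where torch is installed.
if(DEFINED ENV{PYTORCH_ROOT})
  set(PYTORCH_ROOT $ENV{PYTORCH_ROOT})
else()
  execute_process(
    COMMAND ${Python_EXECUTABLE} -c
            "import torch, os; print(os.path.dirname(torch.__file__))"
    OUTPUT_VARIABLE PYTORCH_ROOT
    OUTPUT_STRIP_TRAILING_WHITESPACE)
endif()
```

Falling back to autodetection is what lets the PR drop the documented requirement of always exporting these variables by hand.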
njzjz commented 1 week ago

It seems to me that we still cannot enable PyTorch by default, as the default PyTorch wheel on PyPI requires NVCC...

CUDA_TOOLKIT_ROOT_DIR not found or specified
CMake Warning at /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:31 (message):
  Caffe2: CUDA cannot be found.  Depending on whether you are building Caffe2
  or a Caffe2 dependent library, the next warning / error will give you more
  info.
Call Stack (most recent call first):
  /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
  /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:185 (find_package)

CMake Error at /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:91 (message):
  Your installed Caffe2 version uses CUDA but I cannot find the CUDA
  libraries.  Please set the proper CUDA prefixes and / or install CUDA.
Call Stack (most recent call first):
  /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:185 (find_package)

xref: https://github.com/pytorch/pytorch/issues/78530
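The failure above comes from `TorchConfig.cmake` including Caffe2's CUDA detection whenever the installed wheel was built with CUDA, so `find_package(Torch)` aborts on machines without the toolkit. A minimal, hypothetical sketch of keeping PyTorch support opt-in (the option name is illustrative, not the project's actual flag):

```cmake
# Hypothetical opt-in guard: only look for Torch when explicitly requested,
# so a CUDA-built wheel does not break configuration on CUDA-less machines.
option(ENABLE_PYTORCH "Build the PyTorch C++ OPs" OFF)
if(ENABLE_PYTORCH)
  # For CUDA wheels, TorchConfig.cmake pulls in Caffe2's CUDA checks;
  # CUDA_TOOLKIT_ROOT_DIR must then point at a real CUDA installation,
  # e.g. cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda ...
  find_package(Torch REQUIRED)
endif()
```

Alternatively, CI jobs without a GPU can install the CPU-only wheel from PyTorch's CPU index so that `find_package(Torch)` never reaches the CUDA branch.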