-
# GPU POST Phase III features
1. Add an option to configure the scrypt algorithm to use blake3 instead of sha3 (Phase II only supports sha3). Keep the option to use sha3.
1. Support these output sizes: 1…
avive updated
4 years ago
-
When the total number of GPUs is large and the message size is 1, if the NVLSTree algorithm is specified, the execution time for Allreduce in NCCL can be as high as 300ms. However, if the Ring algorit…
-
Currently the photo module has plenty of tonemapping and HDR algorithms, but there is no GPU implementation. These algorithms are highly parallelizable, and a CUDA implementation can be much faster than…
-
## Description
I tried running the bertQA sample on a Jetson Orin Nano with JetPack 6.1.
I used BERT Base, because BERT Large kills itself when building the engine (maybe because of a memory issue).
```
[…
-
### 🔎 Search before asking
- [X] I have searched the PaddleOCR [Docs](https://paddlepaddle.github.io/PaddleOCR/) and found no similar bug report.
- [X] I have searched the PaddleOCR [Issues](https://…
-
**Describe the bug**
Hello,
I am using the method from KubeFate [docker-compose-release](https://github.com/FederatedAI/KubeFATE/releases/download/v2.0.0/kubefate-docker-compose-v2.0.0.tar.gz) to deploy, on three machines, Fate training and se…
-
### Discussed in https://github.com/orgs/mfem/discussions/4488
Originally posted by **CINTROINI** September 5, 2024
Dear MFEM community,
We are developing a new code based on MFEM to simu…
-
I'm running nccl-test `all-reduce` between two nodes, and I've found that the tree algorithm performs much better than the ring algorithm. However, through reading the NCCL source code, I noticed tha…
-
Hi
I hope you’re doing well! I’m currently working with the MAPPO implementation in your repository and have a question regarding GPU configuration for optimal training performance.
Current Setu…
-
Add a GPU-based Real-ESRGAN algorithm with a pretrained model