-
Hi developers, I found that the variable `device_dir` is hard-coded to `/dev/dri/by-path/` (see code [here](https://github.com/oneapi-src/oneCCL/blob/master/src/common/global/ze/ze_fd_manager.cpp#L151)), …
-
I started playing with the allreduce example from the main repository (https://github.com/oneapi-src/oneCCL/blob/master/examples/cpu/cpu_allreduce_test.cpp).
I modified it slightly by increasing the buf…
-
Define processes for decision-making within the oneTBB project. This should include an RFC process for new design and feature proposals
(see oneDNN for example - PR tagged RFC that links to the proposa…
-
IPEX restricts the transformers version, but llm-on-ray does not. To verify IPEX and other llm-on-ray functions in parallel in CI, we can add a new ipex extra in pyproject.toml with the right tra…
-
Please see the example
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/examples.html#example-1-single-process-single-thread-multiple-devices
Thanks.
-
Please see the example https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/examples.html#example-1-single-process-single-thread-multiple-devices
-
**Describe the bug**
I have two Ubuntu machines connected by a 10 Gb/s Ethernet cable, and I want to use DeepSpeed on these two machines to
run model training with pipeline parallelism, and …
-
Currently, we can initialize multiple XGBoost Rabit instances from the same process but from different threads. In Spark, it's possible to have multiple tasks run on the same executor. An executor is a single JVM p…
-
I tried to create the serving on my system, but it failed with the error below:
(emon_analyzer) [root@SPR-1 emon_data_analyzer]# neuralchat_server start --config_file ./config/neuralchat.yaml
2024-03-1…
-
### Describe the bug
Communication and computation do not appear to overlap when launching kernels in different `xpu.Stream`s (on Intel GPU Max 1550s). Being able to overlap communication and commun…