-
### Describe the bug
I am trying to use the sidecar to bundle binaries into the app, but I get this error every time:
```bash
thread 'tokio-runtime-worker' panicked at 'Failed to spawn command: Io(O…
```
-
### Issue Content:
**Description:**
It usually happens after LoRA training has been running for some time.
I encountered a `subprocess.CalledProcessError` when running the `train_network.py` script using the …
-
I'm trying to run the Advanced Networking Benchmark in a DOCA container, using the DPDK config for GPUDirect. After setting the batch size to 1, the following errors appeared:
`[critical] [adv_network_dpdk_mgr.cpp:151…
-
[[Open issues - help wanted!]](https://github.com/vllm-project/vllm/issues/4194#issuecomment-2102487467)
**Update [9/8] - We have finished the majority of the refactoring and made extensive progress fo…
-
**What would you like to be added**:
Currently, the `karmada-scheduler` only supports adding custom plugins in the first two scheduling stages (`FilterClusters`, `ScoreCluster`). I hope that custom p…
-
### Apache Airflow version
Other Airflow 2 version
### What happened
When I run multiple schedulers (>1), the statsd exporter does not correctly sum the running tasks from the different sched…
-
When training with batch size 4 on one H100, the speed is 1.27 seconds/it.
When training with batch size 4 on 2x H100, the speed is 2.05 seconds/it.
So basically we got almost no speed boost from multiple GPU t…
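
To compare the two timings above, it can help to convert seconds per iteration into samples per second. The sketch below is only illustrative: the issue text does not say whether batch size 4 is per GPU or global, so both readings are computed, and the helper name `samples_per_second` is hypothetical.

```python
def samples_per_second(batch_size, seconds_per_iter, num_gpus=1, batch_is_per_gpu=True):
    """Convert an iteration time into throughput in samples per second.

    batch_is_per_gpu is an assumption the caller has to make: the reported
    numbers do not say whether the batch size is per GPU or global.
    """
    if seconds_per_iter <= 0:
        raise ValueError("seconds_per_iter must be positive")
    effective_batch = batch_size * num_gpus if batch_is_per_gpu else batch_size
    return effective_batch / seconds_per_iter


# Timings quoted in the report.
single = samples_per_second(4, 1.27)
dual_per_gpu = samples_per_second(4, 2.05, num_gpus=2)
dual_global = samples_per_second(4, 2.05, num_gpus=2, batch_is_per_gpu=False)

print(f"1x H100: {single:.2f} samples/s")
print(f"2x H100 (batch per GPU): {dual_per_gpu:.2f} samples/s, speedup {dual_per_gpu / single:.2f}x")
print(f"2x H100 (global batch):  {dual_global:.2f} samples/s, speedup {dual_global / single:.2f}x")
```

Under the per-GPU reading the second GPU adds roughly a quarter more throughput; under the global reading the two-GPU run is slower than the single-GPU run.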
-
### Describe the bug
With a video splashscreen in a release bundle, the application crashes with a segmentation fault when the splashscreen window is closed (not every time).
That problem does not…
-
[README.md](https://github.com/ros-realtime/reference-system) suggests using `isolcpus=2,3` and gives an example of how to run a single-threaded executor on CPU2 by `taskset -c 2`.
But there is a …
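
For reference, the `taskset -c 2` pinning mentioned above has an in-process counterpart on Linux. The following is a minimal sketch using Python's `os.sched_setaffinity`; the helper name `pin_to_cpu` is hypothetical and is not part of the reference system.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict a process (0 = the caller) to a single CPU.

    Returns False when the platform has no affinity API or when the CPU is
    not in the process's current affinity set; this sketch does not try to
    widen the set, which is a simplifying assumption.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # affinity calls are not available on this platform
    if cpu not in os.sched_getaffinity(pid):
        return False  # CPU not currently allowed for this process
    os.sched_setaffinity(pid, {cpu})
    return True


if __name__ == "__main__":
    # CPU 2 matches the README example; it must exist and be allowed.
    print("pinned to CPU 2:", pin_to_cpu(2))
```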
-
Hi,
I am trying to train models with ps-lite. It works well in multi_thread mode, as in test_kv_app_multi_workers, but in multi_process mode only one worker process works and the others are blocked i…