-
Hello team, I'm new to PyCylon and have an issue related to distributed sorting.
It seems that an empty dataframe in one process triggers an exception when I perform sort_values in a distributed way.…
-
Hi,
I was trying to run llama2 on my local computer (Windows 10, 64 GB RAM, GPU 0: Intel(R) Iris(R) Xe Graphics) and got the following error:
1. raise RuntimeError("Distributed package doesn't have N…
-
[//]: # "SPDX-FileCopyrightText: Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved."
[//]: # "SPDX-License-Identifier: Apache-2.0"
[//]: # ""
[//]: # "Licensed under the …
-
**Version of orchestrion**
[v0.9.3](https://github.com/DataDog/orchestrion/releases/tag/v0.9.3)
**Describe what happened:**
I'd like to share some observations following the recent PoC, where we a…
-
Thank you for your excellent work! I have some trouble with training:
I tried to install slurm for cluster job scheduling, but unfortunately many attempts failed. So, what we want to know is if ther…
-
The command I use to train Deformable DETR on one node with 8 GPUs is the following:
```bash
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh
```
It works.…
-
Thanks for your great work. I'd like to reproduce the training process, but I encountered an error: when I use the multi-GPU distributed training process, the logging information seems normal, but…
-
### 📚 The doc issue
[Here](https://github.com/pytorch/pytorch/blob/ce503c1b40207dab770c28cbd4568cd9e105277b/torch/distributed/distributed_c10d.py#L2067) the docstring says the function will return…
botbw updated
6 months ago
-
Hello, when I set sampler_per_gpu to 2 in /projects/configs/surroundocc/surroundocc.py, I get the following error:
RuntimeError: stack expects each tensor to be equal size, but got [62812, 4] at…
-
## Description
Having `phpunit/phpunit` as a Composer dev dependency is very obnoxious, because:
- its deps may conflict with ours
- everything gets dumped into the same vendor folder inseparab…