-
Hi! When I try to run a Python [script](https://github.com/pytorch/PiPPy/blob/main/examples/llama/pippy_llama.py) for LLM inference with pipeline parallelism on a single server with multiple GPUs, it turned…
-
**Describe the bug**
I tried multiple workflows and ran into different issues when running NVTabular workflows on large datasets on a multi-GPU setup.
Error 1: Workers just die one after another
…
-
Hi
I am using `MultiWorkerMirroredStrategy` and `tf.estimator.train_and_evaluate` for distributed training with 3 epochs.
Please find below the information:
```
GPU: 4 x NVIDIA Tesla V100
Datasets:…
-
# ARTIQ Feature Request: Applets run with the master and are universal to different dashboards
## Problem: Currently, applets are saved separately for different dashboards
We are using the headles…
-
Edit: @robvanvolt is right
WebDataset is perfect for us - any dataset already in the format expected by the `TextImageDataset` we have now can easily be converted to a WebDataset by placing them in ~…
-
S = 10 finishes within 1 minute, but with S = 100 it doesn't seem to finish (it has been running for over an hour) in `result_tn_100
-
Hi
I need to make iterative datasets work with distributed training. For this I shard the data, which does not work; see my issue here https://github.com/pytorch/xla/issues/2657 to the PyTorch XLA team bu…
-
Hello, I don't know what I'm doing wrong. I received the following error as indicated in the title.
My input was as shown on this website:
[Hugging Face - Ger-RAG-eval](https://huggingface.co/da…
-
(glp) ning@ubuntu:~/GLPDepth/datasets$ python ../code/utils/extract_official_train_test_set_from_mat.py nyu_depth_v2_labeled.mat splits.mat ./nyu_depth_v2/official_splits/
Traceback (most recent call…
-
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
in
----> 1 from sentence_transformers impor…