-
```
"WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.0+cu118 with CUDA 1106 (you have 2.1.0+cu121)
Python 3.9.16 (you have 3.10.12)
Please reinstall…
-
### 🐛 Describe the bug
```
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc…
-
### System Info
```shell
HL-SMI Version: hl-1.17.0-fw-51.3.0
Driver Version: 1.17.0-28a11ca
Docker image: vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.…
-
### 🐛 Describe the bug
I’ve noticed large “spikes” in memory usage at the start of epochs when using IterDataPipes with attributes that take a lot of memory. These can cause my training jobs to fail …
-
### 🐛 Describe the bug
File "/home/a/anaconda3/envs/shiyan/lib/python3.9/site-packages/dgl/graphbolt/base.py", line 8, in <module>
from torchdata.datapipes.iter import IterDataPipe
File "/home/a/ana…
-
Hi there,
I am trying to run this script on Google Colab. I encountered an import error, "ModuleNotFoundError: No module named 'torchtext.legacy'", when the code attempted to import the Ba…
-
### 🐛 Describe the bug
new_perf_regression in 2024-02-11
| suite | name | batch_size_new | speed_up_new | inductor_new | eager_new | compilation_latency_new | batch_size_old | speed_up_old | induc…
-
### 🐛 Describe the bug
Here is the code triggering this issue:
```
input = torch.tensor([-9223372036854775808], dtype=torch.int64)
other = torch.tensor(215, dtype=torch.uint8)
out = torch.lcm(i…
-
I've recently been trying to debug and resolve a number of distributed training shuffle issues, and I've found some alarming problems...
1. There is no way to have a reliable epoch count based determinis…
-
Hi,
My environment is as follows:
docker image : docker run --gpus all -it --net=host --ipc=host --ulimit memlock=-1 -v /home/ubuntu/test:/home/finetune -v /ssd/gyou:/models --name=vicuna nvcr.io/nvidia/pytor…