-
Hi, I'm trying to run gpt-neox on LUMI HPC.
But I'm sadly getting errors that look like this:
```
GPU core dump failed
GPU core dump failed
Memory access fault by GPU node-9 (Agent handle: 0x7d5f9…
-
Merely *importing* StatefulDataLoader from the nightly [`torchdata`](https://github.com/pytorch/data/tree/main) package (i.e. putting the line `from torchdata.stateful_dataloader import StatefulDataLo…
-
### 💡 Your Question
```
from super_gradients.training import Trainer
from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.traini…
-
https://github.com/vanderschaarlab/synthcity/blob/7c1d9b5d3397334ce8d79ffd48ac31449d72b730/setup.cfg#L36
Hello synthcity Team,
Are there any plans to update the library file from ">=1.10.0,
-
### 🐛 Describe the bug
### Description
In `torchdata.stateful_dataloader.sampler.py`, several Sampler classes in `torch.utils.data` are overwritten:
1. https://github.com/pytorch/data/blob/main/t…
-
### 🐛 Describe the bug
When training a new model ([PHI 1.5](https://huggingface.co/microsoft/phi-1_5)) with Transformers via accelerate/axolotl, I get the following error:
`No backend type associated…
-
### 🐛 Describe the bug
drq performance failure in 2024-09-30 nightly release
```
loading model: 0it [00:00, ?it/s]cpu eval drq
ERROR:common:Backend dynamo faile…
-
I am trying to use YALTAi with Colab. The code is here:
https://github.com/gabays/CHR_2023/blob/main/CHR_digital_diplomacy.ipynb
When doing `pip install YALTAi` I get the following error:
```
ER…
-
## Proposed refactor
Getting this:
```
UserWarning: You're resuming from a checkpoint that ended mid-epoch. Training will start from the beginning of the next epoch. This can cause unreliable res…
-
I wanted to train this model on a single GPU (Google Colab), but I got this error. I set `--gpu_num 1` and also tried changing the number of workers to 0 and to 1, but it didn't wor…