-
### 🐛 Describe the bug
To reproduce, install fairscale and PyTorch from source, then run this test:
```
python -m pytest tests/nn/data_parallel/test_fsdp_with_checkpoint_wrapper.py::test_train_and_eval_w…
-
**Scenario**: Fine-tuning BGE-M3. The .jsonl data file contains 158,000 records; each record has one query, a pos list of length 1, and a neg list of length 15.
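For anyone trying to reproduce this, a quick sanity check of the data shape can rule out malformed records. The following is only a minimal sketch: it assumes each line is a JSON object with `query`, `pos`, and `neg` keys, and `summarize_jsonl` is a hypothetical helper, not part of FlagEmbedding.

```python
import json
from collections import Counter


def summarize_jsonl(lines):
    """Count how many records have each (len(pos), len(neg)) shape.

    Assumes every non-empty line is a JSON object with "pos" and "neg" lists
    (field names are an assumption based on the description above).
    """
    shapes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        shapes[(len(record["pos"]), len(record["neg"]))] += 1
    return shapes


# Tiny in-memory sample shaped like the records described above.
sample = [
    json.dumps({"query": "q", "pos": ["p"], "neg": [f"n{i}" for i in range(15)]})
]
print(summarize_jsonl(sample))
```

On a real file, passing the open file handle (for example `summarize_jsonl(open(path, encoding="utf-8"))`, with `path` being your own data file) should report a single `(1, 15)` shape if every record matches the description.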
**Error output**:
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS envi…
-
## 🐛 Bug
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
root@t1v-n-108b165f-w-0:/workspace# /usr/local…
-
After preparing the training environment, I tried to fine-tune the model by following step 2 and step 3 in
https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives
step2 is d…
-
### System Info
```Shell
accelerate==0.34.0
```
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] One of the scripts in the examples/ folder of Acc…
-
```py
import torch
from torch import Tensor # E: invalid syntax [syntax]
@torch.library.custom_op("mylib::foo", mutates_args={"x"})
def foo(x: Tensor) -> None:
    x.sin_()
@torch.compile(f…
-
## 🚀 Feature
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html
https://aws.amazon.com/machine-learning/neuron/
### Motivation
https://aws.amazon.com/about-aws/whats-new/2022…
-
**Describe the bug**
I am trying to pretrain an [OLMo](https://github.com/allenai/OLMo) 1B model on 8 MI250 GPUs with the Docker image rocm/pytorch:latest (ROCm 6.1). I'm using a small subset of Dolma …
-
## Environment
- OS: [Ubuntu 20.04]
- Hardware (GPU, or instance type): [H100x16]
## To reproduce
Steps to reproduce the behavior:
1. [Use this dataset class](https://github.com/mos…
-
### System Info
databricks
### Who can help?
@ArthurZucker @younesbelkada
Hi team,
I get an error message when using TorchDistributor.
I have checked in the class BertEmbeddings (u…