fsdp Search Results - Githubissues

1000+ results
for fsdp

Best match


pytorch/pytorch #73885

operation not supported crash when initializing RPC tensorpi…

### 🐛 Describe the bug To reproduce, install fairscale + pt from source and run this test: ``` python -m pytest tests/nn/data_parallel/test_fsdp_with_checkpoint_wrapper.py::test_train_and_eval_w…

rohan-varma updated 2 years ago
1
FlagOpen/FlagEmbedding #955

BGE-M3 fine-tuning raises: pyarrow.lib.ArrowInvalid: offset overflow whil…

**Scenario**: Fine-tuning BGE-M3; the .jsonl data file contains 158000 records, each with one query, a pos list of length 1, and a neg list of length 15. **Error**: WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS envi…

MarcusEddie updated 3 months ago
1
pytorch/xla #6620

SIGSEGV: Segmentation Fault Memory error while checkpointing…

## 🐛 Bug concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending. root@t1v-n-108b165f-w-0:/workspace# /usr/local…

shub-kris updated 4 months ago
11
FlagOpen/FlagEmbedding #139

how to fine tune the bge model by using single GPU?? how to …

After preparing the training env, I tried to finetune the model following step2 and step3 in https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune#hard-negatives step2 is d…

Yazooliu updated 1 year ago
11
huggingface/accelerate #3072

Setting FSDP FULL_STATE_DICT explicitly doesn't work

### System Info ```Shell accelerate==0.34.0 ``` ### Information - [ ] The official example scripts - [X] My own modified scripts ### Tasks - [ ] One of the scripts in the examples/ folder of Acc…

thepowerfuldeez updated 1 month ago
3
pytorch/pytorch #131192

custom ops don't reinplace when mutated arg is a view of a g…

```py import torch from torch import Tensor # E: invalid syntax [syntax] @torch.library.custom_op("mylib::foo", mutates_args={"x"}) def foo(x: Tensor) -> None: x.sin_() @torch.compile(f…

zou3519 updated 1 month ago
9
Lightning-AI/pytorch-lightning #15172

AWS Neuron support

## 🚀 Feature https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html https://aws.amazon.com/machine-learning/neuron/ ### Motivation https://aws.amazon.com/about-aws/whats-new/2022…

carmocca updated 9 months ago
10
microsoft/DeepSpeed #5898

[BUG] Gradient accumulation causing training loss difference…

**Describe the bug** I am trying to pretrain an [Olmo](https://github.com/allenai/OLMo) 1B model on 8 MI 250 GPUs with Docker image: rocm/pytorch:latest (ROCm 6.1). I'm using a small subset of Dolma …

gramesh-amd updated 2 weeks ago
3
mosaicml/streaming #758

Memory leak using download_file with DDP or FSDP

## Environment - OS: [Ubuntu 20.04] - Hardware (GPU, or instance type): [H100x16] ## To reproduce Steps to reproduce the behavior: 1. [Use this dataset class](https://github.com/mos…

nagadit updated 1 month ago
9
huggingface/transformers #25130

an inplace operation preventing TorchDistributor training

### System Info databricks ### Who can help? @ArthurZucker @younesbelkada Hi team, I got an error message by using TorchDistributor. I have checked in the class BertEmbeddings (u…

liqi6811 updated 9 months ago
6

