-
When I fine-tune Llama-2-7B:
```
# alpaca
torchrun --nproc_per_node=8 --master_port=29000 train.py \
--model_name_or_path .cache/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
latest llamafactory version
### Reproduction
I'm using the latest LLaMA-Factory version to run SFT (QLoRA…
-
### 🐛 Describe the bug
with TP
![Screenshot 2024-10-08 at 6 41 21 AM](https://github.com/user-attachments/assets/31421762-5742-4184-a52f-36a5de388eaf)
without TP
![Screenshot 2024-10-08 at 4 19 …
-
### What piece of documentation is affected?
https://github.com/OpenAccess-AI-Collective/axolotl/tree/main/examples/mistral
### What part(s) of the article would you like to see updated?
There's FS…
-
### 🚀 The feature, motivation and pitch
I created a tensor of shape (2, N) in a module and wrapped it with FSDP on 8 GPUs. The resulting local shard shapes are:
GPU 0, 1 -> (1, N)
GPU 2 to 7 -> (0, N)
This is a b…
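These shapes are what per-parameter dim-0 sharding would produce when a leading dimension of 2 is split across 8 ranks. A minimal repro sketch is below; the `fully_shard` import path, the toy module, and N are assumptions, not taken from the report:

```python
# Minimal repro sketch (assumptions: PyTorch >= 2.6 so fully_shard is importable
# from torch.distributed.fsdp; the module and N are made up; launch with
# `torchrun --nproc_per_node=8`).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard


class TwoRowModule(torch.nn.Module):
    """Hypothetical module holding a single (2, N) parameter."""

    def __init__(self, n: int = 16):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(2, n))

    def forward(self, x):
        return x @ self.weight.t()


def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    module = TwoRowModule().cuda()
    fully_shard(module)  # per-parameter sharding along dim 0 across all ranks

    # The parameter is now a DTensor; with 8 ranks and a leading dim of 2,
    # ranks 0-1 hold a (1, N) local shard and ranks 2-7 hold an empty (0, N) shard.
    local = module.weight.to_local()
    print(f"rank {rank}: local shard shape = {tuple(local.shape)}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```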
-
I was wondering if PyTorch's FullyShardedDataParallel (FSDP) is supported by TransformerEngine, especially if FP8 can work with FSDP. Thank you in advance.
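For reference, a rough sketch of what combining the two would look like, wrapping TE modules in FSDP and running the forward under `te.fp8_autocast`. The toy model, layer sizes, and recipe values are assumptions, and this does not claim the combination is officially supported:

```python
# Rough sketch (assumptions: transformer_engine installed, FP8-capable GPUs,
# toy layer sizes and recipe values; launch with torchrun so NCCL init works).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

# TE handles the FP8 compute inside its modules; FSDP shards the
# higher-precision master weights as usual.
model = torch.nn.Sequential(
    te.Linear(1024, 4096),
    te.Linear(4096, 1024),
).cuda()
model = FSDP(model, use_orig_params=True)

recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)
x = torch.randn(16, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = model(x)
out.sum().backward()
```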
-
### Describe the bug
I'm using the train_dreambooth_flux.py script to fine-tune Flux. I get OOM on 4x A100 80GB with DeepSpeed stage 2, gradient checkpointing, bf16 mixed precision, 1024px × 1024px input, adafac…
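A rough sketch of that kind of setup through Accelerate's DeepSpeed integration is below; the plugin arguments, stand-in model, and batch size are illustrative assumptions rather than the actual train_dreambooth_flux.py configuration:

```python
# Rough sketch of a ZeRO-2 + bf16 setup via Accelerate's DeepSpeed integration
# (assumptions: deepspeed installed, launched with `accelerate launch` on the GPUs;
# the Linear model is only a stand-in for the Flux transformer).
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(mixed_precision="bf16", deepspeed_plugin=ds_plugin)

model = torch.nn.Linear(4096, 4096)              # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 4096))
loader = torch.utils.data.DataLoader(dataset, batch_size=1)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
# On an HF model one would also call model.gradient_checkpointing_enable() here.

batch = next(iter(loader))[0].to(torch.bfloat16)  # DeepSpeed bf16 keeps params in bf16
accelerator.backward(model(batch).sum())
optimizer.step()
optimizer.zero_grad()
```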
-
### System Info
```Shell
Custom SDXL training script using FSDP SHARD_GRAD_OP with CPU offload.
After upgrading accelerate from 0.33.0 to 0.34.0, when collecting the state_dict with accelerator.get_state_…
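A minimal sketch of such a setup, assuming the truncated call above is `Accelerator.get_state_dict` and using placeholder model sizes:

```python
# Hedged sketch of an FSDP SHARD_GRAD_OP + CPU-offload setup in Accelerate
# (assumptions: launched with `accelerate launch` or torchrun; the Linear model
# is a placeholder for the SDXL UNet).
import torch
from accelerate import Accelerator
from accelerate.utils import FullyShardedDataParallelPlugin
from torch.distributed.fsdp import CPUOffload, ShardingStrategy

fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,  # shard gradients and optimizer state
    cpu_offload=CPUOffload(offload_params=True),
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(1024, 1024)  # placeholder model
model = accelerator.prepare(model)

# Gather a full (unsharded) state dict for saving on the main process.
state_dict = accelerator.get_state_dict(model)
if accelerator.is_main_process:
    torch.save(state_dict, "checkpoint.pt")
```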
-
## ❓ Questions and Help
I have noticed during testing that enabling FSDP's flatten_parameters=True results in a significant increase in GPU peak memory. In fact, the memory usage is several times la…
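A small sketch for measuring the difference, assuming FairScale's FSDP (where the keyword is `flatten_parameters`) and illustrative layer sizes:

```python
# Hedged sketch for comparing peak memory with and without parameter flattening
# (assumptions: FairScale installed, single-node NCCL launch via torchrun,
# toy model sizes chosen only for illustration).
import torch
import torch.distributed as dist
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP


def peak_memory_mb(flatten: bool) -> float:
    torch.cuda.reset_peak_memory_stats()
    model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(8)]).cuda()
    model = FSDP(model, flatten_parameters=flatten)
    x = torch.randn(16, 4096, device="cuda")
    model(x).sum().backward()
    return torch.cuda.max_memory_allocated() / 2**20


if __name__ == "__main__":
    dist.init_process_group("nccl")  # launch with torchrun --nproc_per_node=<ngpus>
    torch.cuda.set_device(dist.get_rank())
    for flatten in (False, True):
        print(f"flatten_parameters={flatten}: peak {peak_memory_mb(flatten):.0f} MiB")
```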
-
### System Info
- `transformers` version: 4.41.1
- Platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.23.1
- Safetensors version: 0.…