-
Objective: train and evaluate a model on the RAGTruth dataset
Settings:
OS: Ubuntu WSL
Python: 3.12.4
NVIDIA Driver Version: 536.23
CUDA Version: 12.2
Replication steps:
1. Git clone
2. Run…
-
OSError: Unable to load weights from pytorch checkpoint file for 'llama-7b-hf\pytorch_model-00002-of-00002.bin' at 'llama-7b-hf\pytorch_model-00002-of-00002.bin'. If you tried to load a PyTorch model …
-
Hi!
There appears to be an inconsistency in the behavior of the optimizer before and after wrapping with Fully Sharded Data Parallel (FSDP).
When FSDP wraps the optimizer, it seems to modify the s…
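One way to see the kind of inconsistency described above: an optimizer holds references to the exact parameter tensors it was constructed with, so if a wrapper swaps a module's parameters for new (e.g. flattened) tensors, an optimizer created before wrapping silently stops updating the live parameters. The following is a minimal illustration using plain parameter replacement, not FSDP itself (which additionally shards the flattened parameters):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 4, bias=False)

# Optimizer built BEFORE the parameter is replaced: it captures a
# reference to the original weight tensor.
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Simulate what a wrapper like FSDP does: swap the parameter for a new
# tensor (FSDP replaces originals with flattened shards).
old_weight = model.weight
model.weight = nn.Parameter(old_weight.detach().clone())

before = model.weight.detach().clone()
model(torch.randn(2, 4)).sum().backward()
opt.step()

# The live parameter received a gradient but was never updated: the
# optimizer only knows about the orphaned old tensor.
assert model.weight.grad is not None
assert torch.equal(model.weight.detach(), before)
```

This is why the usual guidance is to construct the optimizer only after wrapping the model.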
-
I would like to train a model using two or more machines. After setting up the default configuration file using accelerate config, it seems that when I call train_db.py, it is not actually using the c…
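One common cause of a saved config being ignored: the file written by `accelerate config` is only consulted when the script is started through `accelerate launch`; invoking the script with plain `python` bypasses it. A sketch of an explicit launch on each node (the config path shown is accelerate's default location and may differ on your machine):

```shell
# Run on every machine; the per-node rank and main-process address live
# in the config written by `accelerate config` (default path shown).
accelerate launch \
  --config_file ~/.cache/huggingface/accelerate/default_config.yaml \
  train_db.py
```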
-
Personally, I have found that monitoring the gradient norm is useful for understanding training stability. It also helps with setting an appropriate clipping value (though I don't think torchtune supports grad no…
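A sketch of the monitoring-plus-clipping pattern in plain PyTorch (the toy model and values here are illustrative): `clip_grad_norm_` returns the total gradient norm measured *before* clipping, so a single call serves both purposes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Returns the global grad norm *before* clipping: log it to monitor
# stability, while the call also rescales gradients in place.
max_norm = 1.0
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm)
print(f"grad norm before clipping: {float(total_norm):.4f}")

# After the call, the global grad norm is at most max_norm.
post = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
assert float(post) <= max_norm + 1e-6
```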
-
### Describe the bug
When using the DeepSpeed backend, training runs fine but gets stuck in `accelerator.save_state(save_path)`. When using MULTI_GPU, the process completes OK.
The training script is
```
accele…
-
Running DPO with Qwen, I hit a flattening problem. The FSDP config is as follows:
```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
fsdp_auto_w…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
I'm using the latest llamafactory version.
### Reproduction
Hi, I'm trying to use qlora+fsdp …
-
Running script:
```sh
export PYTHONPATH=.
accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
./pipeline/train/instruction_following.py \
--pretrained_mode…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…