-
## Description
Currently we have been unable to reproduce the schedule-free AdamW results with JAX.
There seem to be differences between the optax implementation of schedule-free AdamW and the pyto…
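In case it helps with a side-by-side check, here is a minimal sketch of how the two implementations could be set up with matching hyperparameters. The `optax.contrib.schedule_free_adamw` entry point and its argument names are an assumption on my part (they may differ by optax version); the PyTorch side assumes the `schedulefree` package from facebookresearch/schedule_free.

```python
# Hedged sketch: schedule-free AdamW in optax vs. the PyTorch `schedulefree`
# reference, with matching hyperparameters. The optax.contrib entry point and
# its argument names are an assumption; check the installed optax version.
import jax.numpy as jnp
import optax

# JAX/optax side (assumed API)
tx = optax.contrib.schedule_free_adamw(
    learning_rate=1e-3,
    b1=0.9,
    b2=0.999,
    weight_decay=0.01,
)

params = {"w": jnp.ones((4,))}
opt_state = tx.init(params)
grads = {"w": jnp.full((4,), 0.1)}
updates, opt_state = tx.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
# Evaluation parameters would come from
# optax.contrib.schedule_free_eval_params(opt_state, params)  (assumed helper name).

# PyTorch reference side (facebookresearch/schedule_free), for comparison:
# import schedulefree
# opt = schedulefree.AdamWScheduleFree(model.parameters(), lr=1e-3,
#                                      betas=(0.9, 0.999), weight_decay=0.01)
# opt.train()  # schedule-free optimizers must be switched between
# opt.eval()   # train/eval modes around training and evaluation
```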
-
This is not a bug report, just a report.
It is not an unsloth error; unsloth was not the cause. The error in the title occurred at the start of training, and accelerate seemed to be affecting it.
https://github.com/hu…
-
When I pass it to the trl library, I get this error:
is not a valid OptimizerNames, please select one of ['adamw_hf', 'adamw_torch', 'adamw_torch_fused', 'adamw_torch_xla', 'adamw_torch_npu_fused', 'adam…
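For what it's worth, the accepted names map to the `optim` field of the training arguments; a minimal sketch with trl's `SFTConfig` (assuming a recent trl version where `SFTConfig` inherits from `TrainingArguments`):

```python
# Hedged sketch: the optimizer is selected by name via the `optim` field of
# the HF training arguments, which trl configs inherit.
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="out",
    optim="adamw_torch",   # must be one of the names listed in the error
    learning_rate=2e-5,
)
# trainer = SFTTrainer(model=model, args=config, train_dataset=dataset)
```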
-
## ❓ Questions and Help
It is my understanding that Adam should use more memory than SGD because it keeps track of more per-parameter state. However, when I look at my profiles between Adam and SGD optim…
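One way to check this directly is to sum the bytes held in the optimizer's state; a small sketch (the model here is a hypothetical stand-in) showing that Adam allocates `exp_avg`/`exp_avg_sq` buffers on the first step while plain SGD allocates none:

```python
# Hedged sketch: measure the memory held in optimizer state buffers.
# Adam stores exp_avg and exp_avg_sq per parameter (~2x the parameter memory
# on top of the weights); plain SGD without momentum stores no state.
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)  # hypothetical stand-in model

def optimizer_state_bytes(optimizer):
    total = 0
    for state in optimizer.state.values():
        for value in state.values():
            if torch.is_tensor(value):
                total += value.numel() * value.element_size()
    return total

for opt_cls in (torch.optim.Adam, torch.optim.SGD):
    opt = opt_cls(model.parameters(), lr=1e-3)
    loss = model(torch.randn(8, 1024)).sum()
    loss.backward()
    opt.step()  # state buffers are allocated lazily, on the first step
    print(opt_cls.__name__, optimizer_state_bytes(opt), "bytes of state")
    opt.zero_grad()
```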
-
Hi, thank you for developing and maintaining this awesome library and ecosystem!
I'm not entirely sure, but could it be that the documentation for the `AdamW` optimizer is a bit misleading? If I und…
-
### Describe the bug
I'm trying to implement the recipe https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transducer but the WER and train loss are very high. After runn…
-
I tried the Prodigy optimizer, and it is exactly as you wrote: really slow convergence. I trained the model for 120 epochs and could easily have trained another 60. Now I want to try Ad…
-
### Feature request
Hi, thanks for the library! It would be great if the optimizers could be run on the CPU. For example, I would like to try adamw_8bit to full-finetune an 8B model on a 24GB GPU card (RTX40…
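In case it is a useful interim workaround (an assumption on my part, not a tested recipe for the 8B case), bitsandbytes already exposes a paged 8-bit AdamW whose optimizer state can be evicted to CPU memory via CUDA unified memory:

```python
# Hedged sketch: paged 8-bit AdamW keeps optimizer state in pages that can be
# evicted to CPU memory when GPU memory runs low, which is related to (but not
# the same as) running the optimizer itself on the CPU.
import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(4096, 4096).cuda()  # stand-in for the real 8B model
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=1e-5)

# The same optimizer can be selected through the HF Trainer with
# TrainingArguments(optim="paged_adamw_8bit", ...).
```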
-
Docs: https://pytorch.org/docs/2.4/distributed.optim.html#torch.distributed.optim.ZeroRedundancyOptimizer
```diff
- optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr)
+ optimizer…
```
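For completeness, the wrapped form described in the linked docs looks roughly like the sketch below; the single-process `gloo` setup is only there so the example runs standalone, real usage would be under `torchrun`.

```python
# Hedged sketch of wrapping AdamW in ZeroRedundancyOptimizer: the base
# optimizer class is passed in, and its state is sharded across the ranks
# of the process group.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.optim import ZeroRedundancyOptimizer

# Single-process setup just so the sketch runs; real usage is multi-rank.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(16, 16)  # stand-in for the real model
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.AdamW,  # base optimizer whose state is sharded
    lr=1e-3,
)

dist.destroy_process_group()
```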
-
**Describe the bug**
The AdamW implementation (see [here](https://github.com/NVIDIA/apex/blob/a7de60e57f0534266841e1733262601ad76aaa74/csrc/multi_tensor_adam.cu#L333)) does not truly decouple the weight…
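For context (the report is cut off above), "decoupled" refers to the AdamW formulation of Loshchilov & Hutter, where the decay is applied to the weights directly instead of being folded into the gradient that feeds the Adam moments. A toy scalar sketch of the distinction (not the apex kernel itself):

```python
# Hedged toy sketch of coupled (Adam + L2) vs. decoupled (AdamW) weight decay.
import math

def adamw_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=0.01, decoupled=True):
    """Single scalar Adam/AdamW step illustrating where the decay enters."""
    if not decoupled:
        grad = grad + wd * p          # coupled: decay is folded into the gradient
                                      # and thus rescaled by 1/sqrt(v_hat)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        p = p - lr * wd * p           # decoupled: decay acts on the weights
                                      # directly, bypassing the adaptive denominator
    return p, m, v

p, m, v = 1.0, 0.0, 0.0
p, m, v = adamw_step(p, grad=0.5, m=m, v=v, t=1)
print(p)
```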