-
Refer to https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.html
[rank0]: File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", …
-
I downloaded all the datasets according to the instructions, but when I run `sh scripts/train_scanrefer_mcln_sp.sh`, I encountered this error. It seems like the code you provided has some problems.
```…
-
Apparently there is no reason to use paged Adam instead of the 8-bit version. We should replace it.
Also, full fine-tuning on a single device should use paged Adam instead of AdamW, for better memory usage (a comparison sketch follows below).
F…
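For reference, a minimal sketch of the optimizer variants being compared, assuming the bitsandbytes implementations (`PagedAdamW`, `AdamW8bit`, `PagedAdamW8bit`); the actual recipe config may wire these up differently:
```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096, device="cuda")

# Paged 32-bit AdamW: optimizer state can spill to CPU under memory pressure,
# but each state tensor is still stored in 32-bit precision.
paged_adamw = bnb.optim.PagedAdamW(model.parameters(), lr=2e-5)

# 8-bit AdamW: quantizes the optimizer state, cutting its memory footprint
# roughly 4x relative to the 32-bit state above.
adamw_8bit = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5)

# Paged 8-bit AdamW combines both: quantized state that can also be paged out.
paged_adamw_8bit = bnb.optim.PagedAdamW8bit(model.parameters(), lr=2e-5)
```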
-
### 🐛 Describe the bug
I'd like to compile my optimizer but am hitting recompilation issues. I wrap my LR in a tensor, but it seems like beta1/beta2 may need similar treatment (based on type annota…
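For context, a minimal sketch of the tensor-LR workaround, assuming a recent PyTorch where `torch.optim.AdamW` accepts a tensor learning rate; the betas handling hinted at above is only noted in a comment:
```python
import torch

model = torch.nn.Linear(128, 128, device="cuda")

# Wrapping the LR in a 0-dim tensor lets it change without guarding on a new
# Python float, which would otherwise trigger a recompilation.
lr = torch.tensor(1e-3, device="cuda")
opt = torch.optim.AdamW(model.parameters(), lr=lr, capturable=True)

@torch.compile(fullgraph=False)
def opt_step():
    opt.step()

loss = model(torch.randn(4, 128, device="cuda")).sum()
loss.backward()
opt_step()

# Changing the LR in place keeps the compiled step valid.
lr.fill_(5e-4)
# beta1/beta2 are plain floats here; per the type annotations they may need
# the same tensor treatment to avoid recompiles when they change.
```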
-
Using Liger kernels and NEFTune, the system consumes 3 gigabytes of RAM with AdamW; meanwhile, with GrokAdamW, the system uses up the entire 12 gigabytes of RAM in a Google Colab environment and crashe…
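For context, a hedged sketch of how such a comparison might be configured with Hugging Face `TrainingArguments`; the exact script and values from the original report are not shown, so treat these as placeholders:
```python
from transformers import TrainingArguments

# Assumed configuration, not the reporter's exact script: Liger kernels and
# NEFTune stay the same between runs; only the optimizer choice changes.
args_adamw = TrainingArguments(
    output_dir="out-adamw",
    per_device_train_batch_size=1,
    use_liger_kernel=True,
    neftune_noise_alpha=5.0,
    optim="adamw_torch",   # baseline run, ~3 GB reported
)

args_grokadamw = TrainingArguments(
    output_dir="out-grokadamw",
    per_device_train_batch_size=1,
    use_liger_kernel=True,
    neftune_noise_alpha=5.0,
    optim="grokadamw",     # run that reportedly exhausts the 12 GB
)
```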
-
Hi Rui, I saw the AdamW optimizer in OpenFedLLM's paper, but I didn't find it in the code of the repo.
-
### 🐛 Describe the bug
512M parameters
Mostly vanilla LM transformer. FlashAttention 2.4.2, PyTorch 2.2.0. Uses both FA and FlashRotary.
Dtype: bf16
Nvidia A40, single GPU
Unfused: 85 TFLOPS
F…
-
### 🐛 Describe the bug
When training a large model on H100s, we are seeing an illegal memory access error when using AdamW `fused=True`. I suspect the root cause may be related to https://github.co…
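A minimal sketch of the setup in question, with assumed shapes and dtypes (not the original training code):
```python
import torch

# Fused AdamW runs the whole parameter update in a single CUDA kernel.
model = torch.nn.Linear(8192, 8192, device="cuda", dtype=torch.bfloat16)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, fused=True)

out = model(torch.randn(16, 8192, device="cuda", dtype=torch.bfloat16))
out.sum().backward()
opt.step()

# Switching to the foreach (unfused) implementation is a common way to check
# whether the illegal memory access is specific to the fused kernel:
# opt = torch.optim.AdamW(model.parameters(), lr=1e-4, foreach=True)
```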
-
I noticed that when training RDM, we need to set args.cosine_lr=True to initialize the scheduler in engine_rdm.py. However, the instructions given in the README default to args.cosine_lr=False. I am …
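A hypothetical illustration of the pattern being described (not the actual `engine_rdm.py` code): when args.cosine_lr is left at False, the cosine schedule is never applied and the learning rate stays constant.
```python
import math

def adjust_learning_rate(optimizer, epoch, args):
    # Hypothetical helper mirroring the described behaviour, not the repo's code.
    if args.cosine_lr:
        lr = args.lr * 0.5 * (1.0 + math.cos(math.pi * epoch / args.epochs))
    else:
        lr = args.lr
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr
```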
-
I am installing for the RTX 6000 Ada. I wanted to optimize for that system to run FP8. I followed the [commands](https://azure.github.io/MS-AMP/docs/getting-started/installation/#install-from-source) to…
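For reference, a minimal sketch of FP8 training with MS-AMP once installation succeeds, following the usage shown in the MS-AMP docs; the model and optimizer here are placeholders, not the reporter's setup:
```python
import torch
import msamp

model = torch.nn.Linear(4096, 4096, device="cuda")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# msamp.initialize wraps the model and optimizer so that weights, gradients,
# and optimizer states use the low-precision formats selected by opt_level.
model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")

x = torch.randn(8, 4096, device="cuda")
loss = model(x).sum()
loss.backward()
optimizer.step()
```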