-
```
model = AutoGPTQForCausalLM.from_quantized(
model_name,
#use_triton=True,
#warmup_triton=False,
trainable=True,
inject_fused_attention=False,
…
```
-
Hi,
What is the best way to run this on my high-performance laptop?
Should this work at all? Can I estimate how many days or weeks it will run?
Thanks in advance
Specs:
> OS: Win 11 (WSL2…
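One way to answer the how-many-days question yourself is to time a handful of steps and extrapolate. A minimal sketch (the `step_fn` callable is hypothetical and stands in for one full optimizer step on your hardware):

```python
import time

def estimate_total_hours(step_fn, total_steps, warmup=3, timed=10):
    """Time a few steps and extrapolate to the full run length in hours."""
    for _ in range(warmup):       # discard warmup steps (caches, JIT, allocator)
        step_fn()
    start = time.perf_counter()
    for _ in range(timed):        # time a small, steady-state sample
        step_fn()
    per_step = (time.perf_counter() - start) / timed
    return per_step * total_steps / 3600.0
```

Call it with a lambda wrapping one real training step and your planned step count; the result is a rough lower bound, since it ignores evaluation, checkpointing, and thermal throttling on a laptop.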
-
### 🐛 Describe the bug
I used FSDP + ShardedGradScaler to train my model. Compared with apex.amp + DDP, the precision of my model has decreased.
The DDP version looks like:
```
model, optimizer = amp.initial…
```
-
## Environment
- OS: [Ubuntu 23.06.30]
- Hardware (GPU, or instance type): [8xV100]
## The issue
I am trying Streaming Dataset with [Pytorch Lightning](https://lightning.ai/docs/pytorch/…
-
**Describe the bug**
I get the following error simply by changing the model from `llava1_6-mistral-7b-instruct` to `llava-onevision-qwen2-0_5b-ov` in the first DPO example [here](https://github.com/m…
-
### Bug description
Gradient synchronisation in `fabric.backward()` is broken after moving a model to CPU and then back to the GPU.
Moving a model temporarily to CPU is useful when GPU resource…
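One caveat worth noting when round-tripping a model through the CPU (a general PyTorch gotcha, not necessarily the root cause of this bug): `Module.to()` moves parameters but leaves optimizer state behind, so the next `optimizer.step()` can hit a device mismatch. A minimal helper, assuming a standard optimizer — this is an illustration of the caveat, not Fabric's own API:

```python
import torch

def move_optimizer_state(optimizer, device):
    """Move optimizer state tensors (e.g. Adam's exp_avg) to `device`.

    Module.to() relocates parameters but NOT the optimizer's state
    tensors, which stay on the device where they were first created.
    """
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(device)
```

Calling this alongside `model.to(device)` keeps parameters and optimizer state on the same device across the round trip.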
-
### 🐛 Describe the bug
When I try to finetune with DDP ([LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)) in WSL2 (Win10 host), I get this error:
```
DESKTOP-VMBL43V:1354:1354 [0] NCCL INFO …
```
-
I saw some code under [RWKV-LM/RWKV-v4neo/src/model.py](https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v4neo/src/model.py) which requires CUDA to create the RWKV model.
I want to change the code by …
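A common way to drop a hard CUDA requirement is to select the device at construction time instead of hard-coding `"cuda"`. This is a generic pattern, not RWKV's actual code — the repo's custom WKV CUDA kernel would additionally need a pure-PyTorch fallback on CPU:

```python
import torch

# Pick CUDA when available, otherwise fall back to CPU, and thread the
# device through model construction instead of assuming a GPU exists.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4, 4).to(device)   # stand-in for the RWKV module
x = torch.randn(2, 4, device=device)       # inputs created on the same device
y = model(x)
```

The same `device` object can then be passed down to any submodule or buffer allocation that previously assumed `"cuda"`.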
-
After adding my own modifications to the official code, I found that it runs in single-card mode but fails in the multi-card case. What should I do?
![image](https://github.com/open-mmlab/mmdete…
-
### Feature request
Token averaging in gradient accumulation was fixed in #34191, but token averaging in DDP seems to have the same issue.
---
## Expected behavior
With all the tokens contr…
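The mismatch the request describes can be shown with plain arithmetic — a toy two-rank simulation, not the Trainer's actual code:

```python
# Rank 0 sees 3 tokens, rank 1 sees 1 token; per-token losses below.
rank_token_losses = [[2.0, 2.0, 2.0], [8.0]]

# Per-rank mean then equal-weight average across ranks (what DDP's
# gradient averaging effectively does): ranks count equally, tokens don't.
naive = sum(sum(t) / len(t) for t in rank_token_losses) / len(rank_token_losses)

# Token-weighted average: sum losses and token counts across all ranks
# (an all_reduce in a real run), then divide once.
total_loss = sum(sum(t) for t in rank_token_losses)
total_tokens = sum(len(t) for t in rank_token_losses)
weighted = total_loss / total_tokens
```

Here `naive` gives 5.0 while the token-weighted mean is 3.5, so ranks with fewer tokens are over-weighted unless the token counts are reduced across ranks too.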
sbwww updated 1 month ago