-
The problem with offloading is that it uses the GPU very inefficiently: more than 90% of the unlearning time is spent offloading and loading memory.
Instead, we could try to parallelize the unlearning …
-
Hi, I'm encountering a problem when trying to import dwpose.py. It looks like the cause is the CUDA version or the torch version. Are there specific requirements for the CUDA or PyTorch version?…
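For a version question like this, it may help to attach the installed versions to the report. Below is a minimal sketch, assuming PyTorch is the only dependency in question. `collect_versions` is a hypothetical helper name for illustration and is not part of dwpose.py.

```python
import platform


def collect_versions():
    """Return a dict of version details useful for a bug report."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": None,
    }
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing, or its shared libraries failed to load
        return info
    info["torch"] = torch.__version__
    # CUDA version torch was built against (None for CPU-only builds)
    info["cuda_build"] = torch.version.cuda
    # Whether a usable CUDA device and driver are visible at runtime
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Note that `cuda_build` is the CUDA version the torch wheel was compiled against, which can differ from the system toolkit or driver version.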
-
I'm running my code with:
```
env CUDNN_LOGERR_DBG=1 CUDNN_LOGDEST_DBG=stderr torchrun --standalone --nproc_per_node=8 -m extra_scripts.model_playground_train
```
and getting errors like:
```
…
```
-
Could you provide a script/notebook demo to show how to finetune this model?
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
-
- [x] FSDP + PP
- [ ] DDP + PP
- [ ] DCP path
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
**Describe the bug**
I tried to quantize Qwen1.5-MoE-A2.7B-Chat with w4a16 for the vLLM PR https://github.com/vllm-project/vllm/pull/7766.
It raises the error TypeError: forward() got multiple values for argume…
-
### 🚀 The feature, motivation and pitch
I’m new to working with FSDP and am testing FSDP fine-tuning with the **alpaca_dataset** dataset. I have a single node with 8 GPUs and have set the batch size t…
-
We have discussed the following so far.
- Decide which domain
  - Math ([GSM8k](https://huggingface.co/datasets/gsm8k)), Code ([Stack 2](https://huggingface.co/datasets/bigcode/the-stack-v2)), Gene…