-
Hi everyone,
I tried to reproduce the Alpaca fine-tuning, but I ran into the following error. Could you please help me?
```
Running command git clone --quiet https://github.com/huggingface/t…
```
-
When running tiny-llama-1.1b in Thunder we get an error:
```
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
```
## 🐛 Bug
Full traceback:
```
0: E…
```
-
## ❓ Question
I am trying to benchmark `llama-2-7b` on the GLUE benchmark for in-context learning. But the accuracy I get for MNLI (`mismatched validation`) is 35.22 for both zero-shot and 8-sh…
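One thing worth noting: MNLI is a three-way classification task (entailment / neutral / contradiction), so an accuracy around 33–35% is essentially the chance baseline. When zero-shot and few-shot scores are both stuck there, a common cause is that the generated answers never match the expected label strings, so every prediction is scored as a miss or falls back to one class. A minimal sanity-check sketch (all names here are illustrative, not from any benchmark harness):

```python
# Hypothetical sanity check: on a roughly balanced 3-class task, a degenerate
# "model" that always emits the same label scores ~1/3 accuracy -- close to
# the 35.22 reported above, which suggests broken label parsing rather than
# a genuine model capability gap.
import random

LABELS = ["entailment", "neutral", "contradiction"]

def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

random.seed(0)
golds = [random.choice(LABELS) for _ in range(10_000)]

# Constant predictor: always answers "neutral".
constant_preds = ["neutral"] * len(golds)
print(f"constant-prediction accuracy: {accuracy(constant_preds, golds):.3f}")
```

If your harness shows numbers in this range, it can help to log a few raw model generations next to the parsed labels before trusting the aggregate score.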
-
### Feature request
Support H100 training with FP8 in Trainer and Deepspeed
### Motivation
FP8 should be much faster than FP16 on supported Hopper hardware. Particularly with Deepspeed integration …
-
I am trying to finetune llama3.2 Vision Instruct, and I am using the distributed recipe and example (lora) config as a starting point. Eventually, I am looking to use a custom dataset, but first, I am…
-
@kartik4949 to add information, discussion points, diagrams, links.
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…
-
In the given axolotl examples, [examples/medusa](https://github.com/ctlllll/axolotl/tree/main/examples/medusa),
I followed the `vicuna_7b_qlora_stage1.yml` and `vicuna_7b_qlora_stage2.yml` to write my …
-
**Context**
Gradient norm clipping is a popular technique for stabilizing training, which requires computing the total norm with respect to the model's gradients. This involves a norm reduction acros…
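The computation described above can be sketched framework-free: take the L2 norm over all gradient entries as if they were one concatenated vector, then scale every gradient down by `max_norm / total_norm` when the total exceeds the threshold. This is a minimal sketch of the idea (names and list-of-lists representation are mine, not any library's API; PyTorch's real entry point is `torch.nn.utils.clip_grad_norm_`):

```python
# Minimal, framework-free sketch of global gradient-norm clipping.
# `grads` is a list of flat lists of floats standing in for the
# per-parameter gradient tensors of a model.
import math

def clip_grad_norm_(grads, max_norm, eps=1e-6):
    """Scale all gradients in place so their global L2 norm is <= max_norm.

    Returns the pre-clipping total norm, mirroring the common convention.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:  # only ever scale down, never up
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm

grads = [[3.0, 4.0], [0.0, 12.0]]   # global norm = sqrt(9 + 16 + 144) = 13
norm = clip_grad_norm_(grads, max_norm=1.0)
print(norm)  # 13.0
```

In a sharded setting like FSDP, each rank only holds part of the gradients, so the per-rank partial sums of squares must be reduced across ranks before taking the square root, which is the cross-rank norm reduction the paragraph refers to.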
awgu updated 3 months ago
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…