-
Hi, I'm trying to fine-tune the Llama 3.1 8B model. After fine-tuning it and uploading it to HF, when I try to run it using vLLM I get this error: "KeyError: 'base_model.model.model.layers.0.mlp.dow…
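For context, a KeyError on a `base_model.model.`-prefixed key is what you typically see when the raw LoRA adapter checkpoint (whose keys carry PEFT's wrapper prefix) is served as if it were a full model. One common fix is to merge the adapter into the base weights before uploading. The merge itself is just adding the scaled low-rank update, W + (alpha/r)·B·A; a minimal sketch with made-up 2x2 numbers (shapes and values are illustrative, not from the reporter's model):

```python
# Hypothetical 2x2 base weight W and rank-1 LoRA factors A, B. Merging
# folds the low-rank update into W, so the saved checkpoint contains no
# PEFT-prefixed keys such as "base_model.model.…".
W = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[0.5, -0.5]]          # lora_A: shape (r=1, d_in=2)
B = [[2.0], [4.0]]         # lora_B: shape (d_out=2, r=1)
alpha, r = 2, 1

# W_merged[i][j] = W[i][j] + (alpha / r) * sum_k B[i][k] * A[k][j]
W_merged = [
    [W[i][j] + (alpha / r) * sum(B[i][k] * A[k][j] for k in range(r))
     for j in range(2)]
    for i in range(2)
]
print(W_merged)  # → [[3.0, -2.0], [4.0, -3.0]]
```

With PEFT this corresponds to calling `merge_and_unload()` on the loaded adapter and uploading the result with `save_pretrained`, instead of pushing the adapter-only checkpoint.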
-
### Environment
🐧 Linux
### System
N/A
### Version
SillyTavern 1.12.5 'staging' (38d24f4b)
### Desktop Information
_No response_
### Describe the problem
Mistral's tokenizer is weird and we p…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
https://www.youtube.com/@AIsuperdomain
-
### Cortex version
0.5.0-68 (actually 67)
### Describe the Bug
From #1239:
When pulling a model from HF, I expect it to display the model's variant options and allow the user to select one.
Pull directly from H…
-
Hi. Raising this issue as I am experiencing much slower inference times with Gemma-1 models.
> Environment:
> - xformers 0.0.26.post1 pypi_0 pypi
> - unsloth …
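To make "much slower" concrete, one way is to time the same call on both setups with a small wall-clock harness; a backend-agnostic sketch (the function and its arguments are whatever your inference call is, e.g. a `generate` call — assumptions, not part of the report):

```python
import time

def mean_latency(fn, *args, warmup=1, iters=5):
    # Average wall-clock time over `iters` calls, after `warmup` untimed
    # calls to absorb one-off costs (compilation, cache warm-up, etc.).
    for _ in range(warmup):
        fn(*args)
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters
```

Running this once per environment (e.g. with and without xformers/unsloth installed) gives comparable numbers to attach to the report.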
-
### Which Cloudflare product(s) does this pertain to?
Wrangler
### What version(s) of the tool(s) are you using?
Wrangler 3.72.2
### What version of Node are you using?
16.15.1
### W…
-
I have been fine-tuning Mistral-7B-Instruct-v0.2 recently and I noticed that when I don't use SWA and train with a sequence length of 32K, the initial loss is unusually high (6.0). However, when I tra…
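For context on why SWA interacts with a 32K sequence length: sliding-window attention restricts each token to the last `w` positions, while plain causal attention lets it see everything before it. A minimal mask sketch (the 4096 window is the value from the original Mistral-7B config, assumed here for illustration):

```python
def sliding_window_row(i, n, w):
    # Positions token i may attend to under SWA with window w:
    # j in (i - w, i]. Plain causal attention instead allows all j <= i.
    return [j for j in range(n) if j <= i and j > i - w]

# Small example: with window 3, token 5 sees only positions 3..5.
assert sliding_window_row(5, 6, 3) == [3, 4, 5]

# At 32K context with a 4096 window, a late token attends to the last
# 4096 positions rather than all 32768.
assert len(sliding_window_row(32767, 32768, 4096)) == 4096
```

So disabling SWA at 32K changes how much context every position attends to, which is one plausible place to look when the loss curves differ between the two setups.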
-
If you are submitting a bug report, please fill in the following details and use the tag [bug].
**Describe the bug**
Gemma-2-{size} is not loadable using from_pretrained. I checked OFFICIAL_MODEL_…
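For illustration of the failure mode: loaders that gate on a registry of supported names fail with "not loadable" when a new model family is missing from the allowlist, rather than at weight-loading time. A hypothetical sketch (the set contents and function name here are assumptions, not the library's actual registry):

```python
# Hypothetical allowlist mirroring an OFFICIAL_MODEL-style registry;
# the entries below are made up for illustration.
OFFICIAL_MODELS = {"gemma-2b", "gemma-7b"}

def check_supported(name):
    # A from_pretrained-style loader gated on such a registry raises on
    # any family that has not yet been added, e.g. Gemma-2 sizes.
    if name not in OFFICIAL_MODELS:
        raise KeyError(f"{name!r} is not in the official model list")
    return name

check_supported("gemma-7b")        # passes
# check_supported("gemma-2-9b")    # would raise KeyError
```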
-
In order to apply LLM2Vec to DictaLM we need:
- [ ] Identify base model - https://huggingface.co/collections/dicta-il/dicta-lm-20-collection-661bbda397df671e4a430c27
- [ ] Prepare dataset for MNTP…
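For the MNTP step above, the data preparation amounts to masking random positions so the model learns to predict each masked token from its preceding context. A minimal sketch (the mask id, masking rate, and label convention of -100 for ignored positions are assumptions in the style of HF-format datasets, not LLM2Vec's exact recipe):

```python
import random

def mask_for_mntp(token_ids, mask_id, p=0.2, seed=0):
    # Replace each token with `mask_id` with probability p; record the
    # original id as the label there, and -100 (ignored) elsewhere.
    rng = random.Random(seed)
    inputs, labels = [], []
    for t in token_ids:
        if rng.random() < p:
            inputs.append(mask_id)
            labels.append(t)
        else:
            inputs.append(t)
            labels.append(-100)
    return inputs, labels

inputs, labels = mask_for_mntp([11, 12, 13, 14, 15], mask_id=0, p=0.5)
assert len(inputs) == len(labels) == 5
# Every unmasked position carries the ignore label.
assert all(l == -100 for i, l in zip(inputs, labels) if i != 0)
```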