-
Hello.
Using SFTTrainer and QLoRA, I have been fine-tuning a variety of Llama 2 Chat models. I have structured my dataset as follows, based on what I have read to be the correct format:
``…
-
### Is your feature request related to a problem? Please describe.
I'm using ollama for many things; running lm-studio for this seems wrong, as it only runs as an app image.
### Describe the soluti…
-
Hi~ My GPU (V100) does not support flash attention, so I want to disable it. I noticed that if flash attention is not installed in my environment, the variable [`FlashAttention2Available`](https://gith…
-
I'm trying to make llama2.mojo work on tinyllama-1.1B,
which is a GQA model and does not use tie_embedding.
I have now finished converting the model and modified part of llama2.mojo (llama.cpp, llama.c).
I have n…
-
## 🐛 Bug
I'm noticing this with both the default Llama-2-7b and [TinyLlama-1.1b](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.6). When loading the model for the first time or when …
-
Thank you for the implementation!
Have you come across this error? `InternalTorchDynamoError: 'NoneType' object is not subscriptable`
The code is basically a hello world:
```python
from bitnet.con…
-
Running Maid on a Moto G9 with Android 11. Tried to run two 1B models obtained from Hugging Face: [this one](https://huggingface.co/TheBloke/Tinyllama-2-1b-miniguanaco-GGUF) and another one. The model is load…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
I am trying to understand how SentencePiece encoding works. My current understanding is:
* A model is loaded. The model can map "pieces" to "scores".
* A given text is prepended with the `"▁"` cha…
-
**Describe the bug**
Until today Jan always worked, but now I get directed to this message: "Message queued. It can be sent once the model has started"
and nothing happens...no activity on cpu...nothi…