-
**Why**
To streamline user interactions with the large language model (LLM) in the chat application, users will be able to quickly select from a variety of predefined prompt templates. This featur…
-
Hello team,
We typically use `gather_all_token_logits` to collect the logit tensors for post-processing. Especially for large vocabulary sizes (128,000), this can require a lot of GPU memory. For ex…
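As a rough illustration of why this blows up, the full logits tensor scales as batch × sequence length × vocabulary size × bytes per element. A minimal sketch (the helper name and the example shapes are assumptions for illustration, not tied to any particular build):

```python
def logits_memory_gib(batch_size: int, seq_len: int, vocab_size: int,
                      bytes_per_elem: int = 2) -> float:
    """Approximate size in GiB of a full logits tensor of shape
    [batch_size, seq_len, vocab_size], e.g. fp16 => 2 bytes per element."""
    return batch_size * seq_len * vocab_size * bytes_per_elem / (1024 ** 3)

# Hypothetical example: batch 8, 4096 tokens, 128k vocab, fp16.
print(round(logits_memory_gib(8, 4096, 128_000), 1))  # ~7.8 GiB for logits alone
```

Even modest batch sizes quickly reach multiple GiB just for the gathered logits, on top of weights and KV cache.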
-
Efficient Streaming Language Models with Attention Sinks [paper](https://arxiv.org/abs/2309.17453)
This repo has already implemented it:
[attention_sinks](https://github.com/tomaarsen/attention_si…
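The core cache policy from the paper is simple: always retain the first few "sink" tokens plus a sliding window of the most recent tokens, and evict everything in between. A minimal sketch of which KV-cache positions survive (function name and defaults are illustrative, not the linked repo's API):

```python
def sink_window_indices(cache_len: int, num_sink: int = 4,
                        window: int = 1020) -> list[int]:
    """Positions kept under the attention-sink policy: the first `num_sink`
    tokens are always retained, plus the most recent `window` tokens."""
    if cache_len <= num_sink + window:
        # Cache still fits; nothing is evicted.
        return list(range(cache_len))
    return list(range(num_sink)) + list(range(cache_len - window, cache_len))
```

The cache size is thus bounded at `num_sink + window` entries regardless of how long the stream runs, which is what makes the streaming setting tractable.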
-
### Feature Area
vault data
### Painpoint
I really like this project, but I have trouble using it with my vault. Do you have any tips or tricks for creating notes that can be used by this, …
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
Even though I have updated the package to the latest version, the function call is still fa…
-
### Problem
The CLI needs a new RPC method that allows for code changes to be applied to a specific file. This method should take in a file path and new code content, and then use the language model …
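A minimal sketch of what such a handler could look like, assuming the LLM step is elided and the request/response shapes are hypothetical (the function name and return fields are not an existing API):

```python
from pathlib import Path

def apply_code_change(file_path: str, new_content: str) -> dict:
    """Hypothetical RPC handler: write LLM-proposed content to a file
    and return a simple result payload. The step that asks the language
    model to produce `new_content` is assumed to happen upstream."""
    path = Path(file_path)
    if not path.exists():
        return {"ok": False, "error": f"no such file: {file_path}"}
    previous = path.read_text()
    path.write_text(new_content)
    return {"ok": True, "bytes_written": len(new_content),
            "previous_length": len(previous)}
```

Returning the previous length (or the full previous content) leaves room for an undo or diff step later without changing the method's signature.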
-
This was my first time working with LLMs as a Machine Learning Engineer. So, I've learned a few things:
- Prompt engineering is crucial for the performance and accuracy of the application, and ev…
-
We are building a voice-interactive chatbot that leverages cutting-edge technologies such as Speech-to-Text (STT), Text-to-Speech (TTS), and local Large Language Models (LLMs), with a focus on Ollama'…
-
We're having issues running inference efficiently at scale. By default we process the audio segments one by one, but is there any support for batch inference to speed th…
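One common way to batch variable-length audio is to pad every clip's features to the longest length in the batch and keep the true lengths around for masking. A generic sketch, independent of any specific inference library (names are illustrative):

```python
def pad_batch(clips: list[list[float]], pad_value: float = 0.0):
    """Pad variable-length feature sequences to a uniform length so they
    can be stacked and run through the model in one forward pass.
    Returns the padded batch plus the original lengths for masking."""
    max_len = max(len(clip) for clip in clips)
    padded = [clip + [pad_value] * (max_len - len(clip)) for clip in clips]
    lengths = [len(clip) for clip in clips]
    return padded, lengths
```

Whether batching actually helps then depends on the backend: the model has to accept a leading batch dimension and respect the length mask so padding doesn't pollute the output.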
-
References:
- [ReLeLa](https://relela.com/)
- [BETO: Spanish BERT](https://github.com/dccuchile/beto)
- The models by [Jorge Ortiz Fuentes](https://huggingface.co/jorgeortizfuentes), such as [Tulio…