-
Hi,
I have finished converting and exporting the model with SmoothQuant, but when I load the model with vLLM and run inference, I get the following error:
INFO 12-05 09:00:58 tok…
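For reference, the load-and-generate path in vLLM normally looks like the sketch below (the checkpoint path is a hypothetical placeholder; the SmoothQuant export has to be in a format vLLM recognizes, otherwise loading fails with an error like the one described above):
```python
from vllm import LLM, SamplingParams

# Load the SmoothQuant-exported checkpoint (path is hypothetical).
llm = LLM(model="./llama-7b-smoothquant")

sampling = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], sampling)
print(outputs[0].outputs[0].text)
```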
-
Quantized weights, scales, and metadata can be serialized into a state_dict that can later be reloaded and applied to a quantized model.
The process is a bit convoluted, as it requires the target mod…
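A minimal sketch of that flow in PyTorch (the `QuantLinear` module and its buffer names are illustrative assumptions, not from the source): int8 weights and their scales are registered as buffers so that an ordinary `state_dict` captures them, and reloading requires constructing the target module in its quantized form before the state_dict can be applied.
```python
import torch

class QuantLinear(torch.nn.Module):
    """Hypothetical quantized linear layer holding int8 weights plus scales."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Buffers (not parameters) so state_dict() captures them directly.
        self.register_buffer(
            "weight_int8",
            torch.zeros(out_features, in_features, dtype=torch.int8))
        self.register_buffer("scale", torch.ones(out_features, 1))

    def forward(self, x):
        # Naive reference path: dequantize on the fly.
        return x @ (self.weight_int8.float() * self.scale).t()

# Save: quantized weights, scales, and metadata land in a plain state_dict.
layer = QuantLinear(4, 8)
torch.save(layer.state_dict(), "quant_layer.pt")

# Reload: the target module must already exist in quantized form before the
# state_dict is applied -- this is the convoluted part.
restored = QuantLinear(4, 8)
restored.load_state_dict(torch.load("quant_layer.pt"))
```
Keeping the tensors as buffers rather than parameters keeps them out of the optimizer while still round-tripping through `state_dict()`.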
-
### Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue](ht…
-
This worked in the Oct 15 jlama build:
```
$ ./run-cli.sh complete -p "def fib(" -t 0.2 -tc 24 -n 100 models/CodeLlama-7b-hf
```
Now it OOMs (note that I have doubled the default Xmx, which was not nece…
-
The context size given to llama.cpp to load this model is 4096, which requires around 10 GB of memory for the context alone; if we add the 4.5 GB required for the 7B model itself, it's infeasible to use it…
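For a rough sense of where that memory goes, here is a back-of-the-envelope KV-cache estimate (a sketch assuming LLaMA-7B dimensions and an fp32 cache; the remainder of the reported ~10 GB would be scratch and compute buffers, which vary by build):
```python
# KV cache: two tensors (K and V) per layer, each n_ctx x n_embd elements.
n_layers, n_embd, n_ctx = 32, 4096, 4096  # LLaMA-7B dimensions (assumed)
bytes_per_elem = 4                        # fp32 cache; fp16 would halve this

kv_bytes = 2 * n_layers * n_ctx * n_embd * bytes_per_elem
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # -> 4.0 GiB
```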
-
### Version
Visual Studio Code extension
### Operating System
Windows 10
### What happened?
`ENDPOINT=OPENROUTER`
`MODEL_NAME=anthropic/claude-3-opus`
I create a new app and it see…
-
## Description
In FIM mode, an extra space is added at the beginning of the first line if it ends with `\n`.
## How to repeat
```python
from transformers import AutoTokenizer, AutoModelForCausalL…
-
Things are changing at a breakneck pace. There is already a Llama 13B PyTorch model with 32k context. I figure it would be appropriate to ask for compatibility to be added to Kobold, when time permits…
-
### Which Cloudflare product does this pertain to?
Workers AI
### Existing documentation URL(s)
https://developers.cloudflare.com/workers-ai/models/llm/
### What changes are you suggesting?
Pleas…
-
# Weekly GitHub Trending! (2024/04/15 ~ 2024/04/22)
## Python trending: 11 repos
### [1Panel-dev](https://github.com/1Panel-dev) / [MaxKB](https://github.com/1Panel-dev/MaxKB)
💬 A knowledge base built on LLM large language models…