-
## 🐛 Bug
Hello, I'm running into an issue where my batch size begins to vary halfway through an epoch.
### To Reproduce
I logged when it deviated from 64. It happens in all epochs, and when trai…
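Not the reporter's code, but a minimal pure-Python illustration of the usual first suspect: with `drop_last=False` (the PyTorch `DataLoader` default), the final batch of an epoch is smaller whenever the dataset size is not divisible by the batch size. Variation in the *middle* of an epoch usually points instead at a custom batch sampler or collate function. The numbers below are hypothetical.

```python
def batch_sizes(num_samples: int, batch_size: int, drop_last: bool = False):
    """Yield the size of each batch, mimicking torch.utils.data.DataLoader."""
    full, rem = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if rem and not drop_last:
        sizes.append(rem)  # the one smaller batch at the end of the epoch
    return sizes

# Hypothetical: 1000 samples with batch_size=64 -> 15 full batches + one of 40.
print(batch_sizes(1000, 64)[-1])                   # 40
print(batch_sizes(1000, 64, drop_last=True)[-1])   # 64
```

If the smaller batches appear mid-epoch rather than only at the end, logging the sampler's output (not just the batch tensors) is the next step.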
-
Using the microsoft/Phi-3-medium-128k-instruct model, I received incorrect responses for multi-byte characters (commonly seen in Japanese or Chinese), as shown below:
```
mlx_lm.generate --model mic…
```
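For context on this class of bug (a generic, hedged illustration, not mlx_lm's actual code path): CJK characters are multi-byte in UTF-8, so decoding a generated byte stream chunk-by-chunk can split a character across chunks and emit replacement characters, whereas an incremental decoder buffers the incomplete tail.

```python
import codecs

# "日本語" is 9 bytes in UTF-8 (3 bytes per character). Splitting the stream
# into arbitrary 4-byte chunks cuts characters in half.
data = "日本語".encode("utf-8")
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]

# Naive per-chunk decode mangles every split character:
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)

# An incremental decoder carries incomplete sequences over to the next chunk:
dec = codecs.getincrementaldecoder("utf-8")()
correct = "".join(dec.decode(c) for c in chunks) + dec.decode(b"", final=True)

print(naive)    # contains U+FFFD replacement characters
print(correct)  # 日本語
```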
-
**LocalAI version:** 2.16.0
**Environment, CPU architecture, OS, and Version:**
mac studio M2 Ultra
**Describe the bug**
Using the transformers backend for glm4, `trust_remote_code: true` is not c…
-
Hi,
So I was training a new tokenizer from the Llama tokenizer (meta-llama/Llama-2-7b-hf) on a medium-sized corpus (Fineweb-10BT sample: 15 million documents with an average length of 2,300 characters). A…
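For readers following along, the commonly documented Hugging Face recipe for this task is `train_new_from_iterator()`. The sketch below shows only the memory-friendly batching iterator; the corpus is a hypothetical stand-in, and the actual training call is left as a comment because it requires downloading the base tokenizer.

```python
def batch_iterator(corpus, batch_size=1000):
    """Yield lists of raw texts so the trainer never holds the full corpus."""
    for i in range(0, len(corpus), batch_size):
        yield [doc["text"] for doc in corpus[i:i + batch_size]]

# Hypothetical stand-in for a Fineweb-style dataset of dicts with a "text" key.
corpus = [{"text": f"document {i}"} for i in range(2500)]
batches = list(batch_iterator(corpus))
print(len(batches), len(batches[0]))  # 3 1000

# With transformers installed, the iterator plugs into (not run here):
# from transformers import AutoTokenizer
# old = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# new = old.train_new_from_iterator(batch_iterator(corpus), vocab_size=32000)
```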
-
I am relatively new to running inference on my own. Previously, I used ollama, but recently I decided to try out mlx since I have an M3 with sufficient unified memory and I was curious about how it co…
-
### The Feature
Hi!
The tokenizer you are using for claude-3 is not accurate; the correct numbers are reported in the stream chunks (the first chunk for the prompt token count and the last chunk for the response token count). Proposal…
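A hedged sketch of what the proposal amounts to: accumulate the usage fields the provider reports in-stream instead of re-tokenizing locally. The chunk dictionaries below are hypothetical stand-ins, loosely shaped like Anthropic's `message_start`/`message_delta` streaming events.

```python
# Hypothetical stream: usage arrives in the first and last events.
chunks = [
    {"type": "message_start", "usage": {"input_tokens": 17}},
    {"type": "content_block_delta", "text": "Hello"},
    {"type": "message_delta", "usage": {"output_tokens": 5}},
]

def stream_usage(chunks):
    """Merge every usage field the provider reports during the stream."""
    usage = {}
    for chunk in chunks:
        usage.update(chunk.get("usage", {}))
    return usage

print(stream_usage(chunks))  # {'input_tokens': 17, 'output_tokens': 5}
```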
-
I am attempting to build a chatbot using TrtLlmAPI as the LLM:
```
llm = TrtLlmAPI(
    model_path=trt_engine_path,
    engine_name=trt_engine_name,
    tokenizer_dir=tokenizer_dir_path,
    …
```
-
I want to achieve streaming output from an AutoGPTQ model.
So far I have only managed non-streaming output, like this:
input_ids = tokenizer.encode(inputs,
                             return_tensors="pt",
                             …
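One common way to get token-by-token output (a sketch of the general pattern, not AutoGPTQ-specific code): run generation in a background thread and consume tokens from a queue as they are produced. transformers ships this pattern as `TextIteratorStreamer`; the toy `fake_generate` below stands in for `model.generate(..., streamer=...)` so the example runs without a model.

```python
import queue
import threading

def fake_generate(streamer):
    """Stand-in for the generation thread pushing decoded tokens."""
    for token in ["Hello", " ", "world"]:
        streamer.put(token)
    streamer.put(None)  # sentinel: generation finished

q = queue.Queue()
threading.Thread(target=fake_generate, args=(q,)).start()

pieces = []
while (tok := q.get()) is not None:  # consumer sees tokens as they arrive
    pieces.append(tok)
print("".join(pieces))  # Hello world
```

With a real model, the same consumer loop would iterate over a `TextIteratorStreamer` while `model.generate` runs in the thread.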
-
When `genai-perf` is installed using `pip` from GitHub (as documented), on first run it tries to download several files from Hugging Face, like this:
```
$ docker run --rm -it --name test -u 0 gpu-tr…
```
-
### System Info
Python 3.11.8
### Running Xinference with Docker?
- [ ] docker
- [X] pip install
- [ ] installation from source