-
## Goal
- Jan supports most llama.cpp params
## Tasklist
**Cortex**
- [x] https://github.com/janhq/cortex.cpp/issues/1151
**Jan**
- [ ] Update Right Sidebar UX for Jan
- [ ] Enable Jan's API serv…
-
spec-infer works well for batch sizes (1, 2, 4, 8, 16), but when I change the batch size to 32 it aborts with "stack smashing detected":
```
+ ncpus=16
+ ngpus=1
+ fsize=30000
+ zsize=60000
+ max_se…
```
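For what it's worth, "stack smashing detected" is glibc's stack-protector abort, which usually means a fixed-size stack buffer was overrun once the batch size exceeded what it was sized for. A minimal sketch for bisecting the first failing size, where the `./spec_infer` binary name and `--batch-size` flag are placeholders for the actual launch command, not the real CLI:

```python
import subprocess

# Sweep batch sizes and report the first one that aborts. The binary name
# and --batch-size flag below are hypothetical; substitute your real command.
for bs in (1, 2, 4, 8, 16, 24, 32):
    proc = subprocess.run(
        ["./spec_infer", "--batch-size", str(bs)],  # hypothetical invocation
        capture_output=True,
        text=True,
    )
    crashed = "stack smashing detected" in proc.stderr
    print(f"batch_size={bs} returncode={proc.returncode} crashed={crashed}")
    if crashed:
        break
```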
-
Due to network and permission issues, we cannot call GPT-3.5-Turbo from reasoning_and_editing.py. Could you provide code that uses Llama2 for title-editing generation instead?
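A minimal sketch of such a swap, assuming Hugging Face transformers and the meta-llama/Llama-2-7b-chat-hf checkpoint; the prompt wording and the `edit_title` helper are illustrative, not the script's actual interface:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # requires accepting the license

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def edit_title(draft_title: str) -> str:
    # Hypothetical stand-in for the GPT-3.5-Turbo chat-completion request.
    messages = [
        {"role": "system", "content": "You rewrite draft titles to be concise and fluent."},
        {"role": "user", "content": f"Edit this title: {draft_title}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=64, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

print(edit_title("a study about improve llm reasoning with editing"))
```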
-
Good day everyone, I am trying to run the llama agentic system on an RTX4090 with FP8 quantization for the inference model and meta-llama/Llama-Guard-3-8B-INT8 for the guard. With sufficiently small max_seq_…
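One way to sanity-check the guard half in isolation is to load the pre-quantized checkpoint directly with transformers; a sketch under the assumption that the INT8 repo ships its own bitsandbytes quantization config, so nothing is needed beyond having bitsandbytes installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

GUARD_ID = "meta-llama/Llama-Guard-3-8B-INT8"

tokenizer = AutoTokenizer.from_pretrained(GUARD_ID)
# Assumption: the quantization config embedded in the checkpoint is picked up
# automatically, so no extra quantization arguments are passed here.
model = AutoModelForCausalLM.from_pretrained(GUARD_ID, device_map="auto")

def moderate(messages):
    # Llama Guard's chat template wraps the conversation in its safety prompt.
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
    out = model.generate(input_ids, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Expected to print "safe" (or "unsafe" plus a hazard category code).
print(moderate([{"role": "user", "content": "How do I bake a cake?"}]))
```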
-
### Jan version
0.5.4
### Describe the Bug
I can successfully load the model for chats, but as soon as I send an image, it crashes.
Context:
- I created a model.json to download the text an…
-
What I understand is that you actually deploy a model (e.g. Llama3.1-70B-Instruct) with 'vllm serve Llama3.1-70B-Instruct ...' and then configure the URL and model name in llama-stack for LLM capab…
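That matches how vLLM's OpenAI-compatible server is normally consumed. A quick sketch for verifying the vLLM side before pointing llama-stack at it, assuming the default local port and that the model name matches what was passed to `vllm serve`:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; by default it listens on port 8000
# and accepts any placeholder API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Llama3.1-70B-Instruct",  # must match the name given to `vllm serve`
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```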
-
**Describe the bug**
Messages break and the inference doesn't complete:
![Interruption](https://github.com/user-attachments/assets/ea06ceee-f49b-4770-b5f6-da0946f73436)
**Steps to reproduce**
1. Create…
-
### Proposal to improve performance
Improve bitsandbytes quantization inference speed
### Report of performance regression
I'm testing llama-3.2-1b on a toy dataset. For offline inference using the…
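For reference, a minimal sketch of that offline path with in-flight bitsandbytes quantization (the exact flags vary across vLLM versions; older releases also require `load_format="bitsandbytes"`):

```python
from vllm import LLM, SamplingParams

# Quantize the unquantized checkpoint on the fly with bitsandbytes.
llm = LLM(
    model="meta-llama/Llama-3.2-1B",
    quantization="bitsandbytes",
)
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Summarize: vLLM offline inference test."], params)
print(outputs[0].outputs[0].text)
```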
-
Hi,
I've been trying to serve different Phi3 models using the llama.cpp server created by ipex-llm's init-llama-cpp.
When I serve with this version I have two problems:
1) The server doesn…
-
Discussion for this is in #373 and #284.
The export script in sharktank was built specifically for llama 3.1 models and has some rough edges. Along with this, it requires users to chain together CLI c…