-
### Issue you'd like to raise.
I have installed langchain and ctransformers using -
```
pip install langchain
pip install ctransformers[cuda]
```
I am trying the following piece of code -
```
…
```
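For reference, a minimal sketch of how ctransformers is typically wired into LangChain (the model repo, model_type, and gpu_layers values below are placeholders, not the code from this report):
```
# Sketch: load a GGML model through LangChain's CTransformers wrapper.
# The model repo and config values are assumptions for illustration only.
from langchain.llms import CTransformers

llm = CTransformers(
    model="TheBloke/Llama-2-7B-GGML",  # hypothetical GGML model repo
    model_type="llama",
    config={"gpu_layers": 50},  # offload layers to GPU (ctransformers[cuda])
)
print(llm("AI is going to"))
```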
-
```
% python test_inference.py -m CodeLlama-13B-GPTQ/ -p "int main(" -nfa -l 2048 -lm
Finding flash_attn
NO flash_attn module
-- Model: CodeLlama-13B-GPTQ/
-- Options: ['length: 2048',…
```
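For context, the `Finding flash_attn` / `NO flash_attn module` lines above come from an import probe; a minimal sketch of an equivalent check (an illustration, not the actual test_inference.py code):
```
# Sketch of an import probe equivalent to the log lines above.
import importlib.util

print("Finding flash_attn")
if importlib.util.find_spec("flash_attn") is None:
    print("NO flash_attn module")  # flash attention unavailable; fall back
```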
-
- [ ] [Announcing function calling and JSON mode](https://www.together.ai/blog/function-calling-json-mode)
# Announcing function calling and JSON mode
**DESCRIPTION:**
Announcing function calling…
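As a minimal sketch, JSON mode can be exercised through an OpenAI-compatible client pointed at the Together endpoint (the model name and prompt below are assumptions, not taken from the announcement):
```
# Sketch: request JSON-only output via an OpenAI-compatible client.
# Model name and prompt are placeholders for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="YOUR_API_KEY",
)
resp = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    response_format={"type": "json_object"},  # JSON mode
    messages=[{"role": "user", "content": "List three colors as a JSON array."}],
)
print(resp.choices[0].message.content)
```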
-
Setting the temperature in a particular range causes vllm to generate whitespace-only outputs. Values above/below this range work correctly. I have seen this with facebook/opt-125m, fine-tuned mistral…
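A minimal repro sketch with vLLM's offline API (the temperature value shown is a stand-in for the problematic range described above):
```
# Sketch: reproduce whitespace-only generations at a given temperature.
# The temperature shown is an assumption, not the reported range.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=2.0, max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
print(repr(outputs[0].outputs[0].text))  # inspect for whitespace-only text
```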
-
### Summary
The Phi-2 prompt template is implemented internally:
https://github.com/second-state/LlamaEdge/blob/6eed9d5b25133e623f643e212c4a672bd2c769e6/api-server/chat-prompts/src/lib.rs#L55
But i…
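For reference, a minimal sketch of the Phi-2 instruct layout the linked Rust code builds, assuming the standard `Instruct:`/`Output:` format (the Python helper is hypothetical):
```
# Sketch of the Phi-2 instruct prompt layout; helper name is hypothetical.
def build_phi2_prompt(user_message: str) -> str:
    return f"Instruct: {user_message}\nOutput:"

print(build_phi2_prompt("Write a haiku about Rust."))
```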
-
The puzzle `17-1a69e44f` from #17 has to be resolved:
https://github.com/h1alexbel/fakehub/blob/31636e6ff542ff1317c28ae7bbac232572a06a30/server/src/xml/storage.rs#L59-L62
The puzzle was created by …
-
It seems there are still issues with wave64 devices.
I just tried latest master on a CDNA1/MI100 with ROCm 6.0.2 and PyTorch 2.3.0 @ 97ff6cfd9c86c5c09d7ce775ab64ec5c99230f5d
```
Finding flash_attn
…
```
-
Currently the models need to be specified as `llama7b`, for example, but what if one wants to use `codellama/CodeLlama-7b-hf` or `meta-llama/Llama-2-7b-hf` (non-chat version), etc.?
A more flexible me…
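One possible scheme, sketched below, is to accept either a short alias or a full Hugging Face repo id (the alias table and helper are hypothetical, not the project's actual API):
```
# Sketch: resolve a short alias, or pass a full HF repo id through unchanged.
# The alias table and function name are assumptions for illustration.
ALIASES = {"llama7b": "meta-llama/Llama-2-7b-hf"}

def resolve_model(name: str) -> str:
    return ALIASES.get(name, name)  # "org/model" ids pass through

print(resolve_model("llama7b"))
print(resolve_model("codellama/CodeLlama-7b-hf"))
```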
-
### Description
Issue: an unrequested call to the secondary diagram model is made every time the chat button is clicked, resulting in extra charges.
Step 1. Configure Model settings with OpenRout…
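A possible fix pattern, sketched with hypothetical names, is to guard the billable secondary call so it only fires when a diagram is explicitly requested:
```
# Sketch: defer the diagram-model call until explicitly requested.
# All names here are hypothetical stand-ins, not the app's actual code.
def chat_reply(message: str) -> str:
    return f"reply to: {message}"  # stand-in for the primary chat model

def render_diagram(message: str) -> str:
    return f"diagram for: {message}"  # stand-in for the secondary model

def on_chat_click(message: str, wants_diagram: bool = False) -> str:
    reply = chat_reply(message)
    if wants_diagram:  # avoid the unrequested, billable call
        reply += "\n" + render_diagram(message)
    return reply

print(on_chat_click("hello"))
```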
-
### What happened?
When running the convert-hf-to-gguf.py script for the gemma-1.1-2b-it model, I get the following error, which I added to the relevant log output field.
For reproduction of the error, r…