-
**Describe the bug**
I'm trying to build Metal with the profiler enabled, and the build fails with errors. The exact error depends on the script used (either `build_with_profiler_opt.sh` or using…
-
**Describe the bug**
I am attempting to run the LLaMA2 demo at https://github.com/openvinotoolkit/model_server/blob/main/demos/llama_chat/python/README.md. When I run:
```sh
python client.py -…
-
### Willingness to contribute
No. I cannot contribute this feature at this time.
### Proposal Summary
During prompt experimentation you often set system prompts for OpenAI, Azure, and open source m…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
### Describe the bug
I used Runpod to test …
-
**How to do in P1**
Brett Ostwalt commented:
[Eric Robinson (AIA | 21 AS)](https://jira.il2.dso.mil/secure/ViewProfile.jspa?name=erob) Alright, connectivity should be working now.
In order to call th…
-
We are building a voice-interactive chatbot that leverages cutting-edge technologies such as Speech-to-Text (STT), Text-to-Speech (TTS), and local Large Language Models (LLMs), with a focus on Ollama'…
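For context on the intended architecture, the three stages can be wired as a simple turn-taking loop. The following is a minimal sketch; `transcribe`, `query_llm`, and `synthesize` are hypothetical placeholder stubs, not APIs from any particular STT/TTS library or from Ollama itself:

```python
# Minimal sketch of the STT -> LLM -> TTS loop described above.
# All three stage functions are hypothetical stubs: a real implementation
# would call an STT engine (e.g. Whisper), a locally served Ollama model,
# and a TTS engine in their place.

def transcribe(audio: bytes) -> str:
    # Stub: pretend the audio payload is already text.
    return audio.decode("utf-8")

def query_llm(prompt: str) -> str:
    # Stub: a real version would POST the prompt to a local LLM server.
    return f"echo: {prompt}"

def synthesize(text: str) -> bytes:
    # Stub: pretend synthesis just re-encodes the text.
    return text.encode("utf-8")

def voice_turn(audio_in: bytes) -> bytes:
    """One conversational turn: speech in, speech out."""
    user_text = transcribe(audio_in)
    reply_text = query_llm(user_text)
    return synthesize(reply_text)

print(voice_turn(b"hello").decode("utf-8"))  # echo: hello
```

The point of the structure is that each stage is swappable behind a plain function boundary, so the STT, LLM, and TTS backends can be changed independently.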
-
Hey guys,
What is this jaw-dropping nightmare that you put me through?
PS C:\AI\TensorRT\TensorRT-LLM\examples\llama> python build.py --meta_ckpt_dir C:/AI/LLaMA2_Docker_FileSystem/codellama/CodeL…
-
Llama2 (and Llama-based models) time out. Other chat models (Mistral and Mixtral were tested) respond fine. Below is a snippet of the Docker container log capturing when the request is sent from the Refact exte…
-
Hi!
I have a finetuned Llama2 and followed `example/llama.py`. When I build the model in fp16, it works just fine and produces sane results. When we use either `--fp8` or `--fp8-cache`, the…
-
Hi, we have tried to run the speculative inference process on OPT-13B and Llama2-70B-chat, but encountered some issues. Specifically, for Llama2-70B-chat, we obtained performance worse than vLLM, which seem…