-
model=TheBloke_Llama-2-13B-GPTQ/model.safetensors. I also tried Wizard-Vicuna-7B-Uncensored-GPTQ-4bit-128g.no-act-order.safetensors, with the same problem.
```
Loading model ...
----------------------…
-
We have found that using the OpenAI API with different open-source inference tools via their web server services fails with InstructLab 0.21.0.
We have attempted this with LM Studio, Ollama, an…
-
Can we get the ability to pipeline a file full of prompts through the LLM, one generation per prompt?
1. File would have one basic prompt on each line.
2. One prompt is taken from the file and sent …
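The requested pipeline could be sketched roughly like this. This is a minimal illustration, not a proposed implementation: the function names are hypothetical, and it assumes an OpenAI-compatible chat endpoint (such as the one text-generation-webui can expose) at an assumed local URL.

```python
import json
import urllib.request

def read_prompts(path):
    """Yield one basic prompt per non-empty line of the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            prompt = line.strip()
            if prompt:
                yield prompt

def send_prompt(prompt, url="http://127.0.0.1:5000/v1/chat/completions"):
    """Send a single prompt to an assumed OpenAI-compatible server
    and return the generated reply text."""
    payload = json.dumps(
        {"messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

def run_pipeline(path, send=send_prompt):
    """Take prompts from the file one at a time and collect one
    generation per prompt."""
    return [send(p) for p in read_prompts(path)]
```

The `send` parameter is injectable so each prompt can be routed to whatever backend the tool actually uses.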
-
NVIDIA Jetson Orin AGX, ReSpeaker 4.0 USB Mic Array
[LlamaSpeak Tutorial](https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/llamaspeak)
Installed Riva and Python Client no pro…
-
Can this be used as a "Pipe" in the Open WebUI frontend?
-
### Describe the bug
When clicking buttons in the Gradio web interface, there is a noticeable delay before the button press is actually received by the Python server. I am not sure whether this dela…
-
Running in Baidu's cloud environment (BML CoderLab) raises an error:
(Comments were added locally to _demo.py, so the displayed line numbers are off; the call path is **get_knowledge_based_answer** -> **knowledge_chain**)
Traceback (most recent call last):
File "/home/aistudio/LangChain-ChatG…
-
I'm currently writing a web UI for Ollama, but I find the API quite limited and cumbersome.
What is your vision/plan regarding it? Is it in a frozen state, or are you planning to improve it?
Here's som…
-
`llama.onnx` is primarily intended for understanding LLMs and converting them to run on NPUs.
If you are looking for inference on NVIDIA GPUs, we have released lmdeploy at https://github.com/InternLM/lmdeploy.
…
-
Using: text-generation-webui
Model: ehartford_dolphin-2.2-mistral-7b - not quantized
Installed: using pip install -r requirements
OS: Windows 10
Shell: cmd.exe
Python: Python 3.11.3
pymemgpt in…