-
As the title says, it would be nice to have that information so we can filter out embedding models if we want to allow model switching on a frontend
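As a sketch of what a frontend could do if the model listing exposed such a field (the `capabilities` key and the helper below are assumptions, not part of the current API):

```python
# Hypothetical sketch: drop embedding-only models from a model list,
# assuming each entry carries a "capabilities" field (not in the current API).
def selectable_models(models):
    """Keep only models usable for chat, excluding embedding-only ones."""
    return [m["name"] for m in models if "embedding" not in m.get("capabilities", [])]

models = [
    {"name": "llama2", "capabilities": ["chat"]},
    {"name": "nomic-embed-text", "capabilities": ["embedding"]},
]
print(selectable_models(models))  # only "llama2" remains
```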
-
### Description
After an ongoing conversation (more than 4K tokens) with multiple models (llama2, codellama, tinyllama) via ollama, I compressed the conversation to the 'detailed' level. It worked fi…
-
According to [this Refact blog post](https://refact.ai/blog/2023/self-hosted-15b-code-model/):
> Check out the [docs on self-hosting](https://github.com/smallcloudai/refact-self-hosting) to get you…
-
There is repetitiveness: when the Orchestrator, Subagent, and Refiner are all in sync, the program should terminate and return the final output. Example: Based on the conversation, it appears that we have re…
-
Let's turn this Tracehub application into a configurable GitHub Action. First of all, it will be more stable (a hosted service can experience downtime and so on), there is no need to host it, and us…
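A minimal sketch of how such an action could be wired into a workflow (the action name, inputs, and workflow path are all assumptions; the real interface is still to be designed):

```yaml
# .github/workflows/tracehub.yml — hypothetical usage sketch
name: Tracehub
on: [push]
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical action name and inputs, shown only to illustrate the shape.
      - uses: tracehub/tracehub-action@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
```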
-
**Env:**
- Container: nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
- TensorRT-LLM release: 0.7.1
- TRT-LLM backend repo tag: v0.7.1
- Model: Llama-2-70b
- tritonserver deployed on 2 A10…
-
Hi @karthink
Thanks for the great package, first of all.
I noticed that it can be tricky to feed GPT a message that contains parts of a previously generated response.
How to reproduce:
1. O…
-
Expected status codes:
* `200` OK
* `404` not found, or auth failed
Expected JSON response:
```json
[
{
"url": "https://api.github.com/repos/octocat/Hello-World/pulls/1347",
"i…
-
### System Info
I'm trying to apply PEFT to quantized LLMs. When I use prompt tuning, LoRA, or IA3, it works. However, when I use prefix tuning on 8-bit codellama-7b-hf, it reports the following erro…
-
### Version
Command-line (Python) version
### Operating System
Windows 11
### What happened?
When using LM Studio I get the following error:
There was a problem with request to openai API:
LL…