-
After successfully adding our Azure-deployed `gpt4o_mini` and `text-embedding-ada-002` instances (many thanks for this plugin, by the way! [I've made a small PR to update the README](https://github.com/fabg…
-
The backend code generation keeps looping back and re-running after it finishes:
log:complete: node:op:PROJECT::STATE:UPDATE {"context":{"streams":{},"project":"todos"},"response":{"success":false},"data":{"operation"…
-
```
task_manager = TaskManager(self.agent_config.get("agent_name", self.agent_config.get("assistant_name")),
bolna-app-1 | File "/app/bolna/agent_manager/task_manager.py", line 58, in __init__
…
```
-
https://developer.nvidia.com/zh-cn/blog/nvidia-tensorrt-llm-revs-up-inference-for-google-gemma/
This post says Gemma supports quantization; does RecurrentGemma support quantization as well?
-
@Pty72
Hello! Wouldn't using SCC for LLM knowledge distillation (KD) be quite slow? When I first proposed SCC for KD, I did consider replicating it in NLP and multimodal settings, but once LLMs took off I felt I didn't have the resources to try it. A friend at Baidu working on autonomous driving told me their cross-modal KD was already too slow. I see that your code computes the soft rank twice; wouldn't that be even slower?
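To make the cost concern concrete: here is a minimal NumPy sketch (not the actual SCC implementation; `soft_rank`, `soft_spearman`, and the sigmoid/temperature formulation are illustrative assumptions) of why a differentiable soft rank is O(n²) per call, and why an SCC-style loss that soft-ranks both the student and the teacher logits pays that cost twice per batch:

```python
import numpy as np

def soft_rank(x, tau=0.1):
    # Differentiable surrogate for ranks: rank_i ~ 1 + sum_j sigmoid((x_i - x_j) / tau).
    # The pairwise difference matrix makes each call O(n^2) in the number of items,
    # which is the slowness concern raised above.
    diff = np.clip((x[:, None] - x[None, :]) / tau, -50.0, 50.0)
    s = 1.0 / (1.0 + np.exp(-diff))
    np.fill_diagonal(s, 0.0)
    return 1.0 + s.sum(axis=1)

def soft_spearman(student, teacher, tau=0.1):
    # SCC-style similarity: Pearson correlation of the two soft-rank vectors.
    # Note soft_rank runs TWICE here (once per model), doubling the O(n^2) work.
    rs, rt = soft_rank(student, tau), soft_rank(teacher, tau)
    rs, rt = rs - rs.mean(), rt - rt.mean()
    return (rs @ rt) / (np.linalg.norm(rs) * np.linalg.norm(rt) + 1e-12)

student = np.array([0.1, 0.9, 0.4, 0.6])
teacher = np.array([0.2, 0.8, 0.3, 0.7])
print(soft_spearman(student, teacher, tau=0.01))  # near 1.0: the two orderings agree
```

One way to cut the cost in a real KD loop would be to soft-rank only a sampled subset of logits per step, since the O(n²) term dominates for large vocabularies.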
-
**Is your feature request related to a problem? Please describe:**
Congrats on the launch! Very cool stuff, but one immediate limitation I noticed is that you don't have real-time info about packages. li…
-
This is so good. It would be perfect if it also worked with local LLMs such as Phind, and not only with the OpenAI API.
-
The existing template for text submitted to the LLM is too large in terms of wasted tokens. For example, this comes to ~580 tokens:
```
Context: Title: s3://gendox.organization.documents.dev/9228b56c-1058-4b…
```
-
### What happened?
I am trying to run Qwen2-57B-A14B-Instruct, and I used llama-gguf-split to merge the GGUF files from [Qwen/Qwen2-57B-A14B-Instruct-GGUF](https://huggingface.co/Qwen/Qwen2-57B-A14B-…
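For anyone reproducing this, a sketch of the merge step, assuming llama.cpp's `llama-gguf-split` tool in `--merge` mode; the shard file names below are placeholders, not the exact downloaded file names:

```shell
# Merge sharded GGUF files into a single file by pointing --merge at the first shard.
llama-gguf-split --merge \
    qwen2-57b-a14b-instruct-00001-of-00002.gguf \
    qwen2-57b-a14b-instruct-merged.gguf

# Note: recent llama.cpp builds can also load split GGUFs directly, so merging
# may not be necessary; passing the first shard to the runtime is often enough.
llama-cli -m qwen2-57b-a14b-instruct-00001-of-00002.gguf -p "Hello"
```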
-
Hi TensorRT-LLM team, your work is incredible.
By following the README file for [multimodal models](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md), we were able to successfully run…