-
I want to use the Mixtral 8x7B model for inference, but currently it only supports autoTP. How can I add support so that it can use more forms of parallelism (e.g. EP, DP)?
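For context, a minimal sketch of the autoTP path the question refers to (illustrative only; the checkpoint name and `tp_size` are placeholders, and exact kwargs may differ between DeepSpeed versions):

```python
# Sketch of DeepSpeed autoTP inference: linear layers are sharded across
# GPUs automatically when no kernel-injection policy is used.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model_name = "mistralai/Mixtral-8x7B-v0.1"  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# autoTP: tp_size would normally equal the number of GPUs in the job.
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
    replace_with_kernel_inject=False,  # no injection policy -> autoTP
)
```

By contrast, EP would shard the per-layer expert weights of the MoE blocks across ranks, and DP would replicate the whole model; neither is wired up by autoTP, which is what this request asks for.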
-
Update the Mixtral `demo_with_prefill.py` demo script with prompts of up to 16k tokens.
We support KV cache sizes up to 32K. If we make the prompt 32k tokens and prefill that, we cannot generate any more…
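The arithmetic behind this limit is simply `prompt_len + max_new_tokens <= kv_cache_size`. A small illustrative check (the helper name and constant are mine, not from the demo script):

```python
# With a 32K-token KV cache, a fully prefilled 32k-token prompt
# leaves zero budget for generated tokens.
KV_CACHE_SIZE = 32 * 1024

def max_new_tokens(prompt_len: int, kv_cache_size: int = KV_CACHE_SIZE) -> int:
    """Tokens that can still be generated after prefilling the prompt."""
    return max(kv_cache_size - prompt_len, 0)

assert max_new_tokens(16 * 1024) == 16 * 1024  # 16k prompt leaves 16k headroom
assert max_new_tokens(32 * 1024) == 0          # full prefill, nothing left
```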
-
### System Info
Ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.10.…
-
https://github.com/nomic-ai/llama.cpp
GPT4All runs Mistral and Mixtral q4 models over 10x faster on my 6600M GPU
-
**Is your feature request related to a problem? Please describe.**
The doc refers to Ollama with the mixtral model.
**Describe the solution you'd like**
Update the doc.
**Describe alternativ…
-
### Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue]…
-
### ⚠️ Check for existing issues before proceeding. ⚠️
- [X] I have searched the existing issues, and there is no existing issue for my problem
### Where are you using SuperAGI?
Linux
### …
-
There's a new cache technique mentioned in the paper https://arxiv.org/abs/2312.17238. (github: https://github.com/dvmazur/mixtral-offloading)
They introduced an LRU cache to cache experts based on patt…
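A minimal sketch of the idea as I understand it (my own illustration, not code from that repo): keep a fixed number of experts resident on the GPU and evict the least recently used one whenever an uncached expert is routed to.

```python
from collections import OrderedDict

class ExpertLRUCache:
    """Illustrative LRU cache for MoE expert weights.

    Holds at most `capacity` experts "on device"; loading an uncached
    expert evicts the least recently used one.
    """

    def __init__(self, capacity: int, load_expert):
        self.capacity = capacity
        self.load_expert = load_expert  # e.g. copies weights host -> GPU
        self._cache = OrderedDict()

    def get(self, expert_id):
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)  # mark as most recently used
        else:
            if len(self._cache) >= self.capacity:
                self._cache.popitem(last=False)  # evict least recently used
            self._cache[expert_id] = self.load_expert(expert_id)
        return self._cache[expert_id]
```

The policy pays off because consecutive tokens tend to route to overlapping sets of experts, so keeping recently used experts resident cuts host-to-GPU offloading traffic.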
-
I am not getting an error, but running Q&A generation took 20 minutes and produced empty datasets.
Please let me know what the cause could be.
(granite1) sankar@Sankars-MacBook-Pro test1 % ilab dat…
-