-
### Bug Description
The `MilvusVectorStore` fails to connect to a non-localhost URI when `enable_sparse` is `True`.
### Version
0.10.36
### Steps to Reproduce
For the following code:
```python
vector_store …
```
-
Hello!
I have been really excited about your work! I attempted to use Palu for model compression on the Qwen2 series models, but regardless of the compression rate I set, I seem to encounter signif…
-
**Describe the bug**
Hey team, I'm trying to quantize Mistral 8x22B with the W8A8 recipe, and it failed with two different issues depending on the version:
1)
`…
-
Description
When running the code, we successfully obtain a compressed model. However, when prompted with an input, the model generates random and repetitive outputs, often repeating the same letters…
-
Corner case 1:
The .ipynb notebook below was created in VS Code. However, once it is imported into Codepod:
1. All source code is gone.
2. HTML is not rendered properly.
![image](https://github.com/RunVas/RunVas/ass…
-
Thank you for your excellent work and code. I have a few questions.
Regarding the arithmetic coding you use, how did you determine the precision? Do you use infinite or finite precision?…
-
### Feature request
Fu et al. propose a novel decoding technique that accelerates greedy decoding on Llama 2 and Code-Llama by 1.5-2x across various parameter sizes, without a draft model. This meth…
-
Feel free to simply close this issue if you are not interested, but we just implemented the QOI image format for VNC to deliver lossless remote desktops, using Rust/WASM on the client side, here:
https://githu…
-
Hello,
Thank you for sharing your implementation.
It has been very helpful to me! :)
I have a quick question.
I cloned your implementation and obtained the following image results. Howeve…
-
Recently, we have seen several impressive works on KV cache compression that report 1.7~2.3x speedups over FlashInfer. Could you please consider supporting such features?
Same layer KV…