-
Hello,
first and foremost, I want to thank you for your incredible work!
I'd like further information on how to reproduce your code. I followed the code instructions in your README, but I am unabl…
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
# URL
- https://arxiv.org/abs/2404.09937
# Affiliations
- Yuzhen Huang, N/A
- Jinghan Zhang, N/A
- Zifei Shan, N/A
- Junxian He, N/A
# Abstract
- There is a belief that learning to compress …
-
### Feature request
Hi! I’ve been researching LLM quantization recently ([this paper](https://arxiv.org/abs/2405.14852)), and noticed a potentially important issue that arises when using LLMs with 1-…
-
### Bug Description
`MilvusVectorStore` fails to connect to a non-localhost URI when `enable_sparse` is `True`.
### Version
0.10.36
### Steps to Reproduce
For the following code:
```python
vector_store …
-
Hi there! Thanks for this amazing library. I was able to run a 70B model on my M2 MacBook Pro!
I got about one token every 100 seconds, which is almost good enough for my overnight task…
-
How feasible would it be to implement SpQR in ggml?
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
-
### Describe the bug
I used the code in the README and also in the notebook.
Check the code below.
### Steps to reproduce
```python
from langchain_community.document_loaders import TextLo…
-
AWQ is the SOTA quantization method. From what I have confirmed so far, I think it would be easy to add AWQ to AutoGPTQ, because its quantized-weight storage format is the same as GPTQ's.
https://githu…