-
When I try to run `patch_model_for_compiled_runtime` with 8-bit quantization and the aten backend, the program reports an error. How can I solve this problem?
[screenshot of the error message]
-
### System Info
Ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm …
-
Do you support an ExLlamaV2 backend for inference, so that EXL quants can be used?
The current alternative is vLLM, but it doesn't support EXL quants. Also, after running a perplexity test, EXL is the b…
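
For reference, the perplexity test was along these lines — a minimal sketch using stock Hugging Face APIs, where `gpt2` is only a placeholder for the quantized model under test:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder -- swap in the model under test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

text = "The quick brown fox jumps over the lazy dog. " * 8
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels=ids makes the model return mean token cross-entropy
    loss = model(ids, labels=ids).loss

# Perplexity is the exponential of the mean cross-entropy
print(f"perplexity = {torch.exp(loss).item():.2f}")
```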
-
I found that device memory usage keeps increasing while basic_quant_mix.py executes, and it raises an OOM error when the model has a large number of parameters. How can this be optimized? Thank you~
@Qcompiler
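
In case it helps discussion, the kind of mitigation I have in mind is per-layer cleanup along these lines — only a sketch, assuming basic_quant_mix.py quantizes the model layer by layer with PyTorch; `quantize_layer` is a hypothetical stand-in for the script's own per-layer routine:

```
import gc
import torch

def quantize_layerwise(model, quantize_layer):
    # Quantize one layer at a time so peak GPU usage stays near the
    # footprint of a single layer instead of the whole model.
    for module in model.modules():
        if getattr(module, "weight", None) is None:
            continue
        module.cuda()              # move only this layer to the GPU
        quantize_layer(module)     # hypothetical per-layer routine
        module.cpu()               # return the quantized layer to host RAM
        gc.collect()               # drop dead references first...
        torch.cuda.empty_cache()   # ...then free cached CUDA blocks
```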
-
`(quant 30)` is not supported by this project at all, and yet I don't think this is mentioned anywhere.
Is there any other syntax that is unsupported and unlisted?
-
**Title:** Support for ES Module Import in `@quantlib/ql`
**Description:**
I encountered an error while predicting the next price using the `@quantlib/ql` library. The error message indicates th…
-
Thank you for sharing this Quant Project.
Could you let me know which course it is from?
-
I'm confused: the method `ldlq_Rg` doesn't seem to support group quantization.
-
I think I have a mess to clean up, which I will do soon.
I believe I finished the quantity element. I tried to streamline things for Rob, so I rebased and squashed commits. I knew that he had alread…
-
I used AWQ to quantize Llama 2 70B-Chat with:
```
CUDA_VISIBLE_DEVICES="1,2,3,4,5,6,7" python quantize_llama.py
```
The code of quantize_llama.py:
```
from awq import AutoAWQForCausalLM
from tr…
```
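
For context, a complete AutoAWQ quantize-and-save flow looks roughly like the sketch below; the checkpoint path, output directory, and quant config are assumptions on my part, not the truncated script's actual values:

```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Assumed paths and config -- substitute the real ones
model_path = "meta-llama/Llama-2-70b-chat-hf"
quant_path = "llama-2-70b-chat-awq"
quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

# Load the FP16 checkpoint and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate and quantize, then persist the quantized weights
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```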