-
👋 Hello Neural Magic community developers,
I encountered an issue while computing the perplexity of a locally converted Llama3-8B sparse model using the llm-compressor library. I'm referring to the spars…
-
### What happened?
Hi, I've recently been learning the gguf-py library. I used gguf-py to write a script that produces a GGUF file, but when I tried to load the file with llama-cli, it sai…
-
Hi,
Can you help add a runtimeClass to the NIMCache and all the other CRDs?
I got this error:
```
Traceback (most recent call last):
  File "/usr/local/bin/download-to-cache", line 5, in <module>
    from vllm_nv…
```
-
I used the command `tune run generate --config custom_quantization.yaml prompt='Explain some topic'` to run inference on a fine-tuned Phi-3 model through torchtune.
Config custom_quantization.y…
-
Hi,
When running `mtq.quantize` with `"calibrator": "historgam"` in my config, I got the following assert error:
```
File "modelopt/torch/quantization/model_calib.py", line 220, in modelopt.torch.…
-
### System Info
```
pip install git+https://github.com/huggingface/transformers.git
pip install tokenizers==0.20.0
pip install accelerate==0.34.2
pip install git+https://github.com/huggingface/tr…
-
In the recent update to Modelling/Vector_Quantization.ipynb, code block [6] uses the variable `dataset_name`, which is not defined.
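A minimal sketch of the kind of fix needed: define `dataset_name` before the cell that references it. The value `"cifar10"` here is an assumed placeholder, not necessarily the dataset the notebook intends.

```python
# Hypothetical fix: define dataset_name before code block [6] runs.
# "cifar10" is an assumed placeholder value; substitute whatever
# dataset identifier the notebook actually expects.
dataset_name = "cifar10"
print(f"Using dataset: {dataset_name}")
```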
-
### Describe the bug
When I load the text_encoder like this:
```python
model_id = "black-forest-labs/FLUX.1-schnell"
text_encoder = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="t…
-
Hi TensorRT-LLM team, your work is incredible.
By following the README file for [multimodal models](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md), we were able to run…
-
### 🐛 Describe the bug
`python torchchat.py generate stories110M --quant torchchat/quant_config/cuda.json --prompt "It was a dark and stormy night, and"`
Using device=cuda Tesla T4
Loading model...…