-
What can I do about this issue? I'm using:
MODEL_ID = "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ"
MODEL_BASENAME = "model.safetensors"
The model 'LlamaGPTQForCausalLM' is not supported for text-gener…
-
Right now the spline code has sharp breaks where it meets the tails. This isn't always wrong, but I think it happens more often than it should. Perhaps there is a way to resolve this issue via penaliz…
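One common way to tame sharp breaks at the tails is to add a curvature (roughness) penalty, so the fit is pulled toward smoothness instead of chasing every point. A minimal numpy sketch of that idea — the function name, the `lam` value, and the toy data are illustrative, not taken from the original spline code:

```python
import numpy as np

def penalized_fit(y, lam=5.0):
    """Fit smoothed values f at the sample points by minimizing
    ||f - y||^2 + lam * ||D2 f||^2, where D2 is the second-difference
    operator (assumes roughly uniform spacing). Larger lam flattens
    the curve near the tails instead of letting it break sharply."""
    n = len(y)
    # Second-difference matrix: each row is [1, -2, 1]
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Normal equations of the penalized least-squares problem:
    # (I + lam * D2^T D2) f = y
    A = np.eye(n) + lam * D2.T @ D2
    return np.linalg.solve(A, y)

# Toy noisy data standing in for the real tail samples.
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).normal(size=50)
f = penalized_fit(y, lam=5.0)
```

Tuning `lam` trades fidelity for smoothness; the same effect is available off the shelf via the smoothing parameter `s` of `scipy.interpolate.UnivariateSpline`.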
-
## Bug report
When running large workflows, the main process gets interrupted with the error `UNKNOWN: channel closed`.
### Expected behavior and actual behavior
Expected: Normal execution…
-
I am getting poor-quality results with prompts longer than 2048 tokens when using a LoRA trained with alpaca_lora_4bit.
These are the settings I am using:
```
config = ExLlamaConfig(model_config_path) …
-
I have `INT8` quantized a `BERT` model for binary text classification and am only getting a marginal improvement in speed over `FP16`.
I am using the `transformer-deploy` library that utilizes Tens…
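Before attributing the small gap to INT8 itself, it can help to measure latency with a clean harness (warmup iterations, median over many runs); with GPU inference, forgetting to synchronize the device around the timed call is a classic source of misleading INT8-vs-FP16 numbers. A generic sketch — the two lambdas are placeholder workloads, not the actual BERT engines:

```python
import time
import statistics

def benchmark(fn, warmup=5, iters=50):
    """Return the median latency of fn() in milliseconds."""
    for _ in range(warmup):          # warm caches / lazy init before timing
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()                         # for CUDA, synchronize before reading the clock
        times.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times)

# Placeholder workloads standing in for the FP16 / INT8 inference calls.
fp16_ms = benchmark(lambda: sum(i * i for i in range(20_000)))
int8_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"fp16: {fp16_ms:.3f} ms, int8: {int8_ms:.3f} ms, "
      f"speedup: {fp16_ms / int8_ms:.2f}x")
```

If the measured kernels are already dominated by memory movement or host-side overhead rather than matmul compute, INT8 will show little benefit over FP16 regardless of the quantization quality.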
-
Hi,
I'm trying to create dense representations from my corpus and search paragraphs/phrases by keywords or a question. I don't have labeled questions and answers, and for now I don't need to get a…
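For unlabeled retrieval, a common pattern is: encode the corpus passages once with a pretrained encoder (e.g. a sentence-transformers model), encode the query the same way, then rank by cosine similarity. A minimal numpy sketch of the ranking step — the 4-dimensional toy vectors stand in for real embeddings, which would come from the encoder:

```python
import numpy as np

def search(query_vec, corpus_vecs, top_k=3):
    """Rank corpus vectors by cosine similarity to the query vector,
    returning (index, score) pairs, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per passage
    idx = np.argsort(-scores)[:top_k]    # highest scores first
    return list(zip(idx.tolist(), scores[idx].tolist()))

# Toy "embeddings"; in practice each row is the encoding of a passage.
corpus = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(search(query, corpus))
```

Since no labels are needed for this, an off-the-shelf encoder is enough to start; at larger corpus sizes the brute-force dot product is usually replaced by an approximate nearest-neighbor index.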
-
I am loading the quantized TheBloke/Llama-2-70B-chat-GPTQ or TheBloke/Llama-2-70B-GPTQ model across multiple GPUs. The model loads, but the query throws an error:
```
ValueError: not en…
-
Hi, I get the error shown below when running:
```
quant_mat
```
-
My config looks like this:
```
base:
  seed: &seed 42
model:
  type: Qwen2
  path: /home/LLMCompression/model/Qwen2-7B # model path
  tokenizer_mode: slow
  torch_dtype: auto
calib:
  nam…
```
-
### System Info
Hello TensorRT-LLM team! 👋 I'm facing an issue where the inference output does not contain the expected "Singapore" text. Below are the details of my setup and steps to reproduce the …