-
I'm benchmarking Vicuna 13B using TRT-LLM v0.9.0 on two A30 GPUs, and tried the following configurations.
![image](https://github.com/NVIDIA/TensorRT-LLM/assets/26128514/49016591-0305-4842-a902-80f667d9…
-
### 🐛 Describe the bug
TorchServe version is 0.10.0.
Here is my code:
```
def get_inference_stub(address: str, port: Union[str, int]= 7070):
channel = grpc.insecure_channel(address + ':' + str(p…
-
### Already reported ? *
- [X] I have searched the existing open and closed issues.
### Regression?
Yes
### System Info and Version
System/Version info
```sh
Hyprland 0.45.0 built from branc…
-
I want to fine-tune other models (LLaMA / Vicuna in my case). How can I go about it?
-
This issue is meant to track IREE performance on large language models with attention mechanisms, most specifically in the case of int8 quantized Vicuna/Llama. We also aim to address how performance f…
-
Hi authors, we notice that all of the attack code is missing chat templates for models. Things like `USER: {instruction} ASSISTANT:` for Vicuna or `[INST] {} [/INST]` for Llama2 which make the benchm…
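
To make the request concrete, here is a minimal sketch of wrapping an instruction in a per-model chat template before it is sent to the model. The names `CHAT_TEMPLATES` and `apply_chat_template` are hypothetical and not part of the repository's code. The template strings follow the two forms mentioned in this issue, assuming the Llama 2 closing tag is `[/INST]`; the full system prompts that these models normally use are omitted.

```python
# Hypothetical helper, not taken from the benchmark's code base.
# Templates follow the forms quoted in this issue; system prompts are omitted.
CHAT_TEMPLATES = {
    "vicuna": "USER: {instruction} ASSISTANT:",
    "llama2": "[INST] {instruction} [/INST]",
}


def apply_chat_template(model_family: str, instruction: str) -> str:
    """Wrap a raw instruction in the chat template of the given model family."""
    try:
        template = CHAT_TEMPLATES[model_family]
    except KeyError:
        raise ValueError(f"no chat template registered for {model_family!r}") from None
    # str.replace (not str.format) so braces inside the instruction are left untouched.
    return template.replace("{instruction}", instruction)


if __name__ == "__main__":
    print(apply_chat_template("vicuna", "Summarize this paragraph."))
    print(apply_chat_template("llama2", "Summarize this paragraph."))
```

Keeping the templates in one lookup table means every attack would format prompts the same way for a given model, which is the consistency this issue is asking about.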
-
While CogVLM is being trained, the LM weights are frozen.
From my observation, however, the LM weights of CogVLM differ from Vicuna's.
Vicuna: https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main
Co…
-
When reviewing the nixpkgs PR https://github.com/NixOS/nixpkgs/pull/349999 to update from 1.3.0-234-g228c4f0cb -> 1.3.0-353-g7a242f456 I noticed during playtesting that the use of a forge would reliab…
-
InstructBLIP runs successfully when the LLM is flant5xl or flant5xxl, but when I switch the LLM to vicuna-7b-v1.1, the output is an empty string (['']). Actually, when I use vicuna-7b-v0, there are some r…
-
Hello! Thanks a lot for this app.
Can you make it compatible with the Vicuna model?