-
### System Info
- Host: VMware ESXi 7
- Host Nvidia drivers: 550.54.16
- VM CPU architecture: x86_64
- VM Nvidia drivers: 550.54.15
- VM OS: Ubuntu 22.04 LTS
- Physical GPU: A100
- TensorRT-LLM…
-
### System Info
- CPU architecture: x86_64
- Memory: 755G
- GPU: NVIDIA T4
- OS: Ubuntu 22.04
- TensorRT-LLM version: https://github.com/NVIDIA/TensorRT-LLM/archive/9691e12bce7ae1c126c435a049eb516eb119486c.zip
pip install tensorrt-llm==0.11…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### Reproduction
Hi, I am quite new to the LLaMA-Factory framework, and I am not able to find the config.yaml for LongLoRA and st…
-
### System Info
ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.10.…
-
```
CUDA_VISIBLE_DEVICES=0 python test/on_chip.py --prefill 124928 --budget 4096 \
--chunk_size 8 --top_p 0.9 --temp 0.6 --gamma 6
Loading checkpoint shards: 100%|█████████████████████████████████…
```
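The `--top_p` and `--temp` flags in the command above control nucleus (top-p) sampling with temperature scaling. A minimal sketch of how that technique is typically applied to raw logits — the function name `sample_top_p` is illustrative, and this is not TriForce's actual implementation:

```python
import numpy as np

def sample_top_p(logits, top_p=0.9, temperature=0.6, rng=None):
    """Temperature scaling followed by nucleus (top-p) filtering.

    Generic sketch of the standard technique; not TriForce's code.
    """
    rng = rng or np.random.default_rng(0)
    # Temperature: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # Numerically stable softmax.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Sort descending; keep the smallest prefix whose cumulative mass >= top_p.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1
    keep = order[:cutoff]
    # Renormalize over the surviving tokens and sample one.
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))

token = sample_top_p([2.0, 1.0, 0.1, -1.0])
```

With `top_p=0.9` and `temperature=0.6`, only the highest-probability tokens survive the cutoff, so low-probability tail tokens are never sampled.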
-
I have a question about the results reported in the paper.
![image](https://github.com/Infini-AI-Lab/TriForce/assets/50622684/d69216c5-1b99-466e-b1e6-b1134b140abc)
Does Retrieval w/o Hierarchy test with normal speculati…
-
Error while running `bash scripts/streaming/eval.sh full`:
![image](https://github.com/FMInference/H2O/assets/26181650/e18118c6-ca59-42dd-a21c-1ebd2469d0ba)
-
The latest transformers versions have more compatibility issues. Any chance to update this repo for 4.36.1+?
-
Hello!
Does TensorRT-LLM support Medusa with Mixtral 8x7B?
My understanding is that right now the Medusa [convert_checkpoint.py](https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/medusa/c…
-
### System Info
Hello, I am building a Llama 3 70B engine. If I do not specify `--max_input_len` and `--max_output_len`, requests are capped at 1024 tokens for some reason. Ideally I want the inp…
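For reference, a hypothetical build invocation that passes the length limits explicitly. The two `--max_*` flag names come from the issue above; the paths, sizes, and the `--checkpoint_dir`/`--output_dir` arguments are placeholders:

```shell
# Sketch only: directories and length values are placeholders,
# chosen to illustrate overriding the 1024-token default.
trtllm-build \
    --checkpoint_dir ./llama3-70b-ckpt \
    --output_dir ./llama3-70b-engine \
    --max_input_len 8192 \
    --max_output_len 2048
```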