-
When using fastertransformer_backend with decoupled mode set to True, the output differs from the output when decoupled is False, and the output length is wrong.
### Branch/Tag/Commit
main
### Docker Image Version
t…
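For reference, decoupled mode means the server streams each generated chunk back as a separate response. Below is a minimal client sketch (not the reporter's setup; the model and tensor names follow the usual fastertransformer_backend convention but are assumptions here) that tallies the streamed output so it can be compared against a non-decoupled run:
```python
# A sketch, assuming a Triton server on localhost:8001 serving a model named
# "fastertransformer"; tensor names/dtypes below are assumptions, not taken
# from the reporter's config.
import queue

import numpy as np
import tritonclient.grpc as grpcclient

chunks = queue.Queue()

def callback(result, error):
    # In decoupled mode the callback fires once per streamed response.
    chunks.put(error if error is not None else result)

client = grpcclient.InferenceServerClient("localhost:8001")

input_ids = np.array([[9915, 27221, 59, 77, 383, 1853]], dtype=np.uint32)
inputs = [
    grpcclient.InferInput("input_ids", list(input_ids.shape), "UINT32"),
    grpcclient.InferInput("input_lengths", [1, 1], "UINT32"),
    grpcclient.InferInput("request_output_len", [1, 1], "UINT32"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(np.array([[input_ids.shape[1]]], dtype=np.uint32))
inputs[2].set_data_from_numpy(np.array([[32]], dtype=np.uint32))

client.start_stream(callback=callback)
client.async_stream_infer(model_name="fastertransformer", inputs=inputs)
client.stop_stream()  # returns after all streamed responses have arrived

total = 0
while not chunks.empty():
    r = chunks.get()
    if isinstance(r, Exception):
        raise r
    total += r.as_numpy("output_ids").shape[-1]  # "output_ids" is an assumption too
print("output_ids length across stream:", total)
```
If the accumulated stream disagrees with the single non-decoupled response, that localizes the problem to the streaming path rather than the client.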
-
I am getting an error when using `TensorRT-LLM/examples/gptneox/build.py` to build the TensorRT engine:
```
line 314, in build_rank_engine
assert hf_gpt is not None, f'Could not load weights …
```
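The assertion at line 314 fires when build.py fails to load the Hugging Face checkpoint. A quick sanity check, separate from build.py, is to confirm the checkpoint loads with transformers at all (the path below is a placeholder):
```python
# A sanity-check sketch, outside build.py: verify the GPT-NeoX checkpoint
# directory is loadable by transformers before handing it to the engine build.
# The path is a placeholder.
from transformers import AutoModelForCausalLM

hf_gpt = AutoModelForCausalLM.from_pretrained("/path/to/gptneox-checkpoint")
assert hf_gpt is not None, "Could not load weights"
print(type(hf_gpt).__name__)  # expect GPTNeoXForCausalLM
```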
-
When trying to run a Pythia model using gptneox, I got this error. For context, I am using Termux on Android, with Rust installed, to run this model.
$ cargo run --release -- gptneox infer -m pythia-160m-q4_0.bin -…
-
With all the model variants out now (gpt2/gptneox/llama/gptj), I wonder if there's a way to infer a model's type by reading the file?...
Right now, if someone gives me a random model file with ob…
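For what it's worth, the leading magic bytes identify the GGML container variant, but the legacy (pre-GGUF) formats do not record the architecture at all, so gpt2 vs gptneox vs llama vs gptj can only be guessed from the hyperparameter layout that follows. A heuristic sketch of the container check:
```python
# A sketch: identify the GGML container variant from the leading magic bytes.
# The legacy formats do not store the architecture, so beyond the container
# you can only guess from the hyperparameters that follow the magic.
import struct

MAGICS = {
    0x67676D6C: "ggml (unversioned)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (mmap-able)",
}

def container_format(path: str) -> str:
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown (0x{magic:08x})")

print(container_format("pythia-160m-q4_0.bin"))
```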
-
```
root@5dac227a29e8:~# LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD /usr/local/go/bin/go run /root/go-ggml-transformers.cpp/examples/main.go -m "/models/pythia-70m-q4_0.bin" -t 14
gpt2_model_load: loadi…
```
-
### Branch/Tag/Commit
main
### Docker Image Version
nvcr.io/nvidia/pytorch:22.07-py3
### GPU name
A100
### CUDA Driver
450.156.00
### Reproduced Steps
Follow the steps: Fast…
-
### Branch/Tag/Commit
main
### Docker Image Version
none
### GPU name
T4
### CUDA Driver
525.60.13
### Reproduced Steps
```shell
## Steps
1. Download public GPT-NeoX Model https://huggingfac…
```
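Since the reproduction starts with a Hub download, one way to script that first step in Python is shown below; the repo id is a hypothetical stand-in, as the URL in the report is truncated:
```python
# A sketch of step 1 only: fetch a public GPT-NeoX checkpoint from the
# Hugging Face Hub. The repo id is a hypothetical example, not the one
# from the truncated report.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="EleutherAI/gpt-neox-20b")
print("model files in:", local_dir)
```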
-
https://github.com/togethercomputer/redpajama.cpp
https://www.together.xyz/blog/redpajama-models-v1
-
### Branch/Tag/Commit
main
### Docker Image Version
nvcr.io/nvidia/pytorch:22.07-py3
### GPU name
A100
### CUDA Driver
450.156.00
### Reproduced Steps
```shell
1. download …
```
-
https://github.com/triton-inference-server/
- [x] Build Triton Docker image with support for FasterTransformer backend for Fusion etc.
- [x] Convert h2oGPT models to a format that Triton understands h…