Open · hademircii opened 8 months ago
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Hi @hademircii, were you able to retest on latest main or recent 0.10 release branches? There have been many changes since March, and we believe this issue has been fixed for a while. With your confirmation, I will close the issue. Thanks!
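For anyone retesting: a quick way to confirm the installed version before and after upgrading (a minimal sketch; the --extra-index-url follows NVIDIA's published install instructions and may differ for your environment):

```bash
# Print the currently installed TensorRT-LLM version.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"

# Upgrade to the latest pre-release wheel (index URL per the install docs;
# adjust if your environment pins a different package source).
pip install --upgrade --pre tensorrt-llm --extra-index-url https://pypi.nvidia.com
```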
System Info
I am experimenting with TRT LLM and flan-t5 models. My simple goal is to build engines with different configurations and tensor parallelism, then review performance. I have a DGX system and an AWS P4de instance (A100s) that I can work on. I did a full stack upgrade on each to see if it would fix the problem, with no luck.

- TensorRT-LLM version: installed with --pre (gives you a 0.8.x)
- NVIDIA driver version (from nvidia-smi): 545.x

Who can help?
@byshiue @ncom
Information

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Follow the README for encoder-decoder models here (https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/enc_dec#download-weights-from-huggingface-transformers), focusing on flan-t5-small (or use large). Go for example #3 (https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/enc_dec#build-tensorrt-engines); a sketch of the steps follows below.
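For concreteness, the steps from the linked README look roughly like the sketch below. This is a paraphrase from memory of the 0.8-era enc_dec example; the script names (t5/convert.py, build.py) and every flag shown should be treated as assumptions and checked against the README for your release.

```bash
# Paraphrased from the 0.8-era enc_dec README; script names and flags
# are assumptions, defer to the linked README for your release.
export MODEL_NAME="flan-t5-small"

# 1. Download weights from HuggingFace.
git clone https://huggingface.co/google/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}

# 2. Convert the HF checkpoint, sharding the weights for tensor parallelism.
python t5/convert.py -i tmp/hf_models/${MODEL_NAME} \
                     -o tmp/trt_models/${MODEL_NAME} \
                     --weight_data_type float32 \
                     --inference_tensor_para_size 2

# 3. Build encoder and decoder engines from the converted weights.
python build.py --model_type t5 \
                --weight_dir tmp/trt_models/${MODEL_NAME}/tp2 \
                --output_dir tmp/trt_engines/${MODEL_NAME} \
                --engine_name ${MODEL_NAME} \
                --dtype float16
```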
Expected behavior
The build command exits successfully with engine artifacts exported in the target directory (see the sketch of the expected layout below).

actual behavior
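For reference, on a successful build the target directory holds separate encoder and decoder engines, roughly like the listing below. The exact nesting and file names are assumptions based on the enc_dec example's defaults:

```
tmp/trt_engines/flan-t5-small/
├── encoder/
│   ├── config.json
│   └── *.engine        (one engine file per TP rank)
└── decoder/
    ├── config.json
    └── *.engine
```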
I have tried on a DGX system and an AWS P4de instance, with different TP arrangements, small/large flan-t5 models, and adding/removing flags for plugins; regardless of the configuration, the engine build process errors out when building the decoder (I can see the encoder under the trt_engine directory). One way or another, all failure modes appear to be at layer DecoderModel/decoder_layers/0/cross_attention, with error log:

additional notes
Without tensor parallelism (tp=1), following the README works out fine for small/large t5's. I wonder if anyone has had success with flan-t5 models with tensor parallelism? (A tp=1 control run is sketched below for comparison.)
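For comparison, the tp=1 control run that succeeds is essentially the same pipeline with the parallelism flag dropped to 1 (same caveat as above: flags are paraphrased from the 0.8-era README and are assumptions):

```bash
# Control run: identical pipeline, no tensor parallelism.
# Flags paraphrased from the 0.8-era README; treat them as assumptions.
python t5/convert.py -i tmp/hf_models/${MODEL_NAME} \
                     -o tmp/trt_models/${MODEL_NAME} \
                     --weight_data_type float32 \
                     --inference_tensor_para_size 1

python build.py --model_type t5 \
                --weight_dir tmp/trt_models/${MODEL_NAME}/tp1 \
                --output_dir tmp/trt_engines/${MODEL_NAME} \
                --engine_name ${MODEL_NAME} \
                --dtype float16

# This completes and exports both encoder and decoder engines; the tp>1
# build instead fails at DecoderModel/decoder_layers/0/cross_attention.
```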