Closed sleepwalker2017 closed 6 months ago
Hey, I believe the devs have some best practices (or suggested optimizations) listed here
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.
I'm reading the manual here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md The scripts are very simple. Do they ensure the best performance?
I can't find a full configuration for building a LLaMA engine. Is there one?
Also, running
trtllm-build --help
lists a lot of options, but I can't find the meaning or default value for many of them. How should we choose among these options?