:rocket: What
For certain build configurations (e.g. FP8 quantization), the resources needed at build time can differ from those needed at runtime, so the two need to be configurable independently.
:computer: How
- Add a new config, `trt_llm.build.num_builder_gpus`
- Add an associated warning when `num_builder_gpus` is unset and an FP8 quantization type is used
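For reviewers, the intended behavior can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the `BuildConfig` class, the `quantization_type` field, and the `FP8_QUANTIZATION_TYPES` values are assumed names used only for the example; only `num_builder_gpus` comes from the description above.

```python
import warnings
from dataclasses import dataclass
from typing import Optional

# Assumed labels for FP8 quantization types; illustrative only.
FP8_QUANTIZATION_TYPES = {"fp8", "fp8_kv"}


@dataclass
class BuildConfig:
    """Hypothetical stand-in for the trt_llm.build config section."""

    quantization_type: str = "no_quant"
    # GPUs to use while building the engine; None means "not set".
    num_builder_gpus: Optional[int] = None

    def __post_init__(self) -> None:
        # Warn (rather than fail) when an FP8 build leaves the builder GPU count unset.
        if (
            self.quantization_type in FP8_QUANTIZATION_TYPES
            and self.num_builder_gpus is None
        ):
            warnings.warn(
                "num_builder_gpus is not set for an FP8 quantization type; "
                "the build may need different GPU resources than the runtime.",
                UserWarning,
                stacklevel=2,
            )
```

With this sketch, `BuildConfig(quantization_type="fp8")` emits the warning, while `BuildConfig(quantization_type="fp8", num_builder_gpus=4)` does not.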
:microscope: Testing