Hello,
The web page for Nemotron-4-340B states: https://research.nvidia.com/publication/2024-06_nemotron-4-340b
"These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision."

However, I wasn't able to deploy this. Is there a guide for weight conversion and a sample configuration, for example to serve the reward model on a single 8x80GB node in FP8 precision?