Hello,
The web page for Nemotron-4-340B states: https://research.nvidia.com/publication/2024-06_nemotron-4-340b
"These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision."

However, I wasn't able to deploy this. Is there a guide for weight conversion and a sample configuration, for example to serve the reward model on a single 8x80GB node in FP8 precision?