NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment
Apache License 2.0

Tutorial / Example for Single Node FP8 Inference? #216

Open noamgat opened 1 week ago

Hello,

The web page for Nemotron-4-340B (https://research.nvidia.com/publication/2024-06_nemotron-4-340b) states:

"These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision."

However, I wasn't able to deploy the model this way. Is there a guide for weight conversion, or a sample configuration, for example to serve the reward model on a single 8x80GB node using FP8 precision?
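
For context, here is a rough back-of-the-envelope check of why FP8 matters for a single node. It is only a sketch that counts weight memory. It assumes exactly 340e9 parameters (taken from the model name), 1 byte per parameter in FP8 and 2 bytes in BF16, and it ignores the KV cache, activations, and framework overhead, so real headroom will be smaller.

```python
# Weights-only memory estimate; all figures are approximations.
# Assumption: 340e9 parameters, inferred from the name "Nemotron-4-340B".
NUM_PARAMS = 340e9
# 8 GPUs x 80 GB each, as in the single-node setup described above.
NODE_MEMORY_GB = 8 * 80


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp8_gb = weight_memory_gb(NUM_PARAMS, 1)   # FP8: 1 byte per parameter
bf16_gb = weight_memory_gb(NUM_PARAMS, 2)  # BF16: 2 bytes per parameter

for name, size_gb in (("FP8", fp8_gb), ("BF16", bf16_gb)):
    fits = "fits" if size_gb < NODE_MEMORY_GB else "does not fit"
    print(f"{name}: {size_gb:.0f} GB of weights, {fits} in {NODE_MEMORY_GB} GB")
```

Under these assumptions the BF16 weights alone exceed the node's 640 GB, while the FP8 weights leave room for the KV cache and activations, which is why I am specifically asking about an FP8 path.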