@KeitaW thanks for the review! I was thinking about adding FP8 support to the FSDP example, but there are two reasons why I decided to create a separate example instead:
1. Transformer Engine requires Nvidia's container to run (the alternative is a relatively complicated build from source with CUDA headers, cuDNN, etc.), and I don't want to complicate the FSDP example with that.
2. This example is bound to the Llama model (taken from the TE examples), whereas the FSDP example supports multiple models that I don't want to rewrite with FP8.
So, in terms of importance, this example is about Llama with FP8; the FSDP training here is just scaffolding.