Closed kshitij12345 closed 3 months ago
```python
model = thunder.distributed.fsdp(model)

# use the transform
model = thunder.jit(model, executors=["torch"], post_optimization_transforms=[nvtx_profile_transform])
```
I think there would be a few other ways of setting up this model.
One would be:

```python
jitted = thunder.jit(model, ..., post_optimization_transforms=[nvtx_profile_transform])
fsdp = thunder.distributed.fsdp(jitted)
```
The other would be to use `thunder.core.transforms.add_transform`:

```python
jitted = thunder.jit(model, ..., post_optimization_transforms=[nvtx_profile_transform])
fsdp = thunder.distributed.fsdp(jitted)
fsdp = add_transform(fsdp, nvtx_profile_transform)
```
Oops, `add_transform` doesn't seem to support post-optimization transforms. Anyway, could you check whether these work, or whether only the first one works?
The first one works correctly (will update the example in the PR description with this):
```python
nvtx_profile_transform = NvtxProfileTransform()
model = thunder.distributed.fsdp(thunder.jit(model, post_optimization_transforms=[nvtx_profile_transform]))
```
As for the second, there is `add_post_optimization_transform` instead, which works:
```python
from thunder.core.transforms import add_post_optimization_transform

nvtx_profile_transform = NvtxProfileTransform()
model = thunder.distributed.fsdp(thunder.jit(model, executors=["torch"]))
model = add_post_optimization_transform(model, nvtx_profile_transform)
```
@t-vi agreed, have added a simple test. Thanks!
This PR adds a post-optimization transform that wraps compute symbols in NVTX ranges. This makes it easy to profile the trace with Nsight Systems and to map trace operations onto the GPU execution timeline.

In the future, we should allow the user to:
Usage
Example of the transformed trace (for brevity, generated from a different script than above):
Example in the Nsight Systems GUI
Alternative: One alternative is to use the `nvtx` package with automatic annotation, but this leads to a very dense (and large) profiling report, with more information than is needed to absorb. E.g.
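For contrast with dense automatic annotation, the per-symbol wrapping this transform performs can be sketched roughly as below. This is a GPU-free toy sketch, not thunder's actual implementation: `range_push`/`range_pop` here are hypothetical recording stand-ins for `torch.cuda.nvtx.range_push`/`torch.cuda.nvtx.range_pop`, and `wrap_in_nvtx_ranges` is an illustrative helper, not a thunder API.

```python
# Toy sketch: bracket each "compute symbol" of a trace with an
# NVTX-style range named after the symbol. range_push/range_pop
# record events into a list so the sketch runs without a GPU.
events = []

def range_push(name):
    events.append(("push", name))

def range_pop():
    events.append(("pop",))

def wrap_in_nvtx_ranges(symbols):
    """Wrap each (name, fn) pair so executing fn is bracketed by a
    range named after the symbol (hypothetical helper)."""
    wrapped = []
    for name, fn in symbols:
        def wrap(name=name, fn=fn):
            def run(*args, **kwargs):
                range_push(name)
                try:
                    return fn(*args, **kwargs)
                finally:
                    range_pop()
            return run
        wrapped.append((name, wrap()))
    return wrapped

# Two toy "compute symbols" standing in for operations in a trace.
symbols = [("add", lambda a, b: a + b), ("mul", lambda a, b: a * b)]
results = [fn(2, 3) for _, fn in wrap_in_nvtx_ranges(symbols)]
print(results)  # [5, 6]
print(events)   # [('push', 'add'), ('pop',), ('push', 'mul'), ('pop',)]
```

Because only the trace's compute symbols are wrapped, the resulting timeline stays at the granularity of trace operations rather than every Python call.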