Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Apache License 2.0
Make CudaGraph wrapping a transform (but call it explicitly) #635
```python
if cd.use_cudagraphs:
    computation_trc = cuda_graph_transform(computation_trc)
    computation_traces.append(computation_trc)
    # note: check the trace, not backward_fn, which is only assigned below
    if backward_trc is not None:
        backward_trc = cuda_graph_transform(backward_trc)
        backward_traces.append(backward_trc)

comp = computation_trc.python_callable()

if backward_trc is not None:
    backward_fn = backward_trc.python_callable()
else:
    backward_fn = None
```
Currently when running
I get
Ideally, we could transform the extrace (the execution trace) for CudaGraphs instead of wrapping:
This could be applied in the same way to both backward and forward (or a joint trace eventually).
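To illustrate the difference, here is a minimal, self-contained sketch (all names are hypothetical, not Thunder's actual API): wrapping hides the computation behind an opaque callable that later passes cannot inspect, whereas a transform produces a new trace with explicit capture/replay markers, which still composes with further transforms.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a Thunder trace: an ordered list of named ops.
@dataclass
class Trace:
    ops: list = field(default_factory=list)

def cuda_graph_wrap(fn):
    # Wrapping: the computation disappears behind an opaque callable,
    # so later passes can no longer inspect or rewrite its ops.
    def wrapped(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapped

def cuda_graph_transform(trace: Trace) -> Trace:
    # Transforming: return a new trace whose ops are bracketed by explicit
    # capture/replay markers; the result is still a trace, so it can be
    # appended to computation_traces and fed to further transforms.
    return Trace(ops=["cudagraph_begin", *trace.ops, "cudagraph_end"])

trc = Trace(ops=["add", "mul"])
new_trc = cuda_graph_transform(trc)
print(new_trc.ops)  # ['cudagraph_begin', 'add', 'mul', 'cudagraph_end']
```

In the wrapped version the trace history ends at the wrapper; in the transformed version the CUDA-graph boundary is itself part of the trace, which is what makes applying it uniformly to forward, backward, or a joint trace straightforward.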
My idea would be that this block in `__init__.py`:

https://github.com/Lightning-AI/lightning-thunder/blob/9f9dcafc9ba5b07652bbab91a602aec3c628c8d1/thunder/__init__.py#L623-L636

would be changed to
@nikitaved