Enable codegen for matmul/linear in nvfuser through Thunder

Priya2698 commented 2 months ago

This task will involve development within nvFuser and Thunder.

Current status: In Thunder, we have the compile flags nv_enable_matmul and nv_enable_linear to allow nvFuser to optionally accept matmul/linear operators. By default, these operators are executed through ATen via our ExprEvalScheduler.

Next steps: Within nvfuser, we have the ability to fuse matmul/linear operations in certain cases using NVF_ENABLE=fuse_matmul and disable ATen evaluation through NVF_DISABLE=matmul_expr_eval. For experimentation directly through Thunder, we want to allow setting these options from within Thunder.
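For reference, a minimal sketch of how the two environment variables mentioned above could be set from Python for experimentation. The helper name `nvfuser_matmul_env` is hypothetical, and the sketch assumes each variable holds a comma-separated list of option names; only `NVF_ENABLE=fuse_matmul` and `NVF_DISABLE=matmul_expr_eval` come from this issue.

```python
import os


def nvfuser_matmul_env(fuse_matmul, disable_expr_eval, base=None):
    """Return a copy of `base` (default: os.environ) with the two options applied.

    Hypothetical helper for illustration; it is not part of nvFuser or Thunder.
    """
    env = dict(os.environ if base is None else base)

    def add_option(var, option):
        # Assumption: each variable holds a comma-separated list of option names.
        current = [o for o in env.get(var, "").split(",") if o]
        if option not in current:
            current.append(option)
        env[var] = ",".join(current)

    if fuse_matmul:
        add_option("NVF_ENABLE", "fuse_matmul")
    if disable_expr_eval:
        add_option("NVF_DISABLE", "matmul_expr_eval")
    return env


if __name__ == "__main__":
    # Show the two variables that would be exported for a child process.
    env = nvfuser_matmul_env(fuse_matmul=True, disable_expr_eval=True, base={})
    print(env)
```

The returned mapping can be passed as `env=` to `subprocess.run` so that the settings apply to a fresh process. This process-wide approach is what the steps below aim to replace with per-call control from Thunder.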

Related efforts:

https://github.com/NVIDIA/Fuser/pull/1905

Similar to this approach, we can specifically add additional arguments in fd.execute to control the above two global enums and exercise different combinations of Matmul/ExprEvalScheduler.

https://github.com/NVIDIA/Fuser/pull/2077

More generally, we can expose these global enums through the python interface of fd.execute.

In both these cases, within Thunder, we can add the ability to pass additional flags to the nvfuser executor to set these options.
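As a rough sketch of the Thunder side, the translation from executor flags to nvFuser options might look like the following. Everything here is hypothetical: the function name, the two boolean parameters, and the `enable_options`/`disable_options` keys are placeholders for whatever interface the linked PRs settle on. Only the option names `fuse_matmul` and `matmul_expr_eval` are taken from this issue.

```python
def nvfuser_execute_options(fuse_matmul=False, matmul_expr_eval=True):
    """Translate two hypothetical Thunder-side booleans into option lists.

    The returned keys are placeholders, not an existing fd.execute signature.
    """
    enable, disable = [], []
    if fuse_matmul:
        # Corresponds to NVF_ENABLE=fuse_matmul.
        enable.append("fuse_matmul")
    if not matmul_expr_eval:
        # Corresponds to NVF_DISABLE=matmul_expr_eval.
        disable.append("matmul_expr_eval")
    return {"enable_options": enable, "disable_options": disable}


if __name__ == "__main__":
    # Enumerate the four combinations of the two options.
    for fuse in (False, True):
        for expr_eval in (True, False):
            print(fuse, expr_eval, nvfuser_execute_options(fuse, expr_eval))
```

Keeping the mapping in one small function would make it easy to exercise every combination of the two schedulers from a Thunder test without touching process-wide environment variables.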

Self-note: Follow up with a design doc after preliminary discussion.

CC: @jacobhinkle @kevinstephano

crcrpar commented 3 weeks ago

> Within nvfuser, we have the ability to fuse matmul/linear operations in certain cases using NVF_ENABLE=fuse_matmul and disable ATen evaluation through NVF_DISABLE=matmul_expr_eval.

Would this option have an effect on both matmul and linear? If so, I would find it reasonable to unify the two kwarg nv_enable options in Thunder into one.

Priya2698 commented 3 weeks ago

> Within nvfuser, we have the ability to fuse matmul/linear operations in certain cases using NVF_ENABLE=fuse_matmul and disable ATen evaluation through NVF_DISABLE=matmul_expr_eval.

> Would this option have an effect on both matmul and linear?

Yes, this impacts both linear and matmul since they use the same schedulers.

> If so, I would find it reasonable to unify the two kwarg nv_enable options in Thunder into one.

That makes sense: we could have a single nv_enable_matmul option, with the default value being False for matmul and True for linear. I'm open to a better naming scheme. However, @IvanYashchuk noticed higher memory usage with matmul but better peak memory for linear, so it may be worthwhile to keep the ability to toggle these independently.
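One possible way to reconcile a unified flag with independent toggling, sketched under assumptions: `nv_enable_matmul` acts as the unified switch, and an optional `nv_enable_linear` overrides it for linear only. The function name and the override semantics are hypothetical; the defaults (False for matmul, True for linear) are the ones mentioned above.

```python
def resolve_matmul_linear_flags(nv_enable_matmul=None, nv_enable_linear=None):
    """Resolve a unified flag plus an optional linear-only override.

    None means "not set by the user". This is a design sketch, not Thunder's
    actual option handling.
    """
    # matmul follows the unified flag, defaulting to False when unset.
    matmul = False if nv_enable_matmul is None else bool(nv_enable_matmul)

    if nv_enable_linear is not None:
        # An explicit linear setting always wins.
        linear = bool(nv_enable_linear)
    elif nv_enable_matmul is not None:
        # Otherwise linear follows the unified flag when it is set.
        linear = bool(nv_enable_matmul)
    else:
        # Neither flag set: keep the default of True for linear.
        linear = True
    return {"matmul": matmul, "linear": linear}


if __name__ == "__main__":
    print(resolve_matmul_linear_flags())
    print(resolve_matmul_linear_flags(nv_enable_matmul=True, nv_enable_linear=False))
```

With this scheme most users would set only the unified flag, while the memory trade-off noted above could still be explored by overriding linear separately.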

NVIDIA / Fuser

Enable codegen for matmul/linear in nvfuser through Thunder #3022