NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Enable codegen for matmul/linear in nvfuser through Thunder #3022

Status: Open. Opened by Priya2698 2 months ago

Priya2698 commented 2 months ago

This task will involve development within nvFuser and Thunder.

Current status: In Thunder, the compile flags nvf_enable_matmul and nvf_enable_linear allow nvFuser to optionally accept matmul/linear operators. By default, these operators are executed through ATen via our ExprEvalScheduler.

Next steps: Within nvFuser, we can fuse matmul/linear operations in certain cases by setting NVF_ENABLE=fuse_matmul, and disable ATen evaluation with NVF_DISABLE=matmul_expr_eval. To allow experimentation directly through Thunder, we want to make these options settable from within Thunder.
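As a minimal sketch of using the environment options named above (the variable names come from this issue; setting them via `os.environ` before nvFuser initializes is an assumption about how the process would pick them up):

```python
import os

# Set nvFuser's experimental options before nvFuser is initialized,
# since environment variables are typically read once at startup.
os.environ["NVF_ENABLE"] = "fuse_matmul"        # fuse matmul/linear where supported
os.environ["NVF_DISABLE"] = "matmul_expr_eval"  # skip the ATen fallback via ExprEvalScheduler

# ... then import and run the Thunder + nvFuser workload as usual.
```

The goal of this task is to expose the same toggles through Thunder itself rather than requiring users to manage environment variables.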

Related efforts:

  1. https://github.com/NVIDIA/Fuser/pull/1905
  2. https://github.com/NVIDIA/Fuser/pull/2077

In both cases, we can extend Thunder to pass additional flags to the nvFuser executor to set these options.

Self-note: follow up with a design doc after the preliminary discussion.

CC: @jacobhinkle @kevinstephano

crcrpar commented 3 weeks ago

> Within nvfuser, we have the ability to fuse matmul/linear operations in certain cases using NVF_ENABLE=fuse_matmul and disable ATen evaluation through NVF_DISABLE=matmul_expr_eval.

Would this option affect both matmul and linear? If so, it would seem reasonable to unify the two nv_enable kwargs in Thunder into one.

Priya2698 commented 3 weeks ago

> > Within nvfuser, we have the ability to fuse matmul/linear operations in certain cases using NVF_ENABLE=fuse_matmul and disable ATen evaluation through NVF_DISABLE=matmul_expr_eval.
>
> Would this option have an effect on both matmul and linear?

Yes, this impacts both linear and matmul since they use the same schedulers.

> If so, I would find it reasonable to unify the two kwarg nv_enable options in Thunder into one.

That makes sense. We could have a single nv_enable_matmul flag, with a default of False for matmul and True for linear; I'm open to a better naming scheme. However, @IvanYashchuk noticed higher memory usage with matmul but better peak memory for linear, so it may be worthwhile to keep these independently toggleable.