NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Sequence Parallel Forward Transformer #3338

Closed by cowanmeg 2 days ago

cowanmeg commented 2 weeks ago

Adds sequence-parallel forward tests for the transformer layer and multi-headed attention.

  1. Cleans up sharding annotations in the forward fusion definitions: only the inputs and the points where the sharding changes are annotated explicitly.
  2. Changes the outputs of mha and mlp to structs with named TensorViews (TVs), which makes the code more readable.
  3. Temporarily sets the dropout probability to 0. A later PR will fix this by using the Philox seed and offset, with validation.
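
To illustrate items 1 and 2, here is a minimal sketch in plain Python, not the nvFuser C++ API. It assumes that "sequence parallel" means splitting activations along the sequence axis across devices, and it shows only that per-token operations (layer norm, linear, activation) give the same result on shards as on the full tensor. Attention mixes tokens and would need a gather first, which is not modeled here. `MlpResult`, its field names, and all helper functions are hypothetical names chosen for illustration. Dropout is left out, in line with the probability of 0 in item 3.

```python
from dataclasses import dataclass
from typing import List

Row = List[float]      # one token's hidden vector
Tensor = List[Row]     # shape [sequence, hidden]


@dataclass
class MlpResult:
    """Hypothetical named-output bundle; field names are illustrative only."""
    linear0: Tensor
    activation: Tensor
    output: Tensor


def shard_sequence(x: Tensor, num_devices: int) -> List[Tensor]:
    """Split the sequence axis into equal contiguous shards, one per device."""
    if len(x) % num_devices != 0:
        raise ValueError("sequence length must be divisible by num_devices")
    step = len(x) // num_devices
    return [x[i * step:(i + 1) * step] for i in range(num_devices)]


def all_gather(shards: List[Tensor]) -> Tensor:
    """Concatenate shards along the sequence axis (stand-in for an allgather)."""
    return [row for shard in shards for row in shard]


def layer_norm(x: Tensor, eps: float = 1e-5) -> Tensor:
    """Normalize each token over the hidden axis (no learned scale or bias)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / (var + eps) ** 0.5 for v in row])
    return out


def linear(x: Tensor, w: Tensor) -> Tensor:
    """Matrix product of x [seq, h_in] with w [h_in, h_out]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def relu(x: Tensor) -> Tensor:
    return [[max(0.0, v) for v in row] for row in x]


def mlp(x: Tensor, w0: Tensor, w1: Tensor) -> MlpResult:
    """Two-layer MLP that returns every intermediate under a readable name."""
    linear0 = linear(x, w0)
    activation = relu(linear0)
    return MlpResult(linear0=linear0, activation=activation,
                     output=linear(activation, w1))


def sequence_parallel_mlp(x: Tensor, w0: Tensor, w1: Tensor,
                          num_devices: int) -> Tensor:
    """Run layer norm + MLP on each sequence shard, then gather the outputs."""
    shards = shard_sequence(x, num_devices)
    outputs = [mlp(layer_norm(shard), w0, w1).output for shard in shards]
    return all_gather(outputs)


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 5.0], [0.5, -1.0], [2.0, 2.5]]
    w0 = [[1.0, -1.0, 0.5], [0.25, 2.0, -0.5]]
    w1 = [[1.0, 0.0], [0.5, -1.0], [2.0, 1.0]]
    reference = mlp(layer_norm(x), w0, w1)
    assert sequence_parallel_mlp(x, w0, w1, num_devices=2) == reference.output
```

Returning a named bundle lets a caller write `result.output` or `result.linear0` instead of indexing into a positional list of outputs, which is the readability gain item 2 describes.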

cowanmeg commented 6 days ago

!build

cowanmeg commented 2 days ago

!build

liqiangxl commented 2 days ago

check DistributedTransformerTest.MultiheadAttention_SP/__half !test

liqiangxl commented 2 days ago

!test