zlsh80826 commented 1 week ago

Description

This PR exposes the underlying THD format API to the JAX side and wraps it so that it is supported on multiple GPUs.

Fixes # (issue)

Type of change

[ ] Documentation change (change only to the documentation, either a fix or a new content)
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Infra/Build change
[ ] Code refactor

Unify fused_attn_qkvpacked, fused_attn_kvpacked, and fused_attn into a single function
Add an experimental fused_attn_thd API with corresponding unit tests. This API requires users to set a static max_segments_per_seq argument and to ensure that the number of segments in each sequence is less than or equal to this value
- Workspace memory consumption: ~batch_size * max_sequence_length * max_segments_per_seq * num_heads
Reduce test_fused_attn runtime
- Test only 1 FP16 shape
- Disable the forward-only tests (the backward tests already run the forward pass; the forward tests are kept in the code for development and debugging).
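
To make the max_segments_per_seq constraint and the workspace estimate above concrete, here is a minimal sketch in plain Python. The helper names (count_segments, check_max_segments, estimate_workspace_elements) are hypothetical and are not part of this PR or of Transformer Engine. The sketch also assumes that each packed sequence is described by per-token segment IDs, where 0 marks padding and each segment is a contiguous run of one nonzero ID; the actual input format of fused_attn_thd may differ.

```python
# Illustrative sketch only: these helpers are hypothetical, not Transformer Engine APIs.

def count_segments(segment_ids):
    """Count contiguous non-padding segments in one packed sequence.

    Assumption: 0 marks padding, and each segment is a run of one nonzero ID.
    """
    count = 0
    prev = 0
    for sid in segment_ids:
        # A new segment starts whenever a nonzero ID differs from the previous token's ID.
        if sid != 0 and sid != prev:
            count += 1
        prev = sid
    return count


def check_max_segments(batch_segment_ids, max_segments_per_seq):
    """Raise ValueError if any sequence holds more segments than the static limit."""
    for i, segment_ids in enumerate(batch_segment_ids):
        n = count_segments(segment_ids)
        if n > max_segments_per_seq:
            raise ValueError(
                f"sequence {i} has {n} segments, "
                f"but max_segments_per_seq={max_segments_per_seq}"
            )


def estimate_workspace_elements(batch_size, max_sequence_length,
                                max_segments_per_seq, num_heads):
    """Rough element count from the estimate in the description (not a byte count)."""
    return batch_size * max_sequence_length * max_segments_per_seq * num_heads


if __name__ == "__main__":
    batch = [
        [1, 1, 2, 2, 2, 3, 0, 0],  # three segments followed by padding
        [1, 1, 1, 1, 2, 2, 2, 2],  # two segments, no padding
    ]
    check_max_segments(batch, max_segments_per_seq=3)
    print(estimate_workspace_elements(2, 512, 4, 16))
```

Since the estimate scales linearly with max_segments_per_seq, choosing the smallest static value that still covers the data keeps the workspace small; a check like the one above can run on the host before the static argument is fixed.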

zlsh80826 commented 1 week ago

/te-ci jax

zlsh80826 commented 1 week ago

/te-ci jax