NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

[MoE][Common/PyTorch] Add permutation #936

Open StudyingShao opened 2 weeks ago

StudyingShao commented 2 weeks ago

Description

Permutation operators for the fp32/bf16/fp16/fp8 data types. Currently exposed as PyTorch ops only.

Additional descriptions: https://github.com/fanshiqing/moe_grouped_gemm/tree/dev
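For reviewers unfamiliar with the operation, here is a minimal PyTorch sketch of what a token permute/unpermute pair computes for MoE routing: tokens are grouped so that each expert's tokens are contiguous before the expert GEMMs, then scattered back afterwards. The function names and the top-1 routing assumption are illustrative only and do not reflect this PR's actual API or fused kernels.

```python
import torch

def permute_tokens(tokens: torch.Tensor, expert_indices: torch.Tensor):
    """Group tokens by their routed expert so each expert's rows are contiguous.

    tokens:         [num_tokens, hidden] activations
    expert_indices: [num_tokens] expert id chosen by the router (top-1 case)
    Returns the permuted tokens and the row map needed to undo the permutation.
    """
    sorted_indices = torch.sort(expert_indices, stable=True).indices
    permuted = tokens.index_select(0, sorted_indices)
    return permuted, sorted_indices

def unpermute_tokens(permuted: torch.Tensor, sorted_indices: torch.Tensor):
    """Scatter rows back to their original token order after expert computation."""
    out = torch.empty_like(permuted)
    out.index_copy_(0, sorted_indices, permuted)
    return out
```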

Type of change

Changes

Please list the changes introduced in this PR:

Checklist:

StudyingShao commented 1 week ago

Hi @phu0ngng @cyanguwa, this PR adds the permutation fusion operators needed by MoE. Please ignore the unit test file tests/pytorch/test_permutation.py for now and help review the other changes. Thanks. I will start refactoring the unit test file in parallel.

cc @QiZhangNV

phu0ngng commented 4 days ago

/te-ci pytorch

phu0ngng commented 2 days ago

Hi @StudyingShao, thanks for putting this work into TE. I have a couple of suggestions after a first glance at your code.

  1. Please sign off all of your commits (DCO failed).
  2. Please rewrite the unit tests with pytest and enable skipping when FP8 is unavailable (see https://github.com/NVIDIA/TransformerEngine/blob/7326af9d8d7f7a9d2a4d24b0193d5bb51541a80d/tests/pytorch/test_numerics.py#L495); a sketch of the skip pattern follows below.
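For item 2, a minimal sketch of the requested skip pattern, assuming the FP8GlobalStateManager.is_fp8_available() helper that the existing tests in tests/pytorch/test_numerics.py use; the test name and body here are placeholders, not the PR's actual tests.

```python
import pytest
import torch
from transformer_engine.pytorch.fp8 import FP8GlobalStateManager

# Query FP8 support once at module import, as the existing TE tests do.
fp8_available, reason_for_no_fp8 = FP8GlobalStateManager.is_fp8_available()


@pytest.mark.skipif(not fp8_available, reason=reason_for_no_fp8)
@pytest.mark.parametrize("dtype", [torch.float32, torch.bfloat16, torch.float16])
def test_permutation_fp8(dtype):
    # Placeholder body: the real test would build inputs of the given dtype,
    # run the fused permute/unpermute ops, and compare against a reference.
    ...
```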