Open LoggerHead22 opened 1 year ago
Hi @LoggerHead22, this appears to be a logic fault in the code, thanks for noting it.
We haven't tested FA on MI100, since we did most of our testing on MI250 and MI300, so we are limiting the supported archs. I am not sure whether it will work correctly on MI100, but you can try by adding gfx908 to the valid archs. I expect the build process itself will be fine.
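For reference, the arch allow-list pattern being discussed usually looks something like the sketch below. The names `SUPPORTED_ARCHS` and `validate_archs` are illustrative stand-ins, not the actual identifiers in flash-attention's setup.py; the point is that adding "gfx908" to the list is the entire change.

```python
# Illustrative sketch of an arch allow-list check, as discussed above.
# SUPPORTED_ARCHS and validate_archs are hypothetical names, not the
# real symbols in flash-attention's setup.py.

# Adding "gfx908" (MI100) alongside the tested MI200/MI300 archs:
SUPPORTED_ARCHS = {"gfx90a", "gfx940", "gfx941", "gfx942", "gfx908"}

def validate_archs(requested):
    """Keep only the requested archs that are on the allow-list."""
    valid = [a for a in requested if a in SUPPORTED_ARCHS]
    if not valid:
        raise RuntimeError(
            f"None of the requested archs {requested} are supported; "
            f"supported archs: {sorted(SUPPORTED_ARCHS)}"
        )
    return valid

print(validate_archs(["gfx908"]))  # MI100 passes once gfx908 is listed
```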
Thanks for the clarification @howiejayz . Your advice really helped, the code is compiled for mi100 and runs.
However, I encountered an error during the build, which is caused by the logic of the patch.
hipified_header_filepath = HIPIFY_FINAL_RESULT[header_filepath].hipified_path
AttributeError: 'dict' object has no attribute 'hipified_path'
This makes sense: a dict is created here, and then the code tries to read its hipified_path attribute, which a plain dict does not have.
Replacing the dict with an object of the HipifyResult class in the patch fixed it for me.
Has this patch been merged to the main branch or do we need to apply it in order to test?
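The fix described above can be sketched as follows. The real HipifyResult class lives in PyTorch's torch.utils.hipify.hipify_python; the small dataclass stand-in here (and the file paths) are only to keep the example self-contained and are not the actual implementation.

```python
# Minimal, self-contained sketch of the fix described above: store a
# HipifyResult object in HIPIFY_FINAL_RESULT instead of a plain dict,
# so that attribute access like .hipified_path works. The HipifyResult
# stand-in mirrors the real class in torch.utils.hipify.hipify_python
# but is defined here for illustration; the paths are hypothetical.
from dataclasses import dataclass

@dataclass
class HipifyResult:
    current_state: str
    hipified_path: str

HIPIFY_FINAL_RESULT = {}

header_filepath = "csrc/flash_attn/flash_api.h"  # hypothetical path

# Buggy version: a plain dict has no .hipified_path attribute, so
# HIPIFY_FINAL_RESULT[header_filepath] = {...} leads to the
# AttributeError shown above.

# Fixed version: store a HipifyResult object instead.
HIPIFY_FINAL_RESULT[header_filepath] = HipifyResult(
    current_state="DONE",
    hipified_path="csrc/flash_attn/flash_api_hip.h",
)

hipified_header_filepath = HIPIFY_FINAL_RESULT[header_filepath].hipified_path
print(hipified_header_filepath)
```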
I need mi100 support
If you need hardware for testing mi100, I volunteer my server for this purpose. I have 8x mi100 with infinity fabric.
ehartford@gmail.com
Hi @sabreshao @howiejayz can you please give me a path forward?
I have a bunch of mi100s and I would like them to be hot. Without flash attention, I am blocked.
Maybe you could show me where in the code I would add it? give me some advice?
Hi @ehartford! Currently I have no time to test FA on MI100, but could you try building and running it based on this comment?
I was able to compile flash attention for the MI100 using the docker image. Simply adding gfx908 to the target arch array (or in my case, removing everything BUT native and gfx908) makes it run fine. (Note: this also applies to the vLLM ROCm docker image, which was my use case)
Attempts to compile outside of docker seem to fail on ROCm 6.0 due to this issue, though I was unable to downgrade back to 5.7 to test on my machine.
I managed to build for MI100 (gfx908) as well, but the env var didn't work @TNT3530. This is because the setup is protected against unknown architectures and gfx908 is not listed. I will open a PR to add it, since gfx908 definitely works.
Here's my PR, you folks might benefit from it: https://github.com/ROCmSoftwarePlatform/flash-attention/pull/38
How do I install flash attention for mi100? How is the procedure from the README.md different?
@ehartford passing the card arch to the build should be enough: export GPU_ARCHS="gfx908"
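A build script typically consumes GPU_ARCHS from the environment along these lines. The default arch list and the ";" separator below are assumptions for illustration, not the exact flash-attention setup.py behavior:

```python
import os

# Sketch of how a ROCm build script might consume GPU_ARCHS; the
# default arch list and the ";" separator are assumptions for
# illustration, not the exact flash-attention setup.py logic.
def get_target_archs(default=("gfx90a", "gfx942")):
    env = os.environ.get("GPU_ARCHS")
    if env:
        return [a.strip() for a in env.split(";") if a.strip()]
    return list(default)

os.environ["GPU_ARCHS"] = "gfx908"   # as in: export GPU_ARCHS="gfx908"
print(get_target_archs())  # ['gfx908']
```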
Also curious if support for Mi100 was finalized.
This is awesome! Can't wait to try it!
just realized Mi100 support was removed
@jayz0123 was that intentional
I can confirm that when this check is patched out again to allow MI100 builds, the latest main builds and works fine on gfx908, at least for the dimensions I tried. So this restriction seems pretty silly, and it's quite puzzling why MI100 was removed from the array again given it still works fine.
Then - someone doesn't want it to work on mi100
Could you please make a PR that enables mi100 so I can test it?
pytest test_flash_attn_ck.py

/usr/local/lib/python3.10/dist-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================= test session starts =============================
platform linux -- Python 3.10.12, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/power/shared/code/flash-attention/tests
configfile: pyproject.toml
plugins: asyncio-0.24.0, anyio-4.6.2.post1, typeguard-4.3.0
asyncio: mode=strict, default_loop_scope=None
collected 410996 items

test_flash_attn_ck.py .............................................. [ 0%]
(repeated progress dots trimmed)
..............Fatal Python error: Aborted

Thread 0x00007f15117fd640 (most recent call first):
Hi, the documentation says that this implementation is compatible only with the MI200 and MI300 GPUs. But what about the MI100 GPU?
The code contains conditions that formally match the MI100's gfx908 architecture.
Will this code be compatible with MI100 in practice? If not, are there any plans to add such support? Or what reasons keep you from adding support for the MI100?