Open vgoklani opened 11 months ago
I've tried installing flash-attn with pip install flash-attn==2.2.1 and flash-attn==2.3. The installation itself ultimately succeeds. However, when I attempt distributed training with Megatron-LM, I consistently encounter the following issue:
Additionally, when I tried building from the source code, the issue persisted.
dropout_layer_norm is a separate extension. You don't have to use it.
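For reference, the core attention functions work with just the base flash-attn install; a minimal sketch (tensor shapes are illustrative, not taken from this issue):

```python
# Minimal sketch: the core FlashAttention API runs without the
# dropout_layer_norm extension; only the base flash-attn install is assumed.
import torch
from flash_attn import flash_attn_func

# (batch, seqlen, nheads, headdim), fp16/bf16, on GPU
q = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # torch.Size([2, 1024, 16, 64])
```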
@tridao to be clear, we want to use it :) but it's not building correctly
From above:
&& cd csrc/layer_norm && pip install . && cd ../../ \
Same here. The compile fails right after the object files are generated, or maybe not all of them are generated, I don't know.
As mentioned in https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm: As of 2024-01-05, this extension is no longer used in the FlashAttention repo. We've instead switched to a Triton-based implementation.
Thanks for replying. Does that mean a model developer has to modify how their model uses flash_attn in order to use the Triton one? Or will flash_attn switch to it internally by itself?
Internally we already use the Triton implementation for layernorm.
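For anyone who wants to call it directly, something like the sketch below should work; note that the module path flash_attn.ops.triton.layer_norm and the function layer_norm_fn are assumptions that may shift between releases, so check them against your installed version:

```python
# Sketch of calling flash-attn's Triton layer norm directly.
# Assumption: flash_attn.ops.triton.layer_norm.layer_norm_fn exists in the
# installed version; no compiled dropout_layer_norm extension is needed.
import torch
from flash_attn.ops.triton.layer_norm import layer_norm_fn

hidden = 4096
x = torch.randn(2, 1024, hidden, dtype=torch.float16, device="cuda")
weight = torch.ones(hidden, dtype=torch.float16, device="cuda")
bias = torch.zeros(hidden, dtype=torch.float16, device="cuda")

out = layer_norm_fn(x, weight, bias, eps=1e-5)
print(out.shape)  # same shape as x
```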
I'm using the Qwen LLM in the ModelScope framework, with flash_attn 2.5.6 installed.
So can I just ignore the warnings above when ModelScope loads the model?
Sorry I can't control what Qwen implementation uses.
That's true. But if flash_attn uses the Triton layernorm internally, shouldn't there be no such warnings? Aren't they just calling layernorm, whether it's the Triton one or the old one? Or is the Triton one not actually a drop-in replacement?
The warning is printed from Qwen's code. I can't control that.
Have you solved this problem? I'm also running into it.
Hey there,
I'm not able to build the dropout-layer-norm extension. I used this Docker image: nvcr.io/nvidia/pytorch:23.09-py3, and then installed the flash-attention components via:
This is a subset of the traceback:
The other modules all built successfully.
These are my device specs:
I was able to build everything with flash_attn_version=2.2.1 without any issues. Thanks!
===========
One quick update: I checked this whl:
https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.0/flash_attn-2.3.0+cu122torch2.1cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
and it looks like it didn't build correctly there either:
The other modules all imported correctly:
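(Aside, for anyone comparing builds the same way: a quick way to check which optional extensions are usable is to try importing each compiled module, as in the sketch below. The module names are assumptions based on the csrc/ subproject names and may differ between flash-attn versions.)

```python
# Sketch: probe which optional flash-attn CUDA extensions actually import.
# The module names below are assumptions (taken from the csrc/ subprojects)
# and may differ between flash-attn versions.
import importlib

candidates = [
    "dropout_layer_norm",  # csrc/layer_norm
    "rotary_emb",          # csrc/rotary
    "fused_dense_lib",     # csrc/fused_dense_lib
    "xentropy_cuda_lib",   # csrc/xentropy
]

for name in candidates:
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError as exc:
        print(f"{name}: FAILED ({exc})")
```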