FuseAI Project
https://huggingface.co/FuseAI

Getting flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so error #7

Closed: sigridjineth closed this issue 5 months ago

sigridjineth commented 6 months ago
Traceback (most recent call last):
  File "/workspace/FuseLLM/FuseLLM/src/train.py", line 136, in <module>
    train()
  File "/workspace/FuseLLM/FuseLLM/src/train.py", line 39, in train
    tokenizer, model = load_tokenizer_and_model(args)
  File "/workspace/FuseLLM/FuseLLM/src/utils/common.py", line 47, in load_tokenizer_and_model
    model = get_base_model(args, trust_remote_code=kwargs["model_trust_remote_code"])
  File "/workspace/FuseLLM/FuseLLM/src/utils/others.py", line 88, in get_base_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 381, in _get_model_class
    supported_models = model_mapping[type(config)]
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 732, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 746, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 690, in getattribute_from_module
    if hasattr(module, attr):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1380, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1392, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE

I am consistently getting a `flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so` undefined-symbol error when running DeepSpeed.
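For context, this kind of undefined-symbol error usually means the installed flash-attn wheel was built against a different PyTorch/CUDA build than the one currently in the environment. A minimal diagnostic sketch (not from the original report) to see which versions are actually in play:

    # Quick version check: an ABI mismatch between the installed torch build and
    # the flash-attn wheel typically produces the "undefined symbol" import error.
    import torch
    import flash_attn

    print("torch:", torch.__version__)            # PyTorch version in this environment
    print("cuda:", torch.version.cuda)            # CUDA version torch was built against
    print("flash_attn:", flash_attn.__version__)  # installed flash-attn wheel version

If these disagree with what the flash-attn wheel was compiled for, reinstalling flash-attn against the current PyTorch usually clears the symbol error.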

18907305772 commented 6 months ago

Hello @sigridjineth, in the FuseLLM project, please try installing flash-attn version 0.2.8 (`flash_attn==0.2.8`). Alternatively, you can change the code to load the model with FlashAttention-2, as in the snippet below:

    # Assumes the surrounding FuseLLM training script provides `model_args` and
    # `training_args` (Hugging Face TrainingArguments-style objects).
    import math

    import torch
    import transformers

    trust_remote_code = False
    tknz_trust_remote_code = False
    use_fast = False

    # Set the RoPE scaling factor when training beyond the model's original context length
    config = transformers.AutoConfig.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=training_args.cache_dir,
        trust_remote_code=trust_remote_code,
    )
    orig_ctx_len = getattr(config, "max_position_embeddings", None)
    if orig_ctx_len and training_args.model_max_length > orig_ctx_len:
        scaling_factor = float(math.ceil(training_args.model_max_length / orig_ctx_len))
        config.rope_scaling = {"type": "linear", "factor": scaling_factor}
    config.use_cache = False  # the KV cache is not needed (and not supported) during training

    compute_dtype = (
        torch.bfloat16
        if training_args.bf16
        else (torch.float16 if training_args.fp16 else torch.float32)
    )

    # Load model and tokenizer, enabling FlashAttention-2 when requested
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        config=config,
        cache_dir=training_args.cache_dir,
        use_flash_attention_2=bool(training_args.flash_attn_transformers),
        torch_dtype=compute_dtype,
        trust_remote_code=trust_remote_code,
    )
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=training_args.cache_dir,
        model_max_length=training_args.model_max_length,
        padding_side="right",
        trust_remote_code=tknz_trust_remote_code,
        use_fast=use_fast,
    )
    tokenizer.pad_token = tokenizer.unk_token  # LLaMA tokenizers ship without a pad token
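
As a side note, on recent transformers releases (roughly 4.36 and later) the `use_flash_attention_2` argument is deprecated in favour of `attn_implementation`. If you are on such a version, the equivalent call would look like this sketch (an assumption about your environment, not part of the original FuseLLM code):

    # Sketch for newer transformers versions, where `use_flash_attention_2`
    # is deprecated in favour of `attn_implementation`.
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        config=config,
        cache_dir=training_args.cache_dir,
        attn_implementation=(
            "flash_attention_2" if training_args.flash_attn_transformers else "eager"
        ),
        torch_dtype=compute_dtype,
        trust_remote_code=trust_remote_code,
    )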
sigridjineth commented 6 months ago

I found that the error `use_cache is not supported` occurs during training, and turning off Flash Attention works around it for me. How else can I work around this? Have you encountered this issue?

18907305772 commented 6 months ago

You can follow the code above and set `config.use_cache = False` to resolve this error.
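
For anyone landing here later: that message is typically raised by flash-attention training code paths that do not support the KV cache, so the cache has to be disabled on the model config. A minimal sketch, reusing the `config` and `model` names from the snippet above:

    # Disable the KV cache, which the flash-attention training path does not
    # support; either on the config before loading the model ...
    config.use_cache = False
    # ... or directly on an already-loaded model:
    model.config.use_cache = False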