SafeAILab / EAGLE

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
https://arxiv.org/pdf/2406.16858
Apache License 2.0

Llama 3.1 8B training issue. #129

Open dhananjaybhandiwad opened 2 months ago

dhananjaybhandiwad commented 2 months ago

Hello authors, I am trying to train the draft head for Llama 3.1-8B-Instruct and I get the error below, despite my best efforts to update all the relevant libraries. It still fails.

The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Traceback (most recent call last):
  File "/software/rome/r23.04/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/software/rome/r23.04/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/train/main.py", line 72, in <module>
    baseconfig = AutoConfig.from_pretrained(args.basepath)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
    config = cls(**config_dict)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
    self._rope_scaling_validation()
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
Traceback (most recent call last):
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    simple_launcher(args)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/bin/python', '-m', 'eagle.train.main', '--basepath', '/data/horse/ws/dhra414f-dhra414f/models--meta-llama--Meta-Llama-3.1-8B-Instruct/snapshots/8c22764a7e3675c50d4c7c9a4edb474456022b16', '--tmpdir', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/EAGLE/eagle/data_eagle/sharegpt_0_67999_mufp16', '--cpdir', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/checkpoints', '--configpath', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/train/EAGLE-LLaMA3.1-Instruct-8B.json']' returned non-zero exit status 1.

Do you have a solution for this? Please let me know if I am doing something wrong.

Liyuhui-12 commented 1 month ago

It might be an issue with the version of the transformers package. Support for the Llama 3.1 `rope_scaling` format (`"rope_type": "llama3"`) was added in transformers 4.43.0; older releases only accept the legacy two-field `{type, factor}` schema and raise exactly the `ValueError` shown in your traceback. Try `pip install -U "transformers>=4.43.0"`.
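To illustrate, here is a minimal sketch of the pre-4.43 check that rejects the Llama 3.1 config (illustrative only; the real validator lives in transformers' `configuration_llama.py`):

```python
# Reimplementation (for illustration) of the legacy rope_scaling check
# used by transformers < 4.43.0, which only accepts {"type", "factor"}.
LEGACY_KEYS = {"type", "factor"}

def legacy_validate(rope_scaling: dict) -> None:
    """Raise ValueError for any schema other than the legacy two-field one."""
    if set(rope_scaling) != LEGACY_KEYS:
        raise ValueError(
            "`rope_scaling` must be a dictionary with two fields, "
            f"`type` and `factor`, got {rope_scaling}"
        )

# The rope_scaling dict as shipped in Llama-3.1-8B-Instruct's config.json
# (copied from the traceback above).
llama31_rope = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}

try:
    legacy_validate(llama31_rope)
except ValueError as e:
    print("old transformers rejects it:", e)
```

Upgrading to a transformers release that knows the `llama3` rope type makes `AutoConfig.from_pretrained` accept this dict unchanged.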

870572761 commented 1 month ago

In fact, even if you solve this problem, you will find that the checkpoint does not contain the key `lm_head.weight`. So maybe this code is custom-made for certain models.
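One possible workaround sketch (a hypothetical helper, not EAGLE's actual code): when a checkpoint ties its output head to the input embeddings, `lm_head.weight` is absent from the safetensors index, and the tied `model.embed_tokens.weight` entry can be looked up instead. This assumes the standard sharded-safetensors layout with a `model.safetensors.index.json` file:

```python
import json
import os

def find_lm_head_shard(basepath: str) -> tuple[str, str]:
    """Return (weight key, shard filename) for the output projection.

    Hypothetical helper: falls back to the tied embedding weight when
    "lm_head.weight" is missing from the checkpoint's weight map.
    """
    index_path = os.path.join(basepath, "model.safetensors.index.json")
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    # Prefer a dedicated head; fall back to tied input embeddings.
    for key in ("lm_head.weight", "model.embed_tokens.weight"):
        if key in weight_map:
            return key, weight_map[key]
    raise KeyError("no output-projection weight found in index")
```

Whether the fallback weight is actually correct depends on the model setting `tie_word_embeddings: true` in its config, so it is worth checking that field before relying on this.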