axolotl-ai-cloud / axolotl

https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

Getting RecursionError when using noisy_embedding_alpha in example/mistral/qlora.yml #811

Closed mathiasesn closed 9 months ago

mathiasesn commented 10 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

The training should begin.

Current behaviour

accelerate launch -m axolotl.cli.train examples/mistral/qlora.yml 
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2023-11-02 16:09:38.417662: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-02 16:09:38.484621: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-02 16:09:38.487148: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-11-02 16:09:38.487158: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-11-02 16:09:38.500337: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-02 16:09:38.781474: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-11-02 16:09:38.781510: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-11-02 16:09:38.781514: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/home/mathias/.local/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
                              dP            dP   dP 
                              88            88   88 
   .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
   88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
   88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
   `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP 

[2023-11-02 16:09:39,917] [WARNING] [axolotl.validate_config:169] [PID:105218] [RANK:0] eval_batch_size != micro_batch_size. This can lead to VRAM instability.
[2023-11-02 16:09:40,100] [INFO] [axolotl.normalize_config:128] [PID:105218] [RANK:0] GPU memory usage baseline: 0.000GB (+18.426GB misc)
[2023-11-02 16:09:40,100] [WARNING] [axolotl.scripts.check_user_token:268] [PID:105218] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from https://huggingface.co/settings/tokens if you want to use gated models or datasets.
[2023-11-02 16:09:40,287] [DEBUG] [axolotl.load_tokenizer:96] [PID:105218] [RANK:0] EOS: 2 / </s>
[2023-11-02 16:09:40,287] [DEBUG] [axolotl.load_tokenizer:97] [PID:105218] [RANK:0] BOS: 1 / <s>
[2023-11-02 16:09:40,287] [DEBUG] [axolotl.load_tokenizer:98] [PID:105218] [RANK:0] PAD: 2 / </s>
[2023-11-02 16:09:40,287] [DEBUG] [axolotl.load_tokenizer:99] [PID:105218] [RANK:0] UNK: 0 / <unk>
[2023-11-02 16:09:40,287] [INFO] [axolotl.load_tokenized_prepared_datasets:133] [PID:105218] [RANK:0] Unable to find prepared dataset in last_run_prepared/79fe5144e8e385dc65045e15b51b2838
[2023-11-02 16:09:40,287] [INFO] [axolotl.load_tokenized_prepared_datasets:134] [PID:105218] [RANK:0] Loading raw datasets...
[2023-11-02 16:09:40,287] [INFO] [axolotl.load_tokenized_prepared_datasets:139] [PID:105218] [RANK:0] No seed provided, using default seed of 42
Map (num_proc=24): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 6047.53 examples/s]
[2023-11-02 16:09:45,058] [INFO] [axolotl.load_tokenized_prepared_datasets:281] [PID:105218] [RANK:0] merging datasets
[2023-11-02 16:09:45,060] [INFO] [axolotl.load_tokenized_prepared_datasets:288] [PID:105218] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/79fe5144e8e385dc65045e15b51b2838
Saving the dataset (1/1 shards): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 285404.46 examples/s]
Filter (num_proc=24): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1980/1980 [00:00<00:00, 15294.03 examples/s]
Filter (num_proc=20): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 189.24 examples/s]
Map (num_proc=24): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1980/1980 [00:00<00:00, 13560.94 examples/s]
Map (num_proc=20): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 164.15 examples/s]
[2023-11-02 16:09:46,088] [INFO] [axolotl.calculate_total_num_steps:156] [PID:105218] [RANK:0] calculating total_num_tokens
[2023-11-02 16:09:46,090] [INFO] [axolotl.calculate_total_num_steps:163] [PID:105218] [RANK:0] total_num_tokens: 426849
[2023-11-02 16:09:46,098] [INFO] [axolotl.calculate_total_num_steps:173] [PID:105218] [RANK:0] `total_supervised_tokens: 294561`
[2023-11-02 16:09:46,100] [INFO] [axolotl.utils.dataloader.generate_batches:225] [PID:105218] [RANK:0] generating packed batches
[2023-11-02 16:09:46,101] [INFO] [axolotl.utils.dataloader.generate_batches:231] [PID:105218] [RANK:0] 04eb73112c686fd33f79315115335175d7e6f9ed53cb34af6f8ff4b46d340184
[2023-11-02 16:09:48,416] [INFO] [axolotl.utils.dataloader.len_w_stats:335] [PID:105218] [RANK:0] packing_efficiency_estimate: 1.0 actual packing efficiency: 0.9649183485243056
[2023-11-02 16:09:48,416] [INFO] [axolotl.utils.dataloader._len_est:304] [PID:105218] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 426849
[2023-11-02 16:09:48,416] [INFO] [axolotl.calculate_total_num_steps:223] [PID:105218] [RANK:0] data_loader_len: 24
[2023-11-02 16:09:48,416] [INFO] [axolotl.calc_sample_packing_eff_est:229] [PID:105218] [RANK:0] sample_packing_eff_est across ranks: [0.9649183485243056]
[2023-11-02 16:09:48,416] [INFO] [axolotl.calculate_total_num_steps:240] [PID:105218] [RANK:0] sample_packing_eff_est: 0.97
[2023-11-02 16:09:48,416] [INFO] [axolotl.calculate_total_num_steps:245] [PID:105218] [RANK:0] total_num_steps: 24
[2023-11-02 16:09:48,419] [INFO] [axolotl.train.train:47] [PID:105218] [RANK:0] loading tokenizer... mistralai/Mistral-7B-v0.1
[2023-11-02 16:09:48,606] [DEBUG] [axolotl.load_tokenizer:96] [PID:105218] [RANK:0] EOS: 2 / </s>
[2023-11-02 16:09:48,606] [DEBUG] [axolotl.load_tokenizer:97] [PID:105218] [RANK:0] BOS: 1 / <s>
[2023-11-02 16:09:48,606] [DEBUG] [axolotl.load_tokenizer:98] [PID:105218] [RANK:0] PAD: 2 / </s>
[2023-11-02 16:09:48,606] [DEBUG] [axolotl.load_tokenizer:99] [PID:105218] [RANK:0] UNK: 0 / <unk>
[2023-11-02 16:09:48,606] [INFO] [axolotl.train.train:55] [PID:105218] [RANK:0] loading model and (optionally) peft_config...
[2023-11-02 16:09:48,719] [INFO] [axolotl.load_model:180] [PID:105218] [RANK:0] patching with flash attention
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.78s/it]
[2023-11-02 16:10:01,630] [INFO] [axolotl.load_model:404] [PID:105218] [RANK:0] GPU memory usage after model load: 4.349GB (+0.154GB cache, +18.718GB misc)
[2023-11-02 16:10:01,645] [INFO] [axolotl.load_model:421] [PID:105218] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2023-11-02 16:10:01,646] [INFO] [axolotl.load_model:432] [PID:105218] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2023-11-02 16:10:01,648] [INFO] [axolotl.load_lora:541] [PID:105218] [RANK:0] found linear modules: ['o_proj', 'q_proj', 'down_proj', 'gate_proj', 'v_proj', 'up_proj', 'k_proj']
trainable params: 83,886,080 || all params: 7,325,618,176 || trainable%: 1.1451058188485088
[2023-11-02 16:10:22,854] [INFO] [axolotl.load_model:468] [PID:105218] [RANK:0] GPU memory usage after adapters: 4.679GB (+0.218GB cache, +18.718GB misc)
[2023-11-02 16:10:22,858] [INFO] [axolotl.train.train:83] [PID:105218] [RANK:0] Pre-saving adapter config to ./qlora-out
[2023-11-02 16:10:22,859] [INFO] [axolotl.train.train:107] [PID:105218] [RANK:0] Starting trainer...
[2023-11-02 16:10:22,978] [INFO] [axolotl.utils.dataloader._len_est:304] [PID:105218] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 426849
[2023-11-02 16:10:22,978] [INFO] [axolotl.utils.dataloader._len_est:304] [PID:105218] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 426849
  0%|                                                                                                                                                                                                                                                                                                                                                                     | 0/6 [00:00<?, ?it/s][2023-11-02 16:10:23,003] [INFO] [axolotl.utils.dataloader._len_est:304] [PID:105218] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 426849
[2023-11-02 16:10:23,004] [INFO] [axolotl.utils.dataloader.generate_batches:225] [PID:105218] [RANK:0] generating packed batches
[2023-11-02 16:10:23,004] [INFO] [axolotl.utils.dataloader.generate_batches:231] [PID:105218] [RANK:0] 95be00c870cb4642e0ccbd683d94d7f48db802945935b4c592d35b9325f3cb70
[2023-11-02 16:10:23,005] [INFO] [axolotl.utils.dataloader.len_w_stats:335] [PID:105218] [RANK:0] packing_efficiency_estimate: 0.97 actual packing efficiency: 0.9649183485243056
[2023-11-02 16:10:23,005] [INFO] [axolotl.utils.dataloader._len_est:304] [PID:105218] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 426849
[2023-11-02 16:10:23,005] [INFO] [axolotl.utils.dataloader._worker:192] [PID:105218] [RANK:0] [WORKER] Epochs: 1, Samples: 50
[2023-11-02 16:10:23,005] [INFO] [axolotl.utils.dataloader.generate_batches:225] [PID:105218] [RANK:0] generating packed batches
[2023-11-02 16:10:23,005] [INFO] [axolotl.utils.dataloader.generate_batches:231] [PID:105218] [RANK:0] 108564450b4c7e49268b3732f8d4d1d60e9c4d2dd52d1dcb3b4d078e1632f9ba
[2023-11-02 16:10:23,006] [INFO] [axolotl.utils.dataloader._len_est:304] [PID:105218] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 426849
Traceback (most recent call last):
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  [Previous line repeated 988 more times]
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 159, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 290, in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 151, in send_to_device
    return honor_type(
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 83, in honor_type
    return type(obj)(generator)
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 152, in <genexpr>
    tensor, (send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys) for t in tensor)
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/utils/operations.py", line 154, in send_to_device
    elif isinstance(tensor, Mapping):
  File "/usr/lib/python3.10/typing.py", line 994, in __instancecheck__
    return self.__subclasscheck__(type(obj))
  File "/usr/lib/python3.10/typing.py", line 1158, in __subclasscheck__
    return issubclass(cls, self.__origin__)
  File "/usr/lib/python3.10/abc.py", line 123, in __subclasscheck__
    return _abc_subclasscheck(cls, subclass)
RecursionError: maximum recursion depth exceeded in comparison
  0%|                                                                                                                                                                                                                                                                                                                                                                     | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/mathias/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 994, in launch_command
    simple_launcher(args)
  File "/home/mathias/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 636, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'axolotl.cli.train', 'examples/mistral/qlora.yml']' returned non-zero exit status 1.
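The shape of the traceback above -- `new_forward` calling `_old_forward`, which turns out to be `new_forward` again, 988 times -- is the classic signature of a forward-patching wrapper being applied more than once, so that the saved "old" forward points back at the wrapper itself. A minimal, hypothetical reproduction of that failure mode (plain Python, not accelerate's or axolotl's actual code):

```python
class Module:
    def forward(self):
        return "ok"

def add_hook(module):
    """Wrap forward, saving the previous implementation on the module.

    Applied once, this is fine. Applied twice, _old_forward ends up
    pointing at the first wrapper, so the call chain never reaches the
    real forward and recurses until the interpreter gives up.
    """
    module._old_forward = module.forward
    def new_forward():
        return module._old_forward()
    module.forward = new_forward

m = Module()
add_hook(m)   # fine: _old_forward is the real forward
add_hook(m)   # now _old_forward is the first wrapper itself
try:
    m.forward()
except RecursionError:
    print("RecursionError: maximum recursion depth exceeded")
```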

Steps to reproduce

  1. Add `noisy_embedding_alpha: 5` to `examples/mistral/qlora.yml`.
  2. Run `accelerate launch -m axolotl.cli.train examples/mistral/qlora.yml`.
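For context, `noisy_embedding_alpha` is meant to implement NEFTune-style noisy embeddings: during training, uniform noise with magnitude `alpha / sqrt(seq_len * hidden_dim)` is added to the token embeddings. A minimal pure-Python sketch of the intended behaviour (the function name and the list-based tensor stand-in are illustrative, not axolotl's actual code):

```python
import math
import random

def add_neftune_noise(embeddings, alpha=5.0, training=True):
    """Add NEFTune-style uniform noise to token embeddings.

    embeddings: seq_len rows of hidden_dim floats, standing in for a
    [seq_len, hidden_dim] embedding tensor. The noise magnitude follows
    the NEFTune scaling rule: alpha / sqrt(seq_len * hidden_dim).
    """
    if not training:
        return embeddings
    seq_len, hidden_dim = len(embeddings), len(embeddings[0])
    mag = alpha / math.sqrt(seq_len * hidden_dim)
    return [[x + random.uniform(-mag, mag) for x in row] for row in embeddings]

emb = [[0.0] * 64 for _ in range(16)]   # 16 tokens, hidden size 64
noisy = add_neftune_noise(emb, alpha=5.0)
bound = 5.0 / math.sqrt(16 * 64)
assert all(abs(x) <= bound for row in noisy for x in row)
```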

Config yaml

base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.05
eval_table_size:
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""

noisy_embedding_alpha: 5

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

DumoeDss commented 10 months ago

same error

winglian commented 10 months ago

hmm, we might have to change the implementation to use HF's native NEFT.
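For reference on why a hook-based approach sidesteps this class of bug: as I understand it, HF's native NEFTune attaches a post-forward hook to the embedding layer (via torch's `register_forward_hook`) rather than rebinding `forward`. Hooks live in a list and run after the real computation, so registering one can never make the forward call itself. A generic stdlib sketch of the pattern (not torch's actual implementation):

```python
class Module:
    """Minimal hook-based module, loosely modelled on torch.nn.Module.

    Hooks are stored in a list and invoked after the real forward, so
    registering the same hook twice just runs it twice -- it can never
    recurse back into itself the way a double-applied wrapper can.
    """
    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        out = x * 2  # stand-in for the real computation
        for hook in self._forward_hooks:
            result = hook(self, x, out)
            if result is not None:
                out = result
        return out

def add_one_hook(module, inputs, output):
    """Illustrative hook: nudge the output, like NEFTune nudges embeddings."""
    return output + 1

m = Module()
m.register_forward_hook(add_one_hook)
m.register_forward_hook(add_one_hook)  # safe: no recursion, just runs twice
print(m.forward(3))  # prints 8
```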