axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

(LambdaLabs) Accelerate Finetune Command: 'cannot import name 'LRScheduler' from 'torch.optim.lr_scheduler' #237

Closed: blevlabs closed this issue 1 year ago

blevlabs commented 1 year ago

I am trying to run a finetuning script for an Alpaca-7B model and am getting the following error:

ubuntu@129-213-146-81:~/axolotl$ accelerate launch scripts/finetune.py axolotl_conf.yml

/home/ubuntu/.local/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (2.0.3) or chardet (3.0.4)/charset_normalizer (3.1.0) doesn't match a supported version!
  warnings.warn(
/home/ubuntu/.local/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (2.0.3) or chardet (3.0.4)/charset_normalizer (3.1.0) doesn't match a supported version!
  warnings.warn(
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            129-213-146-81
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4126

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           129-213-146-81
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       udcm
--------------------------------------------------------------------------
/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/computation/expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.1' currently installed).
  from pandas.core.computation.check import NUMEXPR_INSTALLED

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/lib/x86_64-linux-gnu/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
Traceback (most recent call last):
  File "scripts/finetune.py", line 24, in <module>
    from axolotl.utils.trainer import setup_trainer
  File "/home/ubuntu/axolotl/src/axolotl/utils/trainer.py", line 23, in <module>
    from axolotl.utils.schedulers import InterpolatingLogScheduler
  File "/home/ubuntu/axolotl/src/axolotl/utils/schedulers.py", line 3, in <module>
    from torch.optim.lr_scheduler import LRScheduler
ImportError: cannot import name 'LRScheduler' from 'torch.optim.lr_scheduler' (/usr/lib/python3/dist-packages/torch/optim/lr_scheduler.py)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 941, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 603, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'scripts/finetune.py', 'axolotl_conf.yml']' returned non-zero exit status 1.
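For context, the failing name only exists in more recent PyTorch releases; older versions expose the same base class privately as `_LRScheduler`. Note also that the traceback resolves torch from `/usr/lib/python3/dist-packages`, i.e. a system-wide install, which hints at an environment mismatch. A minimal compatibility import (a hedged workaround sketch, not the project's actual fix) would be:

```python
# Compatibility import: newer PyTorch exports LRScheduler publicly,
# while older releases only provide the private _LRScheduler name.
try:
    from torch.optim.lr_scheduler import LRScheduler
except ImportError:  # older torch without the public alias
    from torch.optim.lr_scheduler import _LRScheduler as LRScheduler
```

With this shim in place, code can subclass `LRScheduler` regardless of which PyTorch version is installed; upgrading torch in the active environment is the cleaner fix.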
winglian commented 1 year ago

Can you share your yml config file?

blevlabs commented 1 year ago

@winglian

base_model: Blevlabs/alpaca-7b
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: true
datasets:
  - path: data/searchQA.jsonl
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.04
adapter:
lora_model_dir:
sequence_len: 2048
lora_r:
lora_alpha:
lora_dropout:
lora_target_modules:
lora_fan_in_fan_out:
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
output_dir: ./alpaca-search
batch_size: 4
micro_batch_size: 2
num_epochs: 3
learning_rate: 0.00001
train_on_inputs: false
group_by_length: false
bf16: true
tf32: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:

I am very new to this repo, so this may not be the best configuration for training an Alpaca-7B model with axolotl. Advice on the config would be appreciated as well; I am trying to fine-tune the model on a set of 50k question-answering examples.
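One thing worth noting about the config above: with `load_in_8bit: true`, a LoRA adapter is usually enabled too, and leaving `adapter:` and the `lora_*` keys empty means no adapter is configured. A hedged sketch using the same keys already present above (the values are illustrative assumptions, not tuned recommendations):

```yaml
# Illustrative only: values are assumptions, not recommendations.
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
```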

NanoCode012 commented 1 year ago

@blevlabs, may I ask if you ran `pip install` following the README? Would it be possible to create a new environment with Python 3.9, or to try the same in the Docker image?
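One quick way to check which environment `accelerate` is actually picking up (a diagnostic sketch, not part of axolotl; the script name is hypothetical, run it as e.g. `python3 check_env.py` with the same interpreter that launches the finetune):

```python
import sys
import importlib.util

# Which interpreter is actually running? The traceback above mixes
# /home/ubuntu/.local/lib/python3.8 and /usr/lib/python3/dist-packages,
# which suggests the launch may not use the intended environment.
print("interpreter:", sys.executable)
print("python version:", sys.version.split()[0])

# Where would `torch` be imported from? (Reports "not installed" if absent.)
spec = importlib.util.find_spec("torch")
print("torch location:", spec.origin if spec else "not installed")
```

If the reported interpreter or torch location differs from the environment you installed into, the `ImportError` is likely coming from a stale system-wide torch rather than the one in your new environment.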

blevlabs commented 1 year ago

@NanoCode012 Hello, yes, I followed the LambdaLabs setup in the README exactly and made sure I was on Python 3.9, but I still encountered the issue. I can try a Docker instance to see if that helps.

NanoCode012 commented 1 year ago

@blevlabs Hello, may I ask if you managed to solve this?