axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

`AttributeError: 'function' object has no attribute '__func__'` #195

Closed NanoCode012 closed 1 year ago

NanoCode012 commented 1 year ago

Edit: See here for one solution: https://github.com/OpenAccess-AI-Collective/axolotl/issues/195#issuecomment-1603189889


I'm noticing a crash on the latest git commit.

Safe: 01248253a3e8aedba6d473469dc839cd368bfe3c
Crashing: f31a338cbbdcd76a5af35e400eb9e0e8cae36b72

Command: accelerate launch scripts/finetune.py examples/openllama-3b/config.yml (pointing to the proper config depending on the commit)

INFO:root:Starting trainer...
Traceback (most recent call last):
  File "/workspace/scripts/finetune.py", line 321, in <module>
    fire.Fire(train)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/scripts/finetune.py", line 308, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1643, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1750, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1182, in prepare
    result = tuple(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1183, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1022, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/accelerator.py", line 1308, in prepare_model
    model.forward = MethodType(torch.cuda.amp.autocast(dtype=torch.float16)(model.forward.__func__), model)
AttributeError: 'function' object has no attribute '__func__'
NanoCode012 commented 1 year ago

This is due to the accelerate config being set to float16 or bfloat16. If you match the accelerate config's precision with the yaml, the error will be resolved.
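For context, the failing accelerate line in the traceback above only runs when mixed precision is enabled, and it assumes model.forward is still a bound method. A minimal standalone sketch (illustrative names only, not accelerate code) of why `__func__` is missing once forward has been replaced by a plain function:

class Toy:
    def forward(self, x):
        return x

m = Toy()
m.forward.__func__          # fine: m.forward is a bound method, __func__ is the underlying function

# If forward has already been replaced with a plain function (for example by an
# earlier precision or quantization wrapper), it is no longer a bound method and
# the attribute lookup fails exactly like in the traceback above:
m.forward = lambda x: x
m.forward.__func__          # AttributeError: 'function' object has no attribute '__func__'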

anshsarkar commented 1 year ago

@NanoCode012 I am facing the same issue when using the accelerate library. Can you provide more details on how you solved it? That would help greatly!

NanoCode012 commented 1 year ago

Sure @anshsarkar .

I was testing the config here https://github.com/OpenAccess-AI-Collective/axolotl/blob/2ba4ae8f461c0c491f9ca303c134f9ad6f725e8c/examples/openllama-3b/config.yml on a machine where the accelerate config precision was set to bf16 or fp16, which caused a mismatch. I simply changed the config to use None instead and it worked.

Conversely, you can change the config.yml to match your accelerate config (recommended).

Are you using axolotl or just accelerate in general?

anshsarkar commented 1 year ago

@NanoCode012 I am using accelerate in general. Thanks for the input, I will try this and see.

NanoCode012 commented 1 year ago

Check the place in your code where you cast to a dtype and make sure it matches your accelerate config. @anshsarkar
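As a rough sketch of that advice (assuming a recent accelerate version where the Accelerator exposes its mixed_precision setting; the dtype mapping below is only illustrative, not axolotl code):

import torch
from accelerate import Accelerator

accelerator = Accelerator()           # picks up whatever `accelerate config` wrote
print(accelerator.mixed_precision)    # 'no', 'fp16' or 'bf16'

# Any explicit cast elsewhere in the script should line up with that value,
# e.g. only cast to float16 when mixed precision is actually 'fp16':
dtype = {"fp16": torch.float16, "bf16": torch.bfloat16}.get(accelerator.mixed_precision, torch.float32)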

anshsarkar commented 1 year ago
import torch
import transformers
from accelerate import Accelerator
from accelerate.utils import DistributedType
from transformers import LlamaForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, get_peft_model_state_dict
model_id = "ausboss/llama-30b-supercot"
kwargs = DistributedType("NO")
accelerator = Accelerator(device_placement=False, mixed_precision="fp16", cpu=False)
# device_map_lm, LEARNING_RATE, OUTPUT_DIR, tokenizer, train_data and val_data are defined elsewhere in my script
model = LlamaForCausalLM.from_pretrained(model_id, device_map=device_map_lm, load_in_8bit=True, torch_dtype=torch.float16)
model = prepare_model_for_int8_training(model)
model = accelerator.prepare(model)
training_arguments = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    warmup_steps=100,
    max_steps=20,
    learning_rate=LEARNING_RATE,
    fp16=True,
    logging_steps=1,
    optim="adamw_torch",
    evaluation_strategy="steps",
    save_strategy="steps",
    eval_steps=1,
    save_steps=1,
    output_dir=OUTPUT_DIR,
    save_total_limit=3,
    load_best_model_at_end=True,
    report_to="tensorboard",
    ddp_find_unused_parameters=False,
#   deepspeed = deepspeed_config
)
data_collator = transformers.DataCollatorForSeq2Seq(
    tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
)
trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=training_arguments,
    data_collator=data_collator,

)
model.config.use_cache = False
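# alpaca-lora style patch: make state_dict return only the PEFT adapter weights when saving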
old_state_dict = model.state_dict
model.state_dict = (
    lambda self, *_, **__: get_peft_model_state_dict(
        self, old_state_dict()
    )
).__get__(model, type(model))

model = torch.compile(model)

trainer.train()
model.save_pretrained(OUTPUT_DIR)

@NanoCode012 For the above, I am getting the error you encountered and haven't been able to solve it yet. Your input would be really helpful. The error occurs when I try to run trainer.train().

Also thanks in advance!

NanoCode012 commented 1 year ago
accelerator = Accelerator(device_placement=False, mixed_precision="fp16", cpu=False)

Change the above line to the proper precision that you set when you ran accelerate config.
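For example, if your accelerate config was generated without mixed precision, that would be something along the lines of the following sketch (not a guaranteed drop-in fix):

from accelerate import Accelerator

# match whatever `accelerate config` was answered with, e.g. no mixed precision:
accelerator = Accelerator(device_placement=False, mixed_precision="no", cpu=False)
# or drop the argument entirely and let the accelerate config decide:
# accelerator = Accelerator(device_placement=False, cpu=False)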

Luxios22 commented 1 year ago

Hi, I'm facing the same problem here. My code to create the Accelerator is simply: accelerator = Accelerator()

My yaml looks like this:

hyperparameters:
  dataloader_drop_last: True
  evaluation_strategy: "epoch"
  save_strategy: "epoch"
  logging_strategy: "epoch"
  num_train_epochs: 10
  auto_find_batch_size: True
  batch_size: 4
  max_steps: 1000
  eval_steps: 100
  save_steps: 1000
  logging_steps: 100
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  learning_rate: 1e-5
  lr_scheduler_type: "cosine"
  warmup_steps: 2000
  gradient_accumulation_steps: 1
  gradient_checkpointing: True
  sharded_ddp: False
  fsdp: False
  weight_decay: 0.0001
  run_name: "CodeT5-seq2seq-fine-tuned"
  ddp_find_unused_parameters: False
  fp16: True
  bf16: False
  auto_find_batch: True
  num_workers: 4
  max_prediction_length: 512
  beam_size: 5
  max_grad_norm: 5.0
  adam_epsilon : 1e-06

Even if I remove the fp16/bf16 settings or set them to False, the error still exists. What should I do to make it work?

NanoCode012 commented 1 year ago

@Luxios22, did you run accelerate config to make sure it matches this yaml?

anshsarkar commented 1 year ago

@NanoCode012 I tried matching it with the yaml, and setting it to None as well. Still getting the same error.

NanoCode012 commented 1 year ago

I'm not sure then, sorry. That is what fixed it for me. Make sure to try reinstalling the latest versions of the packages as well.

anshsarkar commented 1 year ago

Hmmmm, sure. Thanks for the help!!

StevenSong commented 1 year ago

@anshsarkar @Luxios22 I managed to get my script working by downgrading to transformers==4.29.2; it seems like there were some changes from v4.30.0 onwards that introduced this issue. I opened an issue on the HF/transformers repo if you want to track it: https://github.com/huggingface/transformers/issues/24431

Edit: Actually, that downgrade probably won't work for you if you're manually creating the accelerator object... so it seems like maybe it's a problem with accelerate instead. Regardless, hopefully the HF folks will look into it.

enn-nafnlaus commented 1 year ago

@NanoCode012 This issue is not resolved and should not be closed until, at the very least, an informative error message is given.

(And for whatever it's worth, I'm still struggling with this)

NanoCode012 commented 1 year ago

Hey @enn-nafnlaus, did you try downgrading following Steven's advice? That worked for me on newer machines. Could you list your steps to reproduce? Are you using axolotl or not, etc.?

enn-nafnlaus commented 1 year ago

Hey @enn-nafnlaus, did you try downgrading following Steven's advice? That worked for me on newer machines. Could you list your steps to reproduce? Are you using axolotl or not, etc.?

Just got to that advice (I was working my way down this thread) and it worked (you might want to make that the error message :) )... well, to the degree that it eliminated the 8th consecutive error in trying to get this to run. But now I'm off to the 9th, which is actually three separate errors (they happen regardless of whether my yaml has load_in_4bit enabled):

ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named weight.
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named weight.
TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_4bit'

Surely unrelated to the __func__ error, though.

NanoCode012 commented 1 year ago

Yes @enn-nafnlaus, this is the caveat of this method: 4bit wasn't implemented in that version. If you're not using it, you can simply comment out the lines. Otherwise, I'm not sure what else to suggest. It seems to be more of a general issue than an axolotl one, as others above hit it despite not using axolotl.

enn-nafnlaus commented 1 year ago

Yes @enn-nafnlaus, this is the caveat of this method: 4bit wasn't implemented in that version. If you're not using it, you can simply comment out the lines. Otherwise, I'm not sure what else to suggest. It seems to be more of a general issue than an axolotl one, as others above hit it despite not using axolotl.

The problem is, I don't even know "what I need" as far as configs go. Here are my goals:

So I'm trying to follow the example and the "guide" on the axolotl project page, but there are tons of parameters and config options, and everything seems to lead down a "that's broken, with an obscure error message" road. :(

So should I be commenting out some line of code? Which lines of code? There's a whole stacktrace:

Traceback (most recent call last):
  File "/home/username/axolotl/src/axolotl/utils/models.py", line 194, in load_model
    model, _ = load_llama_model_4bit_low_ram(
  File "/home/username/.local/lib/python3.10/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/username/.local/lib/python3.10/site-packages/accelerate/big_modeling.py", line 486, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/username/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 1116, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/username/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 149, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named weight.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/username/axolotl/scripts/finetune.py", line 352, in <module>
    fire.Fire(train)
  File "/home/username/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/username/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/username/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/username/axolotl/scripts/finetune.py", line 251, in train
    model, peft_config = load_model(
  File "/home/username/axolotl/src/axolotl/utils/models.py", line 290, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/username/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 467, in from_pretrained
    return model_class.from_pretrained(
  File "/home/username/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2611, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_4bit'

Traceback (most recent call last):
  File "/home/username/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/username/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/username/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 941, in launch_command
    simple_launcher(args)
  File "/home/username/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 603, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'scripts/finetune.py', 'summarize.yaml']' returned non-zero exit status 1.

enn-nafnlaus commented 1 year ago

But this is getting off topic - I'll start a new thread :) The only thing that applies to this thread is that a more helpful error message is needed.

New thread. https://github.com/OpenAccess-AI-Collective/axolotl/issues/259

enn-nafnlaus commented 1 year ago

So I've been doing more testing, and regardless of peft / gptq install status, doing the transformers==4.29.2 downgrade always leads directly into:

TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_4bit'

Indeed, while I can train the example lora yml (not the example model yml) if I do the right combination of install steps (no deepspeed, no torch compiling, no low-bit floating point config, and the peft-pull rather than the gptq requirements install), once I do the downgrade, neither loras nor models can be trained - both hit the above error.

NanoCode012 commented 1 year ago

TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_4bit'

Hello @enn-nafnlaus, sorry for the late reply. I forgot to mention which lines. It's mainly this line here, as axolotl has been written for a newer transformers version, so this hack is unfortunately needed: https://github.com/OpenAccess-AI-Collective/axolotl/blob/b9b7d4ce9292739d7bd3b6113e54786f45db7462/src/axolotl/utils/models.py#L213

Do note: commenting this out will not allow QLoRA. Make sure that when you run pip install, you use -e as stated in the docs.

Edit: Feel free to discuss this in other thread which seems to be more appropriate than this.

NanoCode012 commented 1 year ago

I came across this again when I needed to use a newer version of transformers. The thing that fixed it this time was using the latest accelerate:

transformers==4.31.0
accelerate==0.21.0
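
For reference, pinning those versions in a pip-based environment would be something like: pip install transformers==4.31.0 accelerate==0.21.0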