Closed: Alchemy5 closed this issue 1 year ago
I get the same error when trying to fine-tune RWKV/rwkv-4-7b-pile. The error seems to come from DeepSpeed's fp16 loss scaling. What is the right fp16 loss-scaling configuration for RWKV-4?
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/stage3.py", line 1923, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 62, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
TypeError: backward() takes 2 positional arguments but 3 were given
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 278) of binary: /opt/conda/bin/python3.8
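For context on the TypeError itself: the autograd engine calls a custom torch.autograd.Function's backward with one gradient per forward output, so this error usually means some Function's backward accepts fewer arguments than the engine passes in (RWKV-v4neo ships custom autograd Functions for its WKV kernel). A minimal sketch of that arity mismatch, with made-up names and no assumptions about the actual RWKV code:

import torch

class TwoOutputFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # two outputs -> autograd will pass TWO gradients to backward()
        return x * 2, x * 3

    @staticmethod
    def backward(ctx, grad_a):  # but backward only accepts ONE gradient
        return grad_a * 2

x = torch.ones(3, requires_grad=True)
a, b = TwoOutputFn.apply(x)
(a.sum() + b.sum()).backward()
# TypeError: backward() takes 2 positional arguments but 3 were given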
My DeepSpeed config and PyTorch Lightning Trainer:
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import CSVLogger
from pytorch_lightning.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    stage=3,
    offload_optimizer=True,
    offload_parameters=True,
)
trainer = Trainer(
    default_root_dir=self.conf["job_dir"],
    accelerator="gpu" if torch.cuda.is_available() else None,
    devices=devices,
    strategy=strategy,
    # note: this resolves to "bf16" here, while the DeepSpeed JSON
    # below enables fp16
    precision=32 if strategy is None else "bf16",
    max_epochs=self.conf.getint("epochs"),
    callbacks=training_callbacks(patience=self.conf.getint("patience")),
    deterministic=False,
    logger=CSVLogger(save_dir=self.conf["job_dir"]),
)
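One thing worth double-checking (my assumption, not something confirmed above): the Trainer requests precision="bf16", while the JSON below enables fp16, and the traceback goes through DeepSpeed's fp16 loss scaler. A sketch of keeping the two in agreement by handing the same JSON to the strategy (ds_config.json is a hypothetical path):

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy

# pass the same JSON shown below so Lightning and DeepSpeed
# agree on a single precision mode
strategy = DeepSpeedStrategy(config="ds_config.json")
trainer = Trainer(
    accelerator="gpu",
    devices=1,
    strategy=strategy,
    precision=16,  # matches "fp16": {"enabled": true} in the JSON
)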
The fp16 training options from my DeepSpeed JSON config; what should I change here?
"fp16": {
"enabled": true,
"auto_cast": false,
"loss_scale": 0,
"initial_scale_power": 16,
"loss_scale_window": 1000,
"hysteresis": 2,
"consecutive_hysteresis": false,
"min_loss_scale": 1
}
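For what it's worth, "loss_scale": 0 already selects dynamic loss scaling (starting at 2^initial_scale_power = 2^16), so these values are DeepSpeed's stock defaults rather than anything RWKV-specific. If the goal is to sidestep fp16 loss scaling entirely, one option (an assumption on my part, and it requires GPUs with bf16 support) is to enable bf16 on the DeepSpeed side instead; bf16 needs no loss scaler and matches the precision="bf16" already requested in the Trainer:

"bf16": {
    "enabled": true
},
"fp16": {
    "enabled": false
}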
I'm using RWKV-v4neo with deepspeed==0.7.0, pytorch-lightning==1.9.2, and torch==1.13.1+cu117.
When fine-tuning various RWKV models (e.g., "RWKV/rwkv-raven-1b5"), I keep hitting the same TypeError: backward() takes 2 positional arguments but 3 were given during the backward pass.
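One generic way to pin down which custom Function is raising (standard PyTorch debugging, not specific to this repo) is anomaly detection, which re-raises the backward error together with the forward-pass traceback of the offending Function:

import torch

# model and batch are placeholders for the actual training step;
# anomaly mode attaches the forward traceback of the Function
# whose backward() fails
with torch.autograd.set_detect_anomaly(True):
    loss = model(batch)
    loss.backward()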