NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

starinskycc commented 3 months ago

  File "/root/data/x-flux/train_flux_lora_deepspeed.py", line 301, in <module>
    main()
  File "/root/data/x-flux/train_flux_lora_deepspeed.py", line 149, in main
    dit, optimizer, _, lr_scheduler = accelerator.prepare(
  File "/opt/miniconda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1292, in prepare
    result = tuple(
  File "/opt/miniconda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1293, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/opt/miniconda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1169, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/opt/miniconda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1412, in prepare_model
    model = model.to(self.device)
  File "/opt/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1174, in to
    return self._apply(convert)
  File "/opt/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/opt/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 805, in _apply
    param_applied = fn(param)
  File "/opt/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1167, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Traceback (most recent call last):
  File "/opt/miniconda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/miniconda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
    args.func(args)
  File "/opt/miniconda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1082, in launch_command
    simple_launcher(args)
  File "/opt/miniconda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 688, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/miniconda/bin/python', 'train_flux_lora_deepspeed.py', '--config', 'train_configs/test_lora.yaml']' returned non-zero exit status 1.

whiterm commented 3 months ago

It looks like you are trying to use the quantized model, which is not supported. You need to use the full version, not the quantized one.
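The traceback fails inside `model.to(self.device)`, which means at least one parameter of the model handed to `accelerator.prepare` is still on the meta device and has no data to copy. A rough sketch of a pre-flight check that names those tensors is below. The helper names `find_meta_parameters` and `assert_materialized` are hypothetical (they are not part of x-flux or accelerate), and the sketch assumes the model is a regular `torch.nn.Module`.

```python
from typing import List


def find_meta_parameters(model) -> List[str]:
    """Return names of parameters and buffers that live on the meta device.

    `model` is assumed to be a torch.nn.Module. Only its named_parameters()
    and named_buffers() methods and each tensor's `is_meta` attribute are used.
    """
    names = [name for name, param in model.named_parameters() if param.is_meta]
    names += [name for name, buf in model.named_buffers() if buf.is_meta]
    return names


def assert_materialized(model) -> None:
    """Fail early, with tensor names, if any weights have no data."""
    meta_names = find_meta_parameters(model)
    if meta_names:
        preview = ", ".join(meta_names[:5])
        raise RuntimeError(
            f"{len(meta_names)} tensor(s) are still on the meta device "
            f"(for example: {preview}). Load the full, non-quantized weights "
            "before calling accelerator.prepare()."
        )
```

Calling `assert_materialized(dit)` just before `accelerator.prepare(...)` should surface the affected parameter names up front instead of the deep traceback above.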

starinskycc commented 3 months ago

Thanks.

XLabs-AI / x-flux #75