XavierXiao / Dreambooth-Stable-Diffusion

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
MIT License

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select) #53

Open · TemporalLabsLLC-SOL opened this issue 2 years ago

TemporalLabsLLC-SOL commented 2 years ago

```
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loggers\test_tube.py:104: LightningDeprecationWarning: The `TestTubeLogger` is deprecated since v1.5 and will be removed in v1.7. We recommend switching to the `pytorch_lightning.loggers.TensorBoardLogger` as an alternative.
  rank_zero_deprecation(
Monitoring val/loss_simple_ema as checkpoint metric.
Merged modelckpt-cfg:
{'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'logs\\SUBJECT2022-10-04T06-25-48_DSU90\\checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 1, 'every_n_train_steps': 500}}
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py:1584: UserWarning: GPU available but not used. Set the gpus flag in your trainer `Trainer(gpus=1)` or script `--gpus=1`.
  rank_zero_warn(
```

Data

```
train, PersonalizedBase, 1500
reg, PersonalizedBase, 15000
validation, PersonalizedBase, 15
accumulate_grad_batches = 1
++++ NOT USING LR SCALING ++++
Setting learning rate to 1.00e-06
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:275: LightningDeprecationWarning: The on_keyboard_interrupt callback hook was deprecated in v1.5 and will be removed in v1.7. Please use the on_exception callback hook instead.
  rank_zero_deprecation(
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:284: LightningDeprecationWarning: Base LightningModule.on_train_batch_start hook signature has changed in v1.5. The dataloader_idx argument will be removed in v1.7.
  rank_zero_deprecation(
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py:291: LightningDeprecationWarning: Base Callback.on_train_batch_end hook signature has changed in v1.5. The dataloader_idx argument will be removed in v1.7.
  rank_zero_deprecation(
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\core\datamodule.py:469: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.
  rank_zero_deprecation(
LatentDiffusion: Also optimizing conditioner params!

Project config
model:
  base_learning_rate: 1.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    reg_weight: 1.0
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: caption
    image_size: 64
    channels: 4
    cond_stage_trainable: true
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: false
    embedding_reg_weight: 0.0
    unfreeze_model: true
    model_lr: 1.0e-06
    personalization_config:
      target: ldm.modules.embedding_manager.EmbeddingManager
      params:
        placeholder_strings:
```

```
Lightning config
modelcheckpoint:
  params:
    every_n_train_steps: 500
callbacks:
  image_logger:
    target: main.ImageLogger
    params:
      batch_frequency: 200
      max_images: 8
      increase_log_steps: false
trainer:
  benchmark: true
  max_steps: 800
  gpus: 0
```
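One thing worth pointing out in the dump above: the trainer section ends with `gpus: 0`, which matches the earlier `GPU available: True, used: False` warning, so Lightning keeps the module on the CPU while other tensors in the pipeline end up on `cuda:0` — which lines up with the device mismatch in the traceback below. A rough sketch of what the setting means, assuming PyTorch Lightning 1.x semantics for the `gpus` argument (exact parsing can vary between versions):

```python
from pytorch_lightning import Trainer

# Rough sketch, assuming PyTorch Lightning 1.x semantics for `gpus`
# (exact parsing can vary between versions):
Trainer(gpus=0)  # what the config above resolves to: no GPUs, the module stays on the CPU
Trainer(gpus=1)  # what the warning suggests: train on one GPU
```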

```
  | Name              | Type               | Params
----------------------------------------------------
0 | model             | DiffusionWrapper   | 859 M
1 | first_stage_model | AutoencoderKL      | 83.7 M
2 | cond_stage_model  | FrozenCLIPEmbedder | 123 M
----------------------------------------------------
982 M     Trainable params
83.7 M    Non-trainable params
1.1 B     Total params
4,264.941 Total estimated model params size (MB)
Validation sanity check: 0it [00:00, ?it/s]
C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\data_loading.py:132: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Validation sanity check:   0%|          | 0/2 [00:00<?, ?it/s]
Summoning checkpoint.
```

```
Traceback (most recent call last):
  File "main.py", line 838, in <module>
    trainer.fit(model, data)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1199, in _run
    self._dispatch()
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1289, in run_stage
    return self._run_train()
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 236, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 219, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\models\diffusion\ddpm.py", line 368, in validation_step
    _, loss_dict_no_ema = self.shared_step(batch)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\models\diffusion\ddpm.py", line 908, in shared_step
    loss = self(x, c)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\models\diffusion\ddpm.py", line 937, in forward
    c = self.get_learned_conditioning(c)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\models\diffusion\ddpm.py", line 595, in get_learned_conditioning
    c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 324, in encode
    return self(text, **kwargs)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 319, in forward
    z = self.transformer(input_ids=tokens, **kwargs)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 297, in transformer_forward
    return self.text_model(
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 258, in text_encoder_forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids, embedding_manager=embedding_manager)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Urban\Desktop\Dreambooth-SD-optimized-main\ldm\modules\encoders\modules.py", line 180, in embedding_forward
    inputs_embeds = self.token_embedding(input_ids)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\sparse.py", line 158, in forward
    return F.embedding(
  File "C:\Users\Urban\anaconda3\envs\ldm\lib\site-packages\torch\nn\functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)
```
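For context, the error itself is the generic PyTorch failure raised when an embedding lookup gets its weight matrix and its index tensor on different devices. A minimal, hypothetical illustration (not from this repo; it needs a CUDA device to reproduce):

```python
import torch
import torch.nn.functional as F

# Hypothetical minimal illustration of the same failure:
# the embedding weight and the token indices live on different devices.
weight = torch.randn(10, 4)                     # embedding table left on the CPU
ids = torch.tensor([1, 2, 3], device="cuda:0")  # token ids on the GPU
F.embedding(ids, weight)  # RuntimeError: Expected all tensors to be on the same device ...
```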

I'd appreciate any perspective on getting this to make the right device calls in a Windows environment where WSL is not an option.

xzdong-2019 commented 1 year ago

I have the same problem. Did you solve it?

Add `--gpus=1`; it works.

howardgriffin commented 1 year ago

same problem

XinyangHan commented 1 year ago

same problem

TemporalLabsLLC-SOL commented 1 year ago

There are a couple of known fixes depending on your specific environment. I can compile some links later, but try the search function too.

XinyangHan commented 1 year ago

@xzdong-2019, may I ask how you solved it? Where should we add `gpus=1`?

Eun0 commented 1 year ago

> @xzdong-2019, may I ask how you solved it? Where should we add `gpus=1`?

```
python main.py --gpus 0, --prompt ....
```

It works for me.
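A note for anyone copying this: the trailing comma after the 0 matters. As far as I can tell, main.py passes the `--gpus` value through to the Lightning trainer config, and a comma-separated string is read as a list of GPU indices, so `0,` selects `cuda:0`, while a bare `0` would again mean "no GPUs" and reproduce the error. A rough sketch of the difference, assuming PyTorch Lightning 1.x string parsing (the snippet is illustrative, not the library's actual parser):

```python
# Rough, illustrative sketch (not the library's actual parser):
# a comma-separated --gpus value is treated as a list of device indices.
gpus = "0,"
indices = [int(i) for i in gpus.strip(",").split(",")]
print(indices)  # [0] -> Lightning places the model on cuda:0 instead of leaving it on the CPU
```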