Jack000 / glid-3-xl-stable

stable diffusion training
MIT License

Does it work with CPU too? #7

Open andreae293 opened 1 year ago

andreae293 commented 1 year ago

Hi, has anyone ever tried to train on CPU? I know it will be super slow, but I'm trying it for the fun of it.

I currently disabled my GPU by setting this line in image_train_stable.py: `torch.cuda.is_available = lambda: False`

```
Traceback (most recent call last):
  File "scripts\image_train_stable.py", line 157, in <module>
    main()
  File "scripts\image_train_stable.py", line 85, in main
    TrainLoop(
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 194, in run_loop
    self.run_step(batch, cond)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 208, in run_step
    self.forward_backward(batch, cond)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 236, in forward_backward
    losses = compute_losses()
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 96, in training_losses
    return super().training_losses(self._wrap_model(model), *args, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\gaussian_diffusion.py", line 1137, in training_losses
    model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 133, in __call__
    return self.model(x, new_ts, **kwargs)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 880, in forward
    h = module(h, emb, context)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 217, in forward
    x = layer(x)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
```

Sorry for bothering you with a useless question, but am I doing something wrong? Thanks.

edit: never mind, I removed both `.half()` calls from image_train_stable.py and deleted `--use_fp16` from the training arguments.

This way I was able to train on CPU.

timotheecour4 commented 1 year ago

@andreae293 can you please provide more details? I tried:

```sh
export CUDA_VISIBLE_DEVICES=-1
MODEL_FLAGS="--actual_image_size 512 --lr_warmup_steps 10000 --ema_rate 0.9999 --attention_resolutions 64,32,16 --class_cond False --diffusion_steps 1000 --image_size 64 --learn_sigma False --noise_schedule linear --num_channels 320 --num_heads 8 --num_res_blocks 2 --resblock_updown False --use_fp16 False --use_scale_shift_norm False"
TRAIN_FLAGS="--lr 5e-5 --batch_size 32 --log_interval 10 --save_interval 5000 --kl_model kl.pt --resume_checkpoint diffusion.pt"
export OPENAI_LOGDIR=./logs/
python scripts/image_train_stable.py --data_dir /path/to/image-and-text-files $MODEL_FLAGS $TRAIN_FLAGS
```

where I changed the README default from `--use_fp16 True` to `--use_fp16 False` (and IIUC there is no need to remove `.half()` from image_train_stable.py with this flag), but it gives:

```
RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

i.e., it still seems to use the GPU instead of the CPU.

ghost commented 1 year ago

Stable Diffusion requires CUDA to run the AI, as it is the language for communicating with the GPU and performing the necessary calculations. Using the CPU would require a complete rewrite or virtualization, which would take more RAM and money than buying a supported CUDA GPU. Although, if anyone reading this is willing: would it be possible to use a TPU from Kaggle or Google Colab instead? I feel like it might be more efficient than a GPU or CPU, as it is designed to process tensors directly.

andreae293 commented 1 year ago

@timotheecour4 Sorry for the late response. If you don't have enough RAM, you have to dedicate a lot of GB to virtual memory (paging file / swap). I don't know the minimum required by this repo, but I dedicated 100 GB of virtual memory. Also, if you are trying to fine-tune, I suggest you look at DreamBooth for Stable Diffusion.

@TheRealUnBot Stable Diffusion does not necessarily require CUDA-capable hardware to run. Since it's based on PyTorch, it runs just fine on CPU, with the downside of being ~50x slower.