CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

txt2img.py always gets killed #21

Open · HugoVG opened this issue 2 years ago

HugoVG commented 2 years ago
(ldm) hugo@DESKTOP:/mnt/d/stable-diffusion$ python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --ckpt models/ldm/text2img256/model.ckpt
Global seed set to 42
Loading model from models/ldm/text2img256/model.ckpt
Global Step: 947666
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Killed

The "Killed" comes from txt2img.py, line 34. Full trace (with some debug prints added):

Global seed set to 42
<<CONFIG{Not changed}>>
{'model': {'base_learning_rate': 0.0001, 'target': 'ldm.models.diffusion.ddpm.LatentDiffusion', 'params': {'linear_start': 0.00085, 'linear_end': 0.012, 'num_timesteps_cond': 1, 'log_every_t': 200, 'timesteps': 1000, 'first_stage_key': 'jpg', 'cond_stage_key': 'txt', 'image_size': 64, 'channels': 4, 'cond_stage_trainable': False, 'conditioning_key': 'crossattn', 'monitor': 'val/loss_simple_ema', 'scale_factor': 0.18215, 'use_ema': False, 'scheduler_config': {'target': 'ldm.lr_scheduler.LambdaLinearScheduler', 'params': {'warm_up_steps': [10000], 'cycle_lengths': [10000000000000], 'f_start': [1e-06], 'f_max': [1.0], 'f_min': [1.0]}}, 'unet_config': {'target': 'ldm.modules.diffusionmodules.openaimodel.UNetModel', 'params': {'image_size': 32, 'in_channels': 4, 'out_channels': 4, 'model_channels': 320, 'attention_resolutions': [4, 2, 1], 'num_res_blocks': 2, 'channel_mult': [1, 2, 4, 4], 'num_heads': 8, 'use_spatial_transformer': True, 'transformer_depth': 1, 'context_dim': 768, 'use_checkpoint': True, 'legacy': False}}, 'first_stage_config': {'target': 'ldm.models.autoencoder.AutoencoderKL', 'params': {'embed_dim': 4, 'monitor': 'val/rec_loss', 'ddconfig': {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0}, 'lossconfig': {'target': 'torch.nn.Identity'}}}, 'cond_stage_config': {'target': 'ldm.modules.encoders.modules.FrozenCLIPEmbedder'}}}}
Loading model from models/ldm/text2img256/model.ckpt
Global Step: 947666
<class 'ldm.models.diffusion.ddpm.LatentDiffusion'>
LatentDiffusion: Running in eps-prediction mode
Hit: get_obj_from_str
<class 'ldm.modules.diffusionmodules.openaimodel.UNetModel'>
DiffusionWrapper has 859.52 M params.
Hit: get_obj_from_str
<class 'ldm.models.autoencoder.AutoencoderKL'>
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Hit: get_obj_from_str
<class 'torch.nn.modules.linear.Identity'>
Hit: get_obj_from_str
<class 'ldm.modules.encoders.modules.FrozenCLIPEmbedder'>
Killed

Does anyone know why it keeps getting killed? I downloaded all the models, set up the Conda env, etc.

lstein commented 2 years ago

Insufficient virtual memory, or possibly the memory ulimit is set too low.
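(A bare "Killed" with no traceback usually means the Linux OOM killer stepped in; running dmesg | grep -i 'killed process' right afterwards will typically confirm it. A minimal sketch for inspecting the ulimits mentioned above, from Python on Linux/WSL:)

import resource

# Minimal sketch (Linux/WSL only): print the soft/hard memory limits that
# can get a process killed when set too low.
def show_limit(name, limit):
    soft, hard = resource.getrlimit(limit)
    human = lambda v: "unlimited" if v == resource.RLIM_INFINITY else f"{v / 2**30:.1f} GiB"
    print(f"{name}: soft={human(soft)}, hard={human(hard)}")

show_limit("RLIMIT_AS   (virtual memory)", resource.RLIMIT_AS)
show_limit("RLIMIT_DATA (data segment)", resource.RLIMIT_DATA)
show_limit("RLIMIT_RSS  (resident size)", resource.RLIMIT_RSS)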

illtellyoulater commented 2 years ago

It's failing silently for me too:

(ldm) PS C:\Users\my-name\Downloads\generative-nn-models\stable-diffusion> python.exe .\scripts\dream.py
* Initializing, be patient...
* Initialization done! Awaiting your command (-h for help, q to quit)...
dream> a photograph of an astronaut riding a horse
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 440000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
(ldm) PS C:\Users\my-name\Downloads\generative-nn-models\stable-diffusion> 

Running it this way instead makes it talk a bit more:

(ldm) PS C:\Users\my-name\Downloads\generative-nn-models\stable-diffusion> python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 440000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "scripts/txt2img.py", line 279, in <module>
    main()
  File "scripts/txt2img.py", line 188, in main
    model = load_model_from_config(config, f"{opt.ckpt}")
  File "scripts/txt2img.py", line 31, in load_model_from_config
    model = instantiate_from_config(config.model)
  File "c:\users\my-name\downloads\generative-nn-models\latent-diffusion\ldm\util.py", line 78, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "c:\users\my-name\downloads\generative-nn-models\latent-diffusion\ldm\models\diffusion\ddpm.py", line 461, in __init__
    self.instantiate_cond_stage(cond_stage_config)
  File "c:\users\ny-name\downloads\generative-nn-models\latent-diffusion\ldm\models\diffusion\ddpm.py", line 519, in instantiate_cond_stage
    model = instantiate_from_config(config)
  File "c:\users\my-name\downloads\generative-nn-models\latent-diffusion\ldm\util.py", line 78, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "c:\users\ny-namef\downloads\generative-nn-models\latent-diffusion\ldm\util.py", line 86, in get_obj_from_str
    return getattr(importlib.import_module(module, package=None), cls)
AttributeError: module 'ldm.modules.encoders.modules' has no attribute 'FrozenCLIPEmbedder'
(ldm) PS C:\Users\my-name\Downloads\generative-nn-models\stable-diffusion>
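
(Side note on that traceback: the file paths resolve to the latent-diffusion checkout, not stable-diffusion, so Python is likely importing an older ldm package that predates FrozenCLIPEmbedder. A quick check, as a sketch:)

# Sketch: confirm which on-disk "ldm" package Python actually imports.
# If the printed path is under latent-diffusion rather than stable-diffusion,
# the older copy is shadowing the right one on sys.path.
import ldm.modules.encoders.modules as enc_modules

print(enc_modules.__file__)
print(hasattr(enc_modules, "FrozenCLIPEmbedder"))
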
illtellyoulater commented 2 years ago

memory ulimit is set too low

where do you set that?

lstein commented 2 years ago

Did you get a message that ulimit is set too low? As far as I'm aware, this command is a Linux thing. There's something equivalent called the Windows System Resource Manager, which might be what you're looking for: https://serverfault.com/questions/133122/ulimit-for-windows

An alternative is to try Basu Jindal's fork, in which he has aggressively optimized memory usage and claims that it can generate 512x512 images in under 4 GB of memory: https://github.com/basujindal/stable-diffusion

Please let me know how you manage to solve the problem.


illtellyoulater commented 2 years ago

@lstein I haven't gotten any error message regarding "ulimit too low" or similar.

All the output I get from running the script is in my previous post.

EppuHeilimo commented 2 years ago

The same thing happened to me while using WSL2.
First, I fixed it by installing the CUDA drivers for WSL2.
Second, I ran out of RAM (not VRAM): by default, Windows gives WSL 50% of the RAM. This can be changed with PowerShell:

Write-Output "[wsl2]
memory=12GB" >> "${env:USERPROFILE}\.wslconfig"

wsl --shutdown

This appends the new memory limit to .wslconfig in your user folder's root (creating the file if it doesn't exist). Change 12GB to whatever you are able to allocate; 12GB seemed to be enough. Remember that Windows itself needs about 4 GB of RAM to run.

So in short, even without WSL2, you are most likely missing CUDA drivers or running out of RAM or VRAM.
Hope this helps.
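
(A quick sanity check for both failure modes, CUDA visibility and available RAM, from inside the ldm env; a sketch assuming torch and psutil are installed:)

import torch
import psutil  # assumption: pip install psutil

# Sketch: verify CUDA is usable and report how much RAM this environment sees.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 2**30:.1f} GiB")

vm = psutil.virtual_memory()
print(f"RAM: {vm.total / 2**30:.1f} GiB total, {vm.available / 2**30:.1f} GiB available")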

RomanADavis commented 2 years ago

I installed conda and tried the fix above; it doesn't seem to work. Is there any way to get better error messages out of the program?

Seems to still be a memory issue. [screenshot]

One tutorial said I could check memory usage with htop, but I don't really understand its output.

[htop screenshot]
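
(One low-tech way to get more signal than a bare "Killed": log the process's resident memory while the model loads, so the last line printed shows roughly how far it got. A sketch, assuming psutil; drop it near the top of txt2img.py:)

import os
import threading
import time

import psutil  # assumption: pip install psutil

# Sketch: print resident memory every couple of seconds from a daemon thread.
def log_memory(interval=2.0):
    proc = psutil.Process(os.getpid())
    while True:
        print(f"[mem] RSS = {proc.memory_info().rss / 2**30:.2f} GiB", flush=True)
        time.sleep(interval)

threading.Thread(target=log_memory, daemon=True).start()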

RomanADavis commented 2 years ago

Uh, I'm a little confused here: is it just that I don't have enough VRAM?

[screenshot]

zorfmorf commented 2 years ago

@RomanADavis From the README.md:

the model is relatively lightweight and runs on a GPU with at least 10GB VRAM

RomanADavis commented 2 years ago

@zorfmorf Yeah, I ended up setting it up to run on CPU.
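
(For anyone else going the CPU route: the stock script moves the model to CUDA unconditionally, so a small edit is needed. A rough sketch of the usual change, not the repo's own code:)

import torch

# Sketch: pick a device once and move the model there, instead of the
# unconditional model.cuda() call in load_model_from_config().
def pick_device() -> torch.device:
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Inside load_model_from_config() the idea is then:
#     model.to(pick_device())   # rather than model.cuda()
# On CPU, sampling should also run with --precision full, since autocast
# is CUDA-oriented.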

salamanders commented 2 years ago

Interestingly enough, I'm also getting "Killed", but maybe it has to do with not enough RAM (rather than VRAM)?

tlees commented 2 years ago

(Quoting EppuHeilimo's WSL2 fix above.)

In addition to doing this, I had to use this fork that allowed me to use under 8GB of VRAM.

https://github.com/basujindal/stable-diffusion

LC-Showroom commented 2 years ago

I'm also having this issue. I ran Stable Diffusion just fine on Pop!_OS (Ubuntu-based) on the same machine. On WSL 2, however, I get the same result as the OP. Some information on what I have is below:

$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060" CUDA Driver Version / Runtime Version 11.7 / 11.7 CUDA Capability Major/Minor version number: 8.6 Total amount of global memory: 12287 MBytes (12884246528 bytes) (028) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores GPU Max Clock rate: 1837 MHz (1.84 GHz) Memory Clock rate: 7501 Mhz Memory Bus Width: 192-bit L2 Cache Size: 2359296 bytes Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total shared memory per multiprocessor: 102400 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 1536 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 1 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device supports Managed Memory: Yes Device supports Compute Preemption: Yes Supports Cooperative Kernel Launch: Yes Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.7, CUDA Runtime Version = 11.7, NumDevs = 1 Result = PASS

####################################################################################

$ ./deviceQueryDrv
./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060" CUDA Driver Version: 11.7 CUDA Capability Major/Minor version number: 8.6 Total amount of global memory: 12287 MBytes (12884246528 bytes) (28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores GPU Max Clock rate: 1837 MHz (1.84 GHz) Memory Clock rate: 7501 Mhz Memory Bus Width: 192-bit L2 Cache Size: 2359296 bytes Max Texture Dimension Sizes 1D=(131072) 2D=(131072, 65536) 3D=(16384, 16384, 16384) Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 1536 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Texture alignment: 512 bytes Maximum memory pitch: 2147483647 bytes Concurrent copy and kernel execution: Yes with 1 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Concurrent kernel execution: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device supports Managed Memory: Yes Device supports Compute Preemption: Yes Supports Cooperative Kernel Launch: Yes Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > Result = PASS

####################################################################################

.wslconfig:

[wsl2]
memory=12GB

####################################################################################

Any help would be greatly appreciated. Thanks :)

salamanders commented 2 years ago

I switched to:

a. more RAM
b. https://github.com/AUTOMATIC1111/stable-diffusion-webui

and now everything works great.

LC-Showroom commented 2 years ago

(Quoting salamanders' fix above: more RAM plus the AUTOMATIC1111 webui.)

Thanks for the tip :)

h3x4g0ns commented 1 year ago

(Quoting salamanders' fix above: more RAM plus the AUTOMATIC1111 webui.)

If you don't mind me asking: how much total RAM do you have installed now that everything is up and running?

ProkopHapala commented 1 year ago

I have this problem as well, and I realized it is due to a lack of RAM. I have 16 GB of main RAM, and the script uses ~14.5 GB. If I close the browser it gets further, but it still eventually gets killed after downloading some huge 3.9 GB file.

Can somebody tell me how much main RAM it requires?

(ldm) prokop@DesktopGTX3060:~/git_SW/stablediffusion$ python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt ~/stable_diffusion/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 512 --W 512
Global seed set to 42
Loading model from /home/prokop/stable_diffusion/768-v-ema.ckpt
Global Step: 140000
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Downloading: 100%|██████████| 3.94G/3.94G [12:31<00:00, 5.25MB/s]
Killed

[memory usage screenshot]

ProkopHapala commented 1 year ago

Isn't it possible to unload these huge parameter files from main RAM once they are loaded into GPU VRAM?
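
(Partly, yes: the checkpoint's state dict is a second full copy of the weights in RAM, and it can be dropped once the model has taken the parameters over. A sketch, not the stock loader; the peak still needs RAM for both copies briefly, so this helps steady-state use more than the load itself:)

import gc

import torch
from torch import nn

# Sketch: load a checkpoint, hand the weights to the model, then release
# the duplicate copy the state dict holds in RAM.
def load_weights_frugally(model: nn.Module, ckpt_path: str) -> nn.Module:
    pl_sd = torch.load(ckpt_path, map_location="cpu")  # stay off the GPU while loading
    sd = pl_sd.get("state_dict", pl_sd)
    model.load_state_dict(sd, strict=False)
    del pl_sd, sd  # drop the duplicate RAM copy
    gc.collect()
    return model.cuda()  # parameters now live in VRAM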

ProkopHapala commented 1 year ago

(Quoting salamanders' fix above: more RAM plus the AUTOMATIC1111 webui.)

So stable-diffusion-webui uses less main RAM?

3DTOPO commented 1 year ago

This happened to me after starting with a clean env and copy/pasting instructions, which installed the CPU version of PyTorch! The fact that it only took a minute to build xformers was suspect.

Anyway, it makes sense: it was trying to run on the CPU and got killed. I installed PyTorch with GPU support, reinstalled some things (e.g. pip install .), and now it works as expected.

WillyamBradberry commented 1 year ago

(Quoting 3DTOPO's comment above about the accidental CPU-only PyTorch install.)

Could you please make a step-by-step guide for that?
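
(Not a full guide, but the check is short. A sketch for telling whether you ended up with the CPU-only build:)

import torch

# Sketch: torch.version.cuda is None on CPU-only builds; is_available() is
# also False when the NVIDIA driver is missing.
print("torch:", torch.__version__)  # CPU wheels are often tagged "+cpu"
print("built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# If "built with CUDA" prints None, reinstall a CUDA build (pick the cuXXX
# index matching your driver; cu117 here is only an example):
#   pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
# then reinstall the repo itself: pip install .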

Conaire commented 1 year ago

(Quoting tlees' comment above: the WSL2 memory fix plus the basujindal low-VRAM fork.)

Thanks, worked!

kxwhiowo commented 1 month ago

(Quoting EppuHeilimo's WSL2 fix above.)

OMG U R A LEGEND!!!!!!!!!!