HugoVG opened this issue 2 years ago
Insufficient vmem or possibly memory ulimit is set too low.
It's failing silently for me too:
(ldm) PS C:\Users\my-name\Downloads\generative-nn-models\stable-diffusion> python.exe .\scripts\dream.py
* Initializing, be patient...
* Initialization done! Awaiting your command (-h for help, q to quit)...
dream> a photograph of an astronaut riding a horse
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 440000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
(ldm) PS C:\Users\my-name\Downloads\generative-nn-models\stable-diffusion>
Running it this way instead makes it talk a bit more:
(ldm) PS C:\Users\my-name\Downloads\generative-nn-models\stable-diffusion> python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 440000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
File "scripts/txt2img.py", line 279, in <module>
main()
File "scripts/txt2img.py", line 188, in main
model = load_model_from_config(config, f"{opt.ckpt}")
File "scripts/txt2img.py", line 31, in load_model_from_config
model = instantiate_from_config(config.model)
File "c:\users\my-name\downloads\generative-nn-models\latent-diffusion\ldm\util.py", line 78, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "c:\users\my-name\downloads\generative-nn-models\latent-diffusion\ldm\models\diffusion\ddpm.py", line 461, in __init__
self.instantiate_cond_stage(cond_stage_config)
File "c:\users\ny-name\downloads\generative-nn-models\latent-diffusion\ldm\models\diffusion\ddpm.py", line 519, in instantiate_cond_stage
model = instantiate_from_config(config)
File "c:\users\my-name\downloads\generative-nn-models\latent-diffusion\ldm\util.py", line 78, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "c:\users\ny-namef\downloads\generative-nn-models\latent-diffusion\ldm\util.py", line 86, in get_obj_from_str
return getattr(importlib.import_module(module, package=None), cls)
AttributeError: module 'ldm.modules.encoders.modules' has no attribute 'FrozenCLIPEmbedder'
(ldm) PS C:\Users\my-name\Downloads\generative-nn-models\stable-diffusion>
memory ulimit is set too low
where do you set that?
Did you get a message that ulimit is set too low? As far as I'm aware, this command is a Linux thing. There's something equivalent called the Windows System Resource Manager, which might be what you're looking for: https://serverfault.com/questions/133122/ulimit-for-windows
An alternative is to try Basu Jindal's fork, in which he has aggressively optimized memory usage and claims that it can generate 512x512 images in under 4 GB of memory: https://github.com/basujindal/stable-diffusion
Please let me know how you manage to solve the problem.
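If it helps with diagnosis on the Linux/WSL side, here is a small sketch (mine, not from the repo) that prints the memory limits the Python process actually sees; the resource module only exists on Unix-like systems, so this won't run on native Windows:

# Print the memory-related ulimits visible to this process (Linux/WSL only).
import resource

def show_limit(label, which):
    soft, hard = resource.getrlimit(which)
    fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else f"{v / 2**30:.1f} GiB"
    print(f"{label}: soft={fmt(soft)}, hard={fmt(hard)}")

show_limit("RLIMIT_AS (virtual memory)", resource.RLIMIT_AS)
show_limit("RLIMIT_DATA (data segment)", resource.RLIMIT_DATA)

If both report "unlimited", a ulimit is probably not what is killing the process.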
@lstein I haven't gotten any error message regarding "ulimit too low" or similar.
All the output I get from running the script is in my previous post.
The same thing happened to me while using WSL2.
I fixed it by installing the CUDA drivers for WSL2.
Secondly, I was running out of RAM (not VRAM). By default, Windows gives WSL 50% of the system RAM. This can be changed with PowerShell:
Write-Output "[wsl2]
memory=12GB" >> "${env:USERPROFILE}\.wslconfig"
wsl --shutdown
This writes a .wslconfig file into the root of your user folder with the new memory limit. Change 12GB to whatever you are able to allocate; 12GB seemed to be enough. Remember that Windows itself needs about 4 GB of RAM to run.
So in short, even without WSL2, you are most likely missing the CUDA drivers or running out of RAM or VRAM.
Hope this helps.
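If you want to confirm those two things from inside WSL, here is a rough diagnostic sketch (it assumes PyTorch is installed; /proc/meminfo is Linux-only):

# Check how much RAM the WSL2 VM actually has (reflects the .wslconfig limit)
# and whether the CUDA driver is visible to PyTorch at all.
import torch

with open("/proc/meminfo") as f:
    mem_total_kb = int(next(line for line in f if line.startswith("MemTotal")).split()[1])
print(f"RAM visible to WSL: {mem_total_kb / 2**20:.1f} GiB")

print("CUDA available:", torch.cuda.is_available())  # False => missing WSL2 CUDA driver or CPU-only PyTorch

If the first number is much smaller than expected, the .wslconfig limit is the problem; if the second line prints False, it is the driver.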
I installed conda and tried the fix here; it doesn't seem to work. Is there any way to get better error messages out of the program?
It still seems to be a memory issue.
One tutorial said I could check this with htop, but I don't really understand it.
Uh, I'm a little confused here; is it just that I don't have enough VRAM?
@RomanADavis From the README.md:
the model is relatively lightweight and runs on a GPU with at least 10GB VRAM
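As a quick check against that figure, a snippet like the following (assuming a working CUDA-enabled PyTorch install) prints how much VRAM the card reports:

# Compare the card's total VRAM against the ~10 GB the README calls for.
import torch

if torch.cuda.is_available():
    total = torch.cuda.get_device_properties(0).total_memory
    print(f"{torch.cuda.get_device_name(0)}: {total / 2**30:.1f} GiB VRAM")
else:
    print("No CUDA device visible to PyTorch")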
@zorfmorf Yeah, I ended up setting it up to run on CPU.
Interestingly enough, I'm also getting "Killed", but maybe it has to do with not enough RAM (not VRAM?).
The same thing happened to me while using WSL2. I fixed it by installing the CUDA drivers for WSL2. Secondly, I was running out of RAM (not VRAM). By default, Windows gives WSL 50% of the system RAM. This can be changed with PowerShell:
Write-Output "[wsl2]
memory=12GB" >> "${env:USERPROFILE}\.wslconfig"
wsl --shutdown
This writes a .wslconfig file into the root of your user folder with the new memory limit. Change 12GB to whatever you are able to allocate; 12GB seemed to be enough. Remember that Windows itself needs about 4 GB of RAM to run.
So in short, even without WSL2, you are most likely missing the CUDA drivers or running out of RAM or VRAM. Hope this helps.
In addition to doing this, I had to use this fork that allowed me to use under 8GB of VRAM.
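For context: those low-VRAM forks combine several tricks, one of which is keeping the weights in half precision. A rough sketch of just that part, loosely following the stock load_model_from_config but not taken from the fork itself:

# Sketch only: fp16 weights occupy roughly half the VRAM of fp32.
# Not the fork's actual code; config and ckpt_path are whatever txt2img.py already uses.
import torch
from ldm.util import instantiate_from_config

def load_model_half(config, ckpt_path):
    pl_sd = torch.load(ckpt_path, map_location="cpu")   # load checkpoint into RAM first
    model = instantiate_from_config(config.model)
    model.load_state_dict(pl_sd["state_dict"], strict=False)
    return model.half().cuda().eval()                    # cast to fp16, then move to the GPU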
I'm also having this issue. I ran Stable Diffusion just fine on PopOS (Ubuntu) on the same machine. On WSL 2, however, I get the same result as the OP. Some information below on what I have:
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3060" CUDA Driver Version / Runtime Version 11.7 / 11.7 CUDA Capability Major/Minor version number: 8.6 Total amount of global memory: 12287 MBytes (12884246528 bytes) (028) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores GPU Max Clock rate: 1837 MHz (1.84 GHz) Memory Clock rate: 7501 Mhz Memory Bus Width: 192-bit L2 Cache Size: 2359296 bytes Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total shared memory per multiprocessor: 102400 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 1536 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 1 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device supports Managed Memory: Yes Device supports Compute Preemption: Yes Supports Cooperative Kernel Launch: Yes Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.7, CUDA Runtime Version = 11.7, NumDevs = 1 Result = PASS
####################################################################################
$ ./deviceQueryDrv
./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version: 11.7
  CUDA Capability Major/Minor version number: 8.6
  Total amount of global memory: 12287 MBytes (12884246528 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
  GPU Max Clock rate: 1837 MHz (1.84 GHz)
  Memory Clock rate: 7501 Mhz
  Memory Bus Width: 192-bit
  L2 Cache Size: 2359296 bytes
  Max Texture Dimension Sizes: 1D=(131072) 2D=(131072, 65536) 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers: 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers: 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 1536
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Texture alignment: 512 bytes
  Maximum memory pitch: 2147483647 bytes
  Concurrent copy and kernel execution: Yes with 1 copy engine(s)
  Run time limit on kernels: Yes
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Concurrent kernel execution: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device supports Managed Memory: Yes
  Device supports Compute Preemption: Yes
  Supports Cooperative Kernel Launch: Yes
  Supports MultiDevice Co-op Kernel Launch: No
  Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
  Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS
####################################################################################
.wslconfig:
[wsl2]
memory=12GB
####################################################################################
Any help would be greatly appreciated. Thanks :)
I switched to
a. more RAM
b. https://github.com/AUTOMATIC1111/stable-diffusion-webui
and now everything works great.
I switched to
a. more RAM
b. https://github.com/AUTOMATIC1111/stable-diffusion-webui
and now everything works great.
Thanks for the tip :)
I switched to
a. more RAM
b. https://github.com/AUTOMATIC1111/stable-diffusion-webui
and now everything works great.
If you don't mind me asking -- how much total RAM do you have installed now that everything is up and running?
I have this problem as well. I realized it is due to a lack of RAM. I have 16 GB of main RAM and the script uses ~14.5 GB. If I close the browser it gets a bit further, but it still gets killed after downloading some huge 3.9 GB file.
Can somebody tell me how much main RAM it requires?
(ldm) prokop@DesktopGTX3060:~/git_SW/stablediffusion$ python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt ~/stable_diffusion/768-v-ema.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 512 --W 512
Global seed set to 42
Loading model from /home/prokop/stable_diffusion/768-v-ema.ckpt
Global Step: 140000
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.94G/3.94G [12:31<00:00, 5.25MB/s]
Killed
Isn't it possible to unload these huge parameter files from main RAM once they are loaded into GPU VRAM?
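A rough sketch of that idea (not a tested patch; model and ckpt_path stand in for what the script already builds) would be to drop the CPU-side checkpoint dict once load_state_dict has copied the weights:

import gc
import torch

pl_sd = torch.load(ckpt_path, map_location="cpu")   # checkpoint sitting in main RAM
model.load_state_dict(pl_sd["state_dict"], strict=False)
model.cuda()                                          # weights now on the GPU

del pl_sd                                             # drop the RAM copy of the checkpoint
gc.collect()
torch.cuda.empty_cache()                              # release cached, unused VRAM blocks

The catch is that the peak happens while both the checkpoint and the model exist in RAM, so this only frees memory after loading; a kill during torch.load itself would not be avoided.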
I switched to
a. more RAM
b. https://github.com/AUTOMATIC1111/stable-diffusion-webui
and now everything works great.
So stable-diffusion-webui uses less main RAM?
This happened to me when starting with a clean env and copy/pasting instructions, which installed the CPU version of PyTorch for me! The fact that it only took a minute to build xformers was suspect.
Anyway, it makes sense: it was trying to run on the CPU and got killed. I installed PyTorch with GPU support, re-installed some things (e.g. pip install .), and it works as expected.
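For anyone wanting to check whether they hit the same thing, a quick way to tell a CPU-only PyTorch wheel from a CUDA-enabled one:

# Distinguish a CPU-only PyTorch build from a CUDA-enabled build.
import torch

print("torch version:", torch.__version__)        # CPU-only wheels are usually tagged "+cpu"
print("built with CUDA:", torch.version.cuda)     # None for a CPU-only build
print("CUDA available:", torch.cuda.is_available())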
This happened to me when starting with a clean env and copy/pasting instructions, which installed the CPU version of PyTorch for me! The fact that it only took a minute to build xformers was suspect.
Anyway, it makes sense: it was trying to run on the CPU and got killed. I installed PyTorch with GPU support, re-installed some things (e.g. pip install .), and it works as expected.
Could you please make a step-by-step guide for that?
The same thing happened to me while using WSL2. I fixed it by installing the CUDA drivers for WSL2. Secondly, I was running out of RAM (not VRAM). By default, Windows gives WSL 50% of the system RAM. This can be changed with PowerShell:
Write-Output "[wsl2]
memory=12GB" >> "${env:USERPROFILE}\.wslconfig"
wsl --shutdown
This writes a .wslconfig file into the root of your user folder with the new memory limit. Change 12GB to whatever you are able to allocate; 12GB seemed to be enough. Remember that Windows itself needs about 4 GB of RAM to run. So in short, even without WSL2, you are most likely missing the CUDA drivers or running out of RAM or VRAM. Hope this helps.
In addition to doing this, I had to use this fork that allowed me to use under 8GB of VRAM.
Thanks, worked!
The same thing happened to me while using WSL2. I fixed it by installing the CUDA drivers for WSL2. Secondly, I was running out of RAM (not VRAM). By default, Windows gives WSL 50% of the system RAM. This can be changed with PowerShell:
Write-Output "[wsl2]
memory=12GB" >> "${env:USERPROFILE}\.wslconfig"
wsl --shutdown
This writes a .wslconfig file into the root of your user folder with the new memory limit. Change 12GB to whatever you are able to allocate; 12GB seemed to be enough. Remember that Windows itself needs about 4 GB of RAM to run.
So in short, even without WSL2, you are most likely missing the CUDA drivers or running out of RAM or VRAM. Hope this helps.
OMG U R A LEGEND!!!!!!!!!!
The "Killed" comes from txt2img.py, line 34.
Full trace (with some debug points):
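(Not the actual trace, but a hedged sketch of what such a debug point around that line could look like; ckpt here is just the checkpoint path the script receives. If the process dies inside torch.load with no Python traceback, the OS out-of-memory killer ended it:)

import resource
import torch

def peak_rss_gib():
    # Peak resident memory of this process; Linux reports ru_maxrss in KiB.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 2**20

print(f"before torch.load: {peak_rss_gib():.1f} GiB peak RSS")
pl_sd = torch.load(ckpt, map_location="cpu")   # the load that usually triggers the OOM kill
print(f"after torch.load:  {peak_rss_gib():.1f} GiB peak RSS")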
Anyone know why it keeps killing itself? I downloaded all the models, set up the Conda env, etc.