Open JohnTigue opened 1 year ago
Docker restart doesn't solve it (docker compose --profile invoke up). Let's try a new instance.
Wait a sec, this (http://54.203.116.198:7860/) IS working? I'm seeing the UI in the browser…
Now I'm seeing it stuck in a novel place:
webui-docker-invoke-1 | >> Initialization file /stable-diffusion/invokeai.init found. Loading...
webui-docker-invoke-1 | >> Internet connectivity is True
webui-docker-invoke-1 | >> InvokeAI, version 2.3.0+a0
webui-docker-invoke-1 | >> InvokeAI runtime directory is "/stable-diffusion"
webui-docker-invoke-1 | >> GFPGAN Initialized
webui-docker-invoke-1 | >> CodeFormer Initialized
webui-docker-invoke-1 | >> ESRGAN Initialized
webui-docker-invoke-1 | >> Using device_type cuda
webui-docker-invoke-1 | >> xformers memory-efficient attention is available and enabled
webui-docker-invoke-1 | >> Current VRAM usage: 0.00G
webui-docker-invoke-1 | >> Loading stable-diffusion-1.5 from /data/StableDiffusion/v1-5-pruned-emaonly.ckpt
webui-docker-invoke-1 | >> Scanning Model: stable-diffusion-1.5
webui-docker-invoke-1 | >> Model scanned ok!
webui-docker-invoke-1 | >> Loading stable-diffusion-1.5 from /data/StableDiffusion/v1-5-pruned-emaonly.ckpt
Again, here's a weird, novel way it's getting stuck:
>> Loading stable-diffusion-1.5 from /data/StableDiffusion/v1-5-pruned-emaonly.ckpt
webui-docker-invoke-1 | | Forcing garbage collection prior to loading new model
webui-docker-invoke-1 | | LatentDiffusion: Running in eps-prediction mode
webui-docker-invoke-1 | | DiffusionWrapper has 859.52 M params.
webui-docker-invoke-1 | | Making attention of type 'vanilla' with 512 in_channels
webui-docker-invoke-1 | | Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
webui-docker-invoke-1 | | Making attention of type 'vanilla' with 512 in_channels
No idea yet.
This cannot be solved with reboots at either the Docker or VM level. So, is something corrupted in the Docker image?
No one else seems to be reporting this, not since October: https://github.com/invoke-ai/InvokeAI/pull/1253
Well, at least now it's exiting, but why? What is going on here? None of my existing problem solvers are working on this novel bug.
webui-docker-invoke-1 exited with code 0
Well, at least it's stabilized in one bad state… I wonder what that gobbledygook 400 message is trying to tell me…
(base) [root@ip-172-31-11-23 stable-diffusion-webui-docker]# docker compose --profile invoke up
[+] Running 1/0
⠿ Container webui-docker-invoke-1 Running 0.0s
Attaching to webui-docker-invoke-1
webui-docker-invoke-1 | 173.205.93.10 - - [27/Mar/2023 21:25:46] code 400, message Bad request version ('rjhXÔY\\x88.A')
webui-docker-invoke-1 | 173.205.93.10 - - [27/Mar/2023 21:25:46] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x03Y:Ó\x0b]\x88l\x96ð\x05Zr\x19=ëpߨçþB\x92®_\x96\\ \x9fJ\x10ß½ rjhXÔY\x88.A" 400 -
webui-docker-invoke-1 | 173.205.93.10 - - [27/Mar/2023 21:25:46] code 400, message Bad request version ('\\x8a\\x8a\\x13\\x01\\x13\\x02\\x13\\x03À+À/À,À0̨̩À\\x13À\\x14\\x00\\x9c\\x00\\x9d\\x00/\\x005\\x01\\x00\\x01\\x93ÊÊ\\x00\\x00\\x00\\x17\\x00\\x00ÿ\\x01\\x00\\x01\\x00\\x00')
webui-docker-invoke-1 | 173.205.93.10 - - [27/Mar/2023 21:25:46] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x03_¡G$\x00Êy\x7f;yu\x05Ñ!Bòø\x06\x97\x98(\x87%WÅí\x92\x15Õú\x0e$ h¯Nî¹ø¸\x1b\x11IÄ,\x9e\x11xÌ\x0fG\x059,\x1dí\x84\x14*¨óBk(ó\x00 \x8a\x8a\x13\x01\x13\x02\x13\x03À+À/À,À0̨̩À\x13À\x14\x00\x9c\x00\x9d\x00/\x005\x01\x00\x01\x93ÊÊ\x00\x00\x00\x17\x00\x00ÿ\x01\x00\x01\x00\x00" 400 -
webui-docker-invoke-1 | 173.205.93.10 - - [27/Mar/2023 21:25:46] code 400, message Bad request version ('\\x8a\\x8a\\x13\\x01\\x13\\x02\\x13\\x03À+À/À,À0̨̩À\\x13À\\x14\\x00\\x9c\\x00\\x9d\\x00/\\x005\\x01\\x00\\x01\\x93**\\x00\\x00\\x00\\x17\\x00\\x00ÿ\\x01\\x00\\x01\\x00\\x00')
webui-docker-invoke-1 | 173.205.93.10 - - [27/Mar/2023 21:25:46] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x039\x9e/\x97ç>è"´=\x1e1ù|îy¤d=\x15;¯\x14P\x93yX:\x91\x0c $ \x1d÷Ã\x06ç*\x12Ðøµ\x18½ê£\x19d×ú\x07MÕ\x96Æê+c7\x88(¦{Û\x00 \x8a\x8a\x13\x01\x13\x02\x13\x03À+À/À,À0̨̩À\x13À\x14\x00\x9c\x00\x9d\x00/\x005\x01\x00\x01\x93**\x00\x00\x00\x17\x00\x00ÿ\x01\x00\x01\x00\x00" 400 -
webui-docker-invoke-1 | 173.205.93.10 - - [27/Mar/2023 21:25:46] code 400, message Bad request version ('\\x9a\\x9a\\x13\\x01\\x13\\x02\\x13\\x03À+À/À,À0̨̩À\\x13À\\x14\\x00\\x9c\\x00\\x9d\\x00/\\x005\\x01\\x00\\x01\\x93\\x8a\\x8a\\x00\\x00\\x00\\x17\\x00\\x00ÿ\\x01\\x00\\x01\\x00\\x00')
webui-docker-invoke-1 | 173.205.93.10 - - [27/Mar/2023 21:25:46] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x03iµ\x8b\x8cd\x7fG\x01\x9c&\x80ïï;\x89ÓñO\x1e;zø}<\x95}\x89\x82Ú\x8aSû ü0b\x89b\x13#záêÊyëq3h\x87Ù\x01f*#AÂf\x1cT\x9b\x12\x90wl\x00 \x9a\x9a\x13\x01\x13\x02\x13\x03À+À/À,À0̨̩À\x13À\x14\x00\x9c\x00\x9d\x00/\x005\x01\x00\x01\x93\x8a\x8a\x00\x00\x00\x17\x00\x00ÿ\x01\x00\x01\x00\x00" 400 -
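One possible reading of those 400s: every rejected request starts with the bytes \x16\x03\x01, which is how a TLS handshake record begins, and the sixth byte is \x01, the ClientHello handshake type. That would mean a client at 173.205.93.10 tried https:// against a port that only speaks plain HTTP, which is likely noise rather than the cause of the hang. A minimal sketch for checking a logged payload, assuming only the TLS record layout (the function name is made up for illustration):

```python
def looks_like_tls_client_hello(data: bytes) -> bool:
    """Return True if data starts like a TLS handshake record carrying a ClientHello.

    Hypothetical helper for reading logs; it only inspects the first six bytes.
    """
    if len(data) < 6:
        return False
    content_type = data[0]    # 0x16 = handshake record
    version_major = data[1]   # 0x03 for SSL 3.0 and every TLS version
    handshake_type = data[5]  # 0x01 = ClientHello (after the 2-byte record length)
    return content_type == 0x16 and version_major == 0x03 and handshake_type == 0x01


# First bytes of the rejected requests in the log above (ü is 0xfc).
sample = b"\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03"
print(looks_like_tls_client_hello(sample))
print(looks_like_tls_client_hello(b"GET / HTTP/1.1\r\n"))
```

If that reading is right, hitting the UI with http:// instead of https:// should make these entries go away, and the real problem is elsewhere.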
Hmm… maybe this isn't my stuff breaking but rather something upstream: https://github.com/AbdBarho/stable-diffusion-webui-docker/issues/381
Retreated to two non-clustered instances:
Whelp, Eleanore was able to break the independent server. The UI hangs with the progress bar at 98%. The instance itself was hung, not just a dead Docker container. That's more bad news, but maybe it's also trying to tell me that I've been looking for a fix in the wrong place.
Another temp hack would be to put a startup script in the EC2 console…
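A rough sketch of what such a hack might run, written as a small Python check that could be called from a boot script or a cron job. Everything here is an assumption for illustration: the function names, the working directory argument, and the choice to poll the UI port and re-run the compose command in detached mode when it does not answer. It would not help when the whole instance hangs, only when the container is down.

```python
import subprocess
import urllib.error
import urllib.request

# Same compose invocation used above, plus -d so it does not block the script.
COMPOSE_CMD = ["docker", "compose", "--profile", "invoke", "up", "-d"]


def ui_is_up(url, timeout=10.0):
    """Return True if the web UI answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False


def ensure_running(url, workdir, run=subprocess.run, check=ui_is_up):
    """Bring the compose profile up if the UI is not responding.

    Returns True if a (re)start was attempted, False if the UI was already up.
    `run` and `check` are injectable so the logic can be exercised offline.
    """
    if check(url):
        return False
    run(COMPOSE_CMD, cwd=workdir, check=True)
    return True
```

Usage would be something like ensure_running("http://localhost:7860/", "<path to the stable-diffusion-webui-docker checkout>"), with the path filled in for the actual instance.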
OK, this has been a real hassle over the last week or so… but maybe it is also the opening of an opportunity.
The codebase I started the cluster code from is now stale (note that its license is SD's CreativeML Open RAIL-M, so it's pretty liberal for us to go off on our own). It hasn't been updated in over two months. So, we have to fork to our own repo to get around this bug. That's a hassle. But if that repo is stale… our hypnowerk could be a front-runner for the open source SD community… but that's another topic.
Scot just found a dead Invoke worker, the death type of which I've never seen before: