MountaintopLotus / braintrust

A Dockerized platform for running Stable Diffusion, on AWS (for now)
Apache License 2.0

Bad gateway 400 message #81

Open JohnTigue opened 1 year ago

JohnTigue commented 1 year ago

Scot just found a dead Invoke worker that died in a way I've never seen before:

webui-docker-invoke-1  | 138.199.43.77 - - [27/Mar/2023 20:27:07] code 400, message Bad request version ('\\x9a\\x9a\\x13\\x01\\x13\\x02\\x13\\x03À+À/À,À0̨̩À\\x13À\\x14\\x00\\x9c\\x00\\x9d\\x00/\\x005\\x01\\x00\\x01\\x93jj\\x00\\x00\\x00\\x17\\x00\\x00ÿ\\x01\\x00\\x01\\x00\\x00')
webui-docker-invoke-1  | 138.199.43.77 - - [27/Mar/2023 20:27:07] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x03;}GÊÖ9%À\x18q\x8esù5Ê:?Z\x81Ü\x00ûëû\x9déï1Zït¶ ¤\x85iã½³à©\x86«=\x84\x88\x01/ºX\x9a\x04³\x05~ñ\x05K«\x9aòÑ>ä&\x00 \x9a\x9a\x13\x01\x13\x02\x13\x03À+À/À,À0̨̩À\x13À\x14\x00\x9c\x00\x9d\x00/\x005\x01\x00\x01\x93jj\x00\x00\x00\x17\x00\x00ÿ\x01\x00\x01\x00\x00" 400 -
JohnTigue commented 1 year ago

Docker restart doesn't solve it (docker compose --profile invoke up). Let's try a new instance.

JohnTigue commented 1 year ago

Wait a sec, this (http://54.203.116.198:7860/) IS working? I'm seeing the UI in the browser…

JohnTigue commented 1 year ago

Now I'm seeing it stuck in a novel place:

webui-docker-invoke-1  | >> Initialization file /stable-diffusion/invokeai.init found. Loading...
webui-docker-invoke-1  | >> Internet connectivity is True
webui-docker-invoke-1  | >> InvokeAI, version 2.3.0+a0
webui-docker-invoke-1  | >> InvokeAI runtime directory is "/stable-diffusion"
webui-docker-invoke-1  | >> GFPGAN Initialized
webui-docker-invoke-1  | >> CodeFormer Initialized
webui-docker-invoke-1  | >> ESRGAN Initialized
webui-docker-invoke-1  | >> Using device_type cuda
webui-docker-invoke-1  | >> xformers memory-efficient attention is available and enabled
webui-docker-invoke-1  | >> Current VRAM usage:  0.00G
webui-docker-invoke-1  | >> Loading stable-diffusion-1.5 from /data/StableDiffusion/v1-5-pruned-emaonly.ckpt
webui-docker-invoke-1  | >> Scanning Model: stable-diffusion-1.5
webui-docker-invoke-1  | >> Model scanned ok!
webui-docker-invoke-1  | >> Loading stable-diffusion-1.5 from /data/StableDiffusion/v1-5-pruned-emaonly.ckpt
JohnTigue commented 1 year ago

Again, here's a weird, novel way it's getting stuck:

 >> Loading stable-diffusion-1.5 from /data/StableDiffusion/v1-5-pruned-emaonly.ckpt
webui-docker-invoke-1  |    | Forcing garbage collection prior to loading new model
webui-docker-invoke-1  |    | LatentDiffusion: Running in eps-prediction mode
webui-docker-invoke-1  |    | DiffusionWrapper has 859.52 M params.
webui-docker-invoke-1  |    | Making attention of type 'vanilla' with 512 in_channels
webui-docker-invoke-1  |    | Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
webui-docker-invoke-1  |    | Making attention of type 'vanilla' with 512 in_channels

No idea yet.

JohnTigue commented 1 year ago

This can't be solved with reboots at either the Docker or the VM level. So is something corrupted in the Docker image?
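If corruption is the worry, one cheap check before rebuilding anything (my assumption; nothing in this thread has tested it) is to hash the checkpoint the loader stalls on and compare it with the SHA-256 published wherever the model was downloaded. Note that the checkpoint lives under `/data`, which in this compose setup appears to be a mounted volume rather than part of the image, so rebuilding the image would not replace a bad file there. `sha256_of_file` and `checkpoint_matches` are hypothetical helper names, and the expected hash has to come from the model's download page:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly multi-GB) file in chunks so it never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path: str, expected_hex: str) -> bool:
    """Compare against a published SHA-256, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Run inside the container, e.g. `sha256_of_file("/data/StableDiffusion/v1-5-pruned-emaonly.ckpt")`. A mismatch would point at the file itself; a match would point back at the code or the instance.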

JohnTigue commented 1 year ago

No one else seems to have reported this since October: https://github.com/invoke-ai/InvokeAI/pull/1253

JohnTigue commented 1 year ago

Well, at least now it's exiting, but why? What is going on here? None of my usual troubleshooting tricks work on this novel bug.

webui-docker-invoke-1 exited with code 0
JohnTigue commented 1 year ago

Well, at least it's stabilized in one bad state… I wonder what that gobbledygook 400 message is trying to tell me…

(base) [root@ip-172-31-11-23 stable-diffusion-webui-docker]# docker compose --profile invoke up
[+] Running 1/0
 ⠿ Container webui-docker-invoke-1  Running    0.0s
Attaching to webui-docker-invoke-1

webui-docker-invoke-1  | 173.205.93.10 - - [27/Mar/2023 21:25:46] code 400, message Bad request version ('rjhXÔY\\x88.A')
webui-docker-invoke-1  | 173.205.93.10 - - [27/Mar/2023 21:25:46] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x03Y:Ó\x0b]\x88l\x96ð\x05Zr\x19=ëpߨçþB\x92®_\x96\\ \x9fJ\x10ß½ rjhXÔY\x88.A" 400 -
webui-docker-invoke-1  | 173.205.93.10 - - [27/Mar/2023 21:25:46] code 400, message Bad request version ('\\x8a\\x8a\\x13\\x01\\x13\\x02\\x13\\x03À+À/À,À0̨̩À\\x13À\\x14\\x00\\x9c\\x00\\x9d\\x00/\\x005\\x01\\x00\\x01\\x93ÊÊ\\x00\\x00\\x00\\x17\\x00\\x00ÿ\\x01\\x00\\x01\\x00\\x00')
webui-docker-invoke-1  | 173.205.93.10 - - [27/Mar/2023 21:25:46] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x03_¡G$\x00Êy\x7f;yu\x05Ñ!Bòø\x06\x97\x98(\x87%WÅí\x92\x15Õú\x0e$ h¯Nî¹ø¸\x1b\x11IÄ,\x9e\x11xÌ\x0fG\x059,\x1dí\x84\x14*¨óBk(ó\x00 \x8a\x8a\x13\x01\x13\x02\x13\x03À+À/À,À0̨̩À\x13À\x14\x00\x9c\x00\x9d\x00/\x005\x01\x00\x01\x93ÊÊ\x00\x00\x00\x17\x00\x00ÿ\x01\x00\x01\x00\x00" 400 -
webui-docker-invoke-1  | 173.205.93.10 - - [27/Mar/2023 21:25:46] code 400, message Bad request version ('\\x8a\\x8a\\x13\\x01\\x13\\x02\\x13\\x03À+À/À,À0̨̩À\\x13À\\x14\\x00\\x9c\\x00\\x9d\\x00/\\x005\\x01\\x00\\x01\\x93**\\x00\\x00\\x00\\x17\\x00\\x00ÿ\\x01\\x00\\x01\\x00\\x00')
webui-docker-invoke-1  | 173.205.93.10 - - [27/Mar/2023 21:25:46] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x039\x9e/\x97ç>è"´=\x1e1ù|îy¤d=\x15;¯\x14P\x93yX:\x91\x0c $ \x1d÷Ã\x06ç*\x12Ðøµ\x18½ê£\x19d×ú\x07MÕ\x96Æê+c7\x88(¦{Û\x00 \x8a\x8a\x13\x01\x13\x02\x13\x03À+À/À,À0̨̩À\x13À\x14\x00\x9c\x00\x9d\x00/\x005\x01\x00\x01\x93**\x00\x00\x00\x17\x00\x00ÿ\x01\x00\x01\x00\x00" 400 -
webui-docker-invoke-1  | 173.205.93.10 - - [27/Mar/2023 21:25:46] code 400, message Bad request version ('\\x9a\\x9a\\x13\\x01\\x13\\x02\\x13\\x03À+À/À,À0̨̩À\\x13À\\x14\\x00\\x9c\\x00\\x9d\\x00/\\x005\\x01\\x00\\x01\\x93\\x8a\\x8a\\x00\\x00\\x00\\x17\\x00\\x00ÿ\\x01\\x00\\x01\\x00\\x00')
webui-docker-invoke-1  | 173.205.93.10 - - [27/Mar/2023 21:25:46] "\x16\x03\x01\x02\x00\x01\x00\x01ü\x03\x03iµ\x8b\x8cd\x7fG\x01\x9c&\x80ïï;\x89ÓñO\x1e;zø}<\x95}\x89\x82Ú\x8aSû ü0b\x89b\x13#záêÊyëq3h\x87Ù\x01f*#AÂf\x1cT\x9b\x12\x90wl\x00 \x9a\x9a\x13\x01\x13\x02\x13\x03À+À/À,À0̨̩À\x13À\x14\x00\x9c\x00\x9d\x00/\x005\x01\x00\x01\x93\x8a\x8a\x00\x00\x00\x17\x00\x00ÿ\x01\x00\x01\x00\x00" 400 -
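A plausible reading of that 400 gobbledygook (my interpretation, not confirmed in this thread): every logged "request" begins with `\x16\x03\x01\x02\x00\x01`, which is a TLS record header (0x16 = handshake, 0x03 0x01 = record-layer version TLS 1.0, then a 2-byte length 0x0200 = 512) followed by handshake type 0x01, a ClientHello. The next three bytes, `\x00\x01ü` (0x0001FC = 508), are the ClientHello length, which plus its 4-byte header gives exactly 512. So something is speaking HTTPS to the plain-HTTP port, and the server's request parser, which appears to be Python's `http.server.BaseHTTPRequestHandler` given the log format, rejects it because the last word of the "request line" doesn't start with `HTTP/`. The repeated `\x9a\x9a` / `\x8a\x8a` / `jj` pairs look like GREASE values (RFC 8701) that Chromium-based browsers put in their hellos, so this may just be someone opening `https://…:7860` and probably unrelated to the hangs. A minimal sketch of a checker; `describe_tls_record` is a hypothetical helper, not part of InvokeAI:

```python
from typing import Optional

# TLS record content types (RFC 5246 / RFC 8446).
CONTENT_TYPES = {0x14: "change_cipher_spec", 0x15: "alert", 0x16: "handshake", 0x17: "application_data"}
# Handshake message types; 0x01 is ClientHello.
HANDSHAKE_TYPES = {0x01: "ClientHello", 0x02: "ServerHello"}
# Legacy record-layer version bytes (major, minor).
VERSIONS = {(3, 0): "SSL 3.0", (3, 1): "TLS 1.0", (3, 2): "TLS 1.1", (3, 3): "TLS 1.2"}


def describe_tls_record(data: bytes) -> Optional[str]:
    """Return a short description if `data` starts with a TLS record header, else None."""
    # A record header is 5 bytes: content type, 2 version bytes, 2 length bytes.
    if len(data) < 5 or data[0] not in CONTENT_TYPES or data[1] != 3:
        return None
    content = CONTENT_TYPES[data[0]]
    version = VERSIONS.get((data[1], data[2]), f"unknown {data[1]}.{data[2]}")
    length = int.from_bytes(data[3:5], "big")
    desc = f"TLS {content} record, record-layer version {version}, {length} bytes"
    # For handshake records, the first payload byte is the handshake message type.
    if content == "handshake" and len(data) >= 6:
        desc += f", message {HANDSHAKE_TYPES.get(data[5], hex(data[5]))}"
    return desc
```

Fed the first bytes copied from the log, `describe_tls_record(b"\x16\x03\x01\x02\x00\x01")` returns `"TLS handshake record, record-layer version TLS 1.0, 512 bytes, message ClientHello"`, while a normal request line such as `b"GET / HTTP/1.1\r\n"` returns `None`.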
JohnTigue commented 1 year ago

Hmm… maybe this isn't my stuff breaking but rather something upstream: https://github.com/AbdBarho/stable-diffusion-webui-docker/issues/381

JohnTigue commented 1 year ago

Retreated to two non-clustered instances.

JohnTigue commented 1 year ago

Whelp, Eleanore was able to break the independent server. The UI hangs with the progress bar at 98%. This time the whole instance was hung, not just a dead Docker container. That's more bad news, but maybe it's also telling me that I've been looking for a fix in the wrong place.

JohnTigue commented 1 year ago

Another temp hack would be to put a startup script in the EC2 console…

JohnTigue commented 1 year ago

OK, this has been a real hassle over the last week or so… but maybe it is also the opening of an opportunity.

The codebase I started the cluster code from is now stale (note that its license is SD's CreativeML Open RAIL-M, so it's liberal enough for us to build on). It hasn't been updated in over two months, so we have to fork it to our own repo to get around this bug. That's a hassle. But if that repo is stale… our hypnowerk could be a front-runner for the open source SD community… but that's another topic.