Closed: ChrisAcrobat closed this 2 years ago
📢 Discussion from #59 continues here.
@hlky @oc013 Looks like it's crashing here. I don't see any logs from webui.py; are they hidden somewhere?
sd | entrypoint.sh: Launching...'
sd | Relaunch count: 9
sd | Loaded GFPGAN
sd | Loaded RealESRGAN with model RealESRGAN_x4plus
sd | Loading model from models/ldm/stable-diffusion-v1/model.ckpt
sd | Global Step: 470000
sd | LatentDiffusion: Running in eps-prediction mode
sd | entrypoint.sh: Process is ending. Relaunching in 0.5s...
sd | /sd/entrypoint.sh: line 89: 774 Killed python -u scripts/webui.py
sd | entrypoint.sh: Launching...'
sd | Relaunch count: 10
sd | Loaded GFPGAN
sd | Loaded RealESRGAN with model RealESRGAN_x4plus
sd | Loading model from models/ldm/stable-diffusion-v1/model.ckpt
sd | Global Step: 470000
sd | LatentDiffusion: Running in eps-prediction mode
sd | /sd/entrypoint.sh: line 89: 798 Killed python -u scripts/webui.py
sd | entrypoint.sh: Process is ending. Relaunching in 0.5s...
sd | entrypoint.sh: Launching...'
sd | Relaunch count: 11
sd | Loaded GFPGAN
sd | Loaded RealESRGAN with model RealESRGAN_x4plus
sd | Loading model from models/ldm/stable-diffusion-v1/model.ckpt
sd | Global Step: 470000
sd | LatentDiffusion: Running in eps-prediction mode
sd | entrypoint.sh: Process is ending. Relaunching in 0.5s...
sd | /sd/entrypoint.sh: line 89: 822 Killed python -u scripts/webui.py
sd | entrypoint.sh: Launching...'
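The pattern in the log (launch, die with `Killed`, relaunch after 0.5s) suggests a supervise loop in entrypoint.sh. A minimal sketch of that loop, purely hypothetical since the real script isn't reproduced here: the loop is capped at a maximum so the sketch terminates, and `false` stands in for `python -u scripts/webui.py`.

```shell
#!/bin/sh
# Hypothetical sketch of a supervise-and-relaunch loop like the one in the
# log above. The real entrypoint.sh presumably loops forever; this sketch is
# capped so it terminates.
relaunch_loop() {
    cmd=$1    # the supervised command ("false" stands in for the webui)
    max=$2    # cap on relaunches, so the sketch ends
    count=0
    while [ "$count" -lt "$max" ]; do
        echo "entrypoint.sh: Launching..."
        $cmd    # exits when the process dies (e.g. OOM-killed)
        echo "entrypoint.sh: Process is ending. Relaunching in 0.5s..."
        sleep 0.5
        count=$((count + 1))
        echo "Relaunch count: $count"
    done
}

relaunch_loop false 2
```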
As far as I know, what you see in STDOUT there is the logs. I'm rebuilding now to give you a comparison of what should happen on a successful first launch. Can you scroll back and see if there were any errors downloading things?
I'm assuming everything up to the actual launch point was successful since I see in your output that it got past loading the model files without an error:
sd | entrypoint.sh: Launching...
sd | Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth" to /opt/conda/envs/ldm/lib/python3.8/site-packages/facexlib/weights/detection_Resnet50_Final.pth
sd |
100%|██████████| 104M/104M [00:05<00:00, 19.4MB/s]
sd | Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth" to /opt/conda/envs/ldm/lib/python3.8/site-packages/facexlib/weights/parsing_parsenet.pth
sd |
100%|██████████| 81.4M/81.4M [00:04<00:00, 19.5MB/s]
sd | Loaded GFPGAN
sd | Loaded RealESRGAN with model RealESRGAN_x4plus
sd | Loading model from models/ldm/stable-diffusion-v1/model.ckpt
sd | Global Step: 470000
sd | LatentDiffusion: Running in eps-prediction mode
sd | DiffusionWrapper has 859.52 M params.
sd | making attention of type 'vanilla' with 512 in_channels
sd | Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
sd | making attention of type 'vanilla' with 512 in_channels
Downloading: 100%|██████████| 1.59G/1.59G [01:30<00:00, 18.9MB/s]
sd | Running on local URL: http://localhost:7860/
sd |
sd | To create a public link, set `share=True` in `launch()`.
PS That extra single quote I left in the launching msg really bugs me :laughing:
Is this ready, @ChrisAcrobat @oc013?
Haven't checked the last comments yet (timezone). I'll do it on the train or something. I'll reply later!
@oc013 I'm assuming everything up to the actual launch point was successful since I see in your output that it got past loading the model files without an error:
I think it looks so too. 🙂 log
This PR changes scripts/webui.py @hlky
$ git stash
warning: CRLF will be replaced by LF in scripts/webui.py.
Yeah, that wasn't meant to be merged. Did I accidentally disable the draft status?
@hlky: Remove this line: https://github.com/hlky/stable-diffusion/blob/main/.gitattributes#L2
I think either remove `.gitattributes`, and users of the repository should be expected to have their git client set up as I linked in the previous discussion (`git config --global core.autocrlf input`), so they get exactly what is in the repository. Or the files in the repository should be updated to be consistent, allowing `.gitattributes` to do its thing and result in a clean state on everyone's local machine.
Doing `git config --global core.autocrlf input` will affect every (future?) repo for that user, which can cause other problems. But sure, adding `.gitattributes` to `.gitignore` could be a possible fix. Still, the `.sh` files (as in my intended PR 5007fdc96f7b6bdb33a56b0a03f0765127e8e585) should never ever use CRLF, right?
Someone made a PR that will make everything consistent #91
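For reference, a consistent setup could look something like the following `.gitattributes` sketch (hypothetical, not necessarily what #91 does): let git normalize text files, and force LF on shell scripts so entrypoint.sh can never end up with CRLF line endings.

```
# Hypothetical .gitattributes sketch (not the repo's actual file):
# normalize all text files on commit, and force LF for shell scripts.
* text=auto
*.sh text eol=lf
```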
Regarding your problem, I'm not sure what's going wrong yet, but I did start a general discussion on Docker here: #93. It includes some Windows-specific info I found. Can you verify that you've completed everything there?
Yes, I have just redone all the steps in #93 from scratch, with no noticeable difference. After the reinstall I also tried closing the container and then restarting it, and then I noticed where it crashed. It was maybe obvious to you, but I now see that it crashes somewhere here:
sd | Loading model from models/ldm/stable-diffusion-v1/model.ckpt
sd | Global Step: 470000
sd | LatentDiffusion: Running in eps-prediction mode
Full log after the second `docker compose up`, after the first attempt returned the same result as before (similar or equal to this):
[+] Running 1/0
- Container sd Created 0.0s
Attaching to sd
sd | active environment : ldm
sd | active env location : /opt/conda/envs/ldm
sd | Validating model files...
sd | checking model.ckpt...
sd | model.ckpt is valid!
sd |
sd | checking GFPGANv1.3.pth...
sd | GFPGANv1.3.pth is valid!
sd |
sd | checking RealESRGAN_x4plus.pth...
sd | RealESRGAN_x4plus.pth is valid!
sd |
sd | checking RealESRGAN_x4plus_anime_6B.pth...
sd | RealESRGAN_x4plus_anime_6B.pth is valid!
sd |
sd | entrypoint.sh: Launching...'
sd | Loaded GFPGAN
sd | Loaded RealESRGAN with model RealESRGAN_x4plus
sd | Loading model from models/ldm/stable-diffusion-v1/model.ckpt
sd | Global Step: 470000
sd | LatentDiffusion: Running in eps-prediction mode
sd | entrypoint.sh: Process is ending. Relaunching in 0.5s...
sd | /sd/entrypoint.sh: line 89: 29 Killed python -u scripts/webui.py
sd | entrypoint.sh: Launching...'
sd | Relaunch count: 1
Can you try the following to see if you get any more info:
docker exec -it sd bash
python -u scripts/webui.py
(ldm) root@519ae7e8a662:/sd# python -u scripts/webui.py
Loaded GFPGAN
Loaded RealESRGAN with model RealESRGAN_x4plus
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
Killed
Ok, just making sure it wasn't somehow the bash script killing the python script.
The next message should be DiffusionWrapper has 859.52 M params.
I'm not 100% sure at this point, but possibly memory is the issue? What does your hardware look like? In the `docker exec -it sd bash` session, can you run `nvidia-smi` and see your GPUs there?
https://stackoverflow.com/questions/65935028/python-script-gets-killed https://stackoverflow.com/questions/19189522/what-does-killed-mean-when-processing-a-huge-csv-with-python-which-suddenly-s
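As those links explain, a bare `Killed` with no Python traceback usually means the kernel's OOM killer ended the process, and the kernel log records it. A small sketch of the check, using only standard Linux tools (nothing from this repo); in practice you would pipe `dmesg` into the filter on the Docker host:

```shell
# Filter for OOM-killer lines in kernel log output.
# Real usage would be: dmesg | oom_lines
oom_lines() {
    grep -iE "killed process|out of memory"
}

# Demonstrated on a sample dmesg-style line (illustration only):
echo "Out of memory: Killed process 774 (python) total-vm:10485760kB" | oom_lines
```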
`nvidia-smi` returns:
Mon Aug 29 13:24:20 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57 Driver Version: 516.59 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 59C P0 31W / N/A | 2200MiB / 8192MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 84 C /python3.8 N/A |
+-----------------------------------------------------------------------------+
@oc013 It now also displayed python3.8, which it didn't do before. I was probably too quick the last time. So I also tried `python -u scripts/webui.py` again and now got this:
Traceback (most recent call last):
File "/sd/scripts/webui.py", line 3, in <module>
from frontend.frontend import draw_gradio_ui
ModuleNotFoundError: No module named 'frontend'
Added #93 to the wiki https://github.com/hlky/stable-diffusion/wiki/Docker-Guide
Updating shouldn't be the issue, because I made a clean install after first purging Docker. But I will certainly try it again!
This is new front-end code they changed; maybe it was cloned at a bad time. You can verify whether it's there at frontend/frontend.py
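A quick way to check that, assuming the repo root is /sd inside the container (as the logs above suggest); the helper name `check_frontend` is made up for this sketch:

```shell
# Hypothetical helper: report whether the new front-end code is present
# under a given repo root ($1).
check_frontend() {
    if [ -f "$1/frontend/frontend.py" ]; then
        echo "frontend/frontend.py present"
    else
        echo "frontend/frontend.py missing"
    fi
}

# Inside the container (docker exec -it sd bash), the repo root is /sd:
check_frontend /sd
```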
There was a frontend/frontend.py in the container, but I didn't look that closely. I have purged it now and am reinstalling.
I tried again now, but no luck. Still the same. It crashes somewhere between the log lines sd | LatentDiffusion: Running in eps-prediction mode and sd | DiffusionWrapper has 859.52 M params.
This maybe doesn't mean much, but cmdr2/stable-diffusion-ui is working fine for me through Docker.
I'm opening an issue; it doesn't make sense to have the communication in a closed PR. 🙂
By the way, thank you very much to both of you, @oc013 and @hlky for your help so far!
DRAFT: Confirming the solution.