AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Unable to generate coherent images after Transformers 4.25.1 requirement change. #7351

Open · BlackWyvern opened this issue 1 year ago

BlackWyvern commented 1 year ago

Is there an existing issue for this?

What happened?

Formatted my system. Encountered https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7306.

Did a full re-clone of the repository once the cause was discovered. Decided to not change that setting. Ever.

Got the client working. Now unable to generate coherent images via any prompt/model.

All modules successfully installed. Attempted --reinstall-torch and --reinstall-xformers. Problem persists.

Deleted pip cache and reinstalled torch and xformers. Problem persists.

Dove into the deeper requirements, thinking it could perhaps be a CUDA issue (the GPU not being utilized). Found the following: python -c "import torch; print(torch.cuda.is_available())" returns False.
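(For anyone following along, a minimal sketch of the checks run from the webui venv; version strings and device names will obviously differ per machine:)

REM Activate the venv the webui created (path matches the default layout):
I:\stable-diffusion-webui\venv\Scripts\activate.bat
REM Does torch see a CUDA device at all, and which CUDA build is it?
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
REM If available, which GPU did it pick up?
python -c "import torch; print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no CUDA device')"
REM Driver-side view for comparison:
nvidia-smi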

Installed CUDA 12. Rebooted. Still False. Realized I'm dumb and installed the wrong version of CUDA. Removed CUDA 12.

Clean-installed CUDA 11.7. Rebooted. Now True.

Generated example images.

Deleted and re-cloned the repository. Problem persists.

Deleted pip cache and reinstalled torch and xformers. Problem persists.

Uninstalled CUDA. Rebooted.

Deleted pip cache. Deleted and re-cloned the repository. Problem persists.

Generating new images with the exact same prompt and settings now seems to produce different results than the previous example. Example:

00000-3881076342

Steps to reproduce the problem

  1. Format windows
  2. Clone repository
  3. Open A1111
  4. Successfully fail

What should have happened?

Fail to fail

Commit where the problem happens

1d246652

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Mozilla Firefox

Command Line Arguments

None

List of extensions

None

Console logs

Already up to date.
venv "I:\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 0a8515085ef258d4b76fdc000f7ed9d55751d6b8
Installing requirements for Web UI
Launching Web UI with arguments: --xformers --medvram --autolaunch --no-half-vae
Calculating sha256 for I:\stable-diffusion-webui\models\Stable-diffusion\AnythingProtoX5330.safetensors: d8d4c629724cca3df0eda092c2c51147a7ba74dda07ce301ff09adcd3be8d069
Loading weights [d8d4c62972] from I:\stable-diffusion-webui\models\Stable-diffusion\AnythingProtoX5330.safetensors
Creating model from config: I:\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(4): bad-artist-anime, bad-artist, bad_prompt_version2, CharTurner
Model loaded in 14.6s (calculate hash: 12.6s, create model: 0.5s, apply weights to model: 0.6s, apply half(): 0.7s).
Checkpoint AnythingProtoX5330.safetensors [d8d4c62972] not found; loading fallback AnythingProtoX5330.safetensors [d8d4c62972]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Calculating sha256 for I:\stable-diffusion-webui\models\Stable-diffusion\Wyvern Mix.safetensors: f6647285ed26fcea84312e3a0732ed60672f527e2d2d946821fc733467c925c3
Loading weights [f6647285ed] from I:\stable-diffusion-webui\models\Stable-diffusion\Wyvern Mix.safetensors
Applying cross attention optimization (Doggettx).
Weights loaded in 24.2s (calculate hash: 23.1s, apply weights to model: 0.3s, move model to device: 0.7s).
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:04<00:00,  6.46it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████████████████| 30/30 [00:03<00:00,  8.30it/s]

Additional information

Running a 100% fresh install of Windows 10 and a 3080.

BlackWyvern commented 1 year ago

Realized the console log above was from the series of first attempts. I had a notepad open with different logs, as I've been dealing with this for several hours now. The current latest log is posted below. (Model was changed to WyvernMix between fails.)

venv "I:\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 1d24665229bcef3de1e517f0db3ce296749d8a98
Installing requirements for Web UI
Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
Loading weights [f6647285ed] from I:\stable-diffusion-webui\models\Stable-diffusion\Wyvern Mix.safetensors
Creating model from config: I:\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 3.4s (create model: 0.5s, apply weights to model: 0.5s, apply half(): 0.7s, move model to device: 0.6s, load textual inversion embeddings: 1.0s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Karuyo commented 1 year ago

I updated my stable diffusion yesterday and have a similar issue. Generated images are worse than before the update. It could be related to this bug: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7244

EllangoK commented 1 year ago

No module 'xformers'. Proceeding without it.

@BlackWyvern The issue could be xformers. xformers affects how images are generated, so if you had xformers before, you need to reinstall xformers to reproduce your earlier results.
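Something like this in webui-user.bat should force it (just a sketch; remove --reinstall-xformers again after one successful launch so it doesn't reinstall every time):

REM webui-user.bat: reinstall xformers on the next launch, then keep using it
set COMMANDLINE_ARGS=--xformers --reinstall-xformers
call webui.bat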

BlackWyvern commented 1 year ago

Perhaps I wasn't clear about my diagnostic process. Granted, the final log shows xformers being skipped; that was an oversight.

I've (re)installed xformers multiple times throughout testing, before and after the example images given. This is not an xformers issue; if it were, at least a coherent image would still be generated.

Edit: Tested the process from https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7244. The per-step generation previews begin building the second example image; it makes no attempt to build coherency.

Additional edit: However, I did notice the model hash DID in fact change, so I think it's possibly related.

AlUlkesh commented 1 year ago

A few more things you can check: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/7340#discussioncomment-4808932

BlackWyvern commented 1 year ago

I don't usually use Karras samplers, but ended up with another space scene. It seems to enjoy blueprints and space. 00004-3881076342 Highres fix also doesn't do anything for it.

AlUlkesh commented 1 year ago

Ok, that is very odd. With the pnginfo of this image, I get this: 14627-3881076342- nd , best quality masterpiece, uploaded to e621, safe content, anatomically correct, sfw, solo_(female (anthro dragon adult), r

Is your model or the hypernetwork perhaps corrupt or not accessible?

BlackWyvern commented 1 year ago

That's quite good. But, no. I just downloaded fresh copies of the models I used, to see if it could possibly be something in the way I converted them to safetensors. But the raw ckpt files themselves are also completely incoherent, this time seemingly favoring abstract horror with, again, the exact same prompt. Tested with no hypernets or embeddings to be absolutely certain.

Nothing about them has changed to my knowledge. Given it seems to affect all the models I have, I think it's more of an endemic issue. I even tried rebuilding the model I use personally.

Edit: It is most assuredly something that was done in the last week. I rolled my install back to 8b903322e66a694245970832f7fa89e2faa2ad0a (picked at random) and it is now generating properly.

Additional Edit: Calling git pull from above hash, updating to latest version again breaks generation. Something somewhere is broken.

More edits: cc8c9b7474d917888a0bd069fcd59a458c67ae4b spits out errors like mad. ce72af87d3b05c946bc82033786fc340f1c20512 exhibits the ongoing issues.

00038-3881076343- nd , best quality masterpiece, uploaded to e621, safe content, anatomically correct, sfw, solo_(female (anthro dragon adult), r

AlexYez commented 1 year ago

Yes, I have the same issue.

BlackWyvern commented 1 year ago

Started going through version commits to see where the error occurs.

ea9bd9fc7409109adcd61b897abc2c8881161256 (latest): still exhibiting the error
7ba7f4ed6e980051c9c461f514d2ddee43001b7e: no fails
e8c3d03f7d9966b81458944efb25666b2143153f: "Error caught was: No module named 'triton'", no fails
d63340a4851ce95c9a3a9fffd9cf27643e2ae1b3: ModuleNotFoundError: No module named 'modules.extras', fails to launch
35419b274614984e2b511a6ad34f37e41481c809: Triton error, no fails
e33cace2c2074ef342d027c1f31ffc4b3c3e877e: Triton error (I really don't know what this is, or why it's complaining about it), no fails
194cbd065e4644e986889b78a5a949e075b610e8: Triton error, no fails
8b903322e66a694245970832f7fa89e2faa2ad0a: no Triton error, no fails
10421f93c3f7f7ce88cb40391b46d4e6664eff74: exhibits the error

Refining timestamps...

Jan 25th, 4d634dc592ffdbd4ebb2f1acfb9a63f5e26e4deb: fail
Jan 24th, e3b53fd295aca784253dfc8668ec87b537a72f43: success

In between:

789d47f832a5c921dbbdd0a657dff9bca7f78d94: fail
1574e967297586d013e4cfbb6628eae595c9fba2: fail
11485659dca08ef967f3e5462382f91504195ef0: fail
bd9b55ee908c43fb1b654b3a3a1320545023ce1c: fail
ee0a0da3244123cb6d2ba4097a54a1e9caccb687: success

I can absolutely confirm that the error began at, and continues to exist as of, commit bd9b55ee908c43fb1b654b3a3a1320545023ce1c.

Something about the updated transformers requirement breaks my ability to generate images. That's the only line of code changed between that and ee0a0da3244123cb6d2ba4097a54a1e9caccb687.
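(In hindsight, git bisect could have automated that search. A sketch, where the "test" is just generating with a fixed seed and eyeballing the result:)

REM Let git narrow the regression window instead of checking commits by hand.
git bisect start
REM Mark the current (broken) checkout as bad:
git bisect bad
REM Mark the last commit known to generate correctly as good:
git bisect good ee0a0da3244123cb6d2ba4097a54a1e9caccb687
REM git now checks out a commit in between. Launch the UI, generate with a
REM fixed seed, then run "git bisect good" or "git bisect bad" accordingly.
REM Repeat until git names the first bad commit, then clean up with:
git bisect reset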

For anyone in the same boat, back up your models and whatnot, then open a bash/cmd window in your root folder and run git reset --hard ee0a0da3244123cb6d2ba4097a54a1e9caccb687
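(A sketch of the backup-then-reset step on Windows; the backup destination here is just an example path. git reset --hard leaves untracked folders like models\ alone, but a copy costs nothing:)

REM Copy models and embeddings outside the repo first (destination is arbitrary):
robocopy I:\stable-diffusion-webui\models I:\sd-backup\models /E
robocopy I:\stable-diffusion-webui\embeddings I:\sd-backup\embeddings /E
REM Then roll the working tree back to the last known-good commit:
git reset --hard ee0a0da3244123cb6d2ba4097a54a1e9caccb687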

Edit: Additional troubleshooting: updated to Python 3.10.7, deleted and rebuilt /venv, reinstalled torch and xformers. Did not profit.

BlackWyvern commented 1 year ago

Tried updating to the latest commit while keeping requirements.txt from ee0a0da (so it doesn't update torch and xformers). It's making... more coherent images, but it would appear that whatever makes the models actually apply their weights has up and f'd off.
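(For anyone repeating the experiment, a sketch of how the old requirements can be kept while updating the code; the file names assume the current repo layout:)

REM Update the code to the latest commit:
git pull
REM ...but restore the dependency pins as they were at ee0a0da:
git checkout ee0a0da3244123cb6d2ba4097a54a1e9caccb687 -- requirements.txt requirements_versions.txt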

So I rolled back again. Just can't figure this one out. 00047-3214263335

zrichz commented 1 year ago

FYI, "Triton" is this: https://openai.com/blog/triton/ - A lot of the Facebook xformers examples were built using it, so (although you note xformers is working) that's where the "error" is coming from. However, I think it's more of a warning, and I think Auto1111 has suppressed it in later commits.

BlackWyvern commented 1 year ago

Understood. So it's most likely unrelated to the issue. That being said, I've still been unable to get this one solved.

While I am using the --xformers argument, I don't know of a way to test whether it's actually working. I get the same image results with and without it enabled, using the same image parameters. Also of note, generation time does not change. I /think/ I remember reading somewhere that xformers is supposed to introduce some non-determinism in the output in exchange for speed, so I don't know. If there's a way to test that, I'd love to know.
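The closest thing I've found to a direct check is querying the package from the activated venv (a sketch; it confirms the build is importable, not that the webui actually picked it up):

REM Is the xformers package importable, and which build is installed?
python -c "import xformers; print(xformers.__version__)"
REM xformers ships a small info module that reports what it can use:
python -m xformers.info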

As for additional steps I've taken since then: uninstalled Python; manually cleaned every directory for Python, pip, Git, etc.; clean-reinstalled the GPU drivers; clean-installed the CUDA 11.7 Toolkit (nvidia-smi reports CUDA 12, nvcc reports 11.7); reinstalled Python 3.10.6; re-cloned the repo at the latest build; redownloaded and rebuilt my models. And I'm still having the issue.

So I'm effectively still stuck on https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/ee0a0da3244123cb6d2ba4097a54a1e9caccb687.

AlUlkesh commented 1 year ago

I /think/ I remember reading somewhere that xformers are supposed to add some kind of non-deterministic generation issues in exchange for speed

Yes, it does.

You can check the console log for xformers after loading a model:

Loading weights [13dfc9921f] from C:\tools\Ai\stable-diffusion-webui\models\Stable-diffusion\dreamshaper_332BakedVaeClipFix.safetensors
Loading VAE weights specified in settings: C:\tools\Ai\stable-diffusion-webui\models\Stable-diffusion\vae-ft-mse-840000-ema-pruned.vae.pt
Applying xformers cross attention optimization.
BlackWyvern commented 1 year ago

Then yes, it should be getting applied. I do notice the speedup over Doggettx.

Singles: 5.26s -> 4.52s. 3x3 grid: 37.44s -> 30.47s.

No change to the output generation, so I guess the non-determinism isn't really a thing here. Still doesn't explain why I can't use the next commit up, though.

git checkout bd9b55ee908c43fb1b654b3a3a1320545023ce1c (https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/bd9b55ee908c43fb1b654b3a3a1320545023ce1c) and git pull both result in a loss of coherent generation and output nearly the same "after" image.

Before/After 00168-1786184156 00189-1786184156

I went back through and checked clip skip and live previews again for possible parallels with #7244 and #7340. Nothing.

BlackWyvern commented 1 year ago

I was able to update to the latest commit and generate coherent images, but not attain parity with the previous version. I suspect too much has changed under the hood.

Once I pulled to latest, I reset requirements.txt and requirements_versions.txt to transformers==4.19.2 and it was able to generate.
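(For anyone else trying this, a sketch of the pin-and-reinstall step, run from the repo root with the venv activated; 4.19.2 is the version the repo pinned before the bump:)

REM After editing requirements.txt / requirements_versions.txt to transformers==4.19.2:
venv\Scripts\activate.bat
pip install transformers==4.19.2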

I still cannot determine why the updated transformers requirement breaks... practically everything. But it does. I don't know what rolling back the requirement will or might break, and to what extent, so any further assistance in figuring this one out would be great.