CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

It Rick Rolls you Sometimes #120

Open danzeeeman opened 2 years ago

danzeeeman commented 2 years ago

(attached image: 00040)

atypicalconsortium commented 2 years ago

The SFW filter?

fat-tire commented 2 years ago

That's what it is.

JustinGuese commented 2 years ago

Did I Really Get Rick Rolled? Yes. If Stable Diffusion detects that a generated image may violate its safety filter, the generated image will be replaced with a still of Rick Astley.

https://www.assemblyai.com/blog/how-to-run-stable-diffusion-locally-to-generate-images/#:~:text=v1/model.ckpt-,Did%20I%20Really%20Get%20Rick%20Rolled%3F,-Yes.%20If%20Stable

If you want to remove the filter, just Google it, or directly edit scripts/txt2img.py.
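If you do edit the script yourself, the usual approach is to stub out `check_safety` so it returns the batch unchanged. A minimal sketch (the function name comes from scripts/txt2img.py; the pass-through itself and the sample batch are illustrative):

```python
import numpy as np

# Hypothetical drop-in replacement for check_safety() in scripts/txt2img.py.
# The original feeds the batch through a CLIP-based NSFW classifier and swaps
# flagged samples for the Rick Astley still; this stub flags nothing and
# returns the images untouched.
def check_safety_passthrough(x_image):
    has_nsfw_concept = [False] * len(x_image)  # nothing gets flagged
    return x_image, has_nsfw_concept

# Fake batch of two RGB images in [0, 1], HWC layout like the script uses.
batch = np.random.rand(2, 64, 64, 3).astype(np.float32)
checked, flags = check_safety_passthrough(batch)
assert checked is batch and flags == [False, False]
```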

danzeeeman commented 2 years ago

word... it seems it's a bit overzealous in its filtering. I was generating images of sneakers on a basketball court and half of them came back as Rick Rolls.

JustinGuese commented 2 years ago

Yes, and I would propose just leaving out the images that are rated NSFW, or replacing them with something like 🔞
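That replacement idea can be sketched as a small post-processing step (function name and black placeholder are mine, not from the repo; a 🔞 overlay could be drawn on top with PIL):

```python
import numpy as np

# Hypothetical alternative to the Rick Astley substitution: black out any
# sample the safety checker flagged instead of swapping in a still image.
def censor_flagged(x_checked_image, has_nsfw_concept):
    for i, flagged in enumerate(has_nsfw_concept):
        if flagged:
            x_checked_image[i] = 0.0  # solid black placeholder
    return x_checked_image

batch = np.ones((3, 64, 64, 3), dtype=np.float32)
out = censor_flagged(batch, [False, True, False])
assert out[1].max() == 0.0   # flagged sample blacked out
assert out[0].min() == 1.0   # unflagged samples untouched
```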

Feel free to use my txt2img.py and see; it saves VRAM, which is of course the only reason I disabled the filter: https://github.com/JustinGuese/stable-diffusor-docker-text2image/blob/master/txt2img.py

wget -O scripts/txt2img.py https://raw.githubusercontent.com/JustinGuese/stable-diffusor-docker-text2image/master/txt2img.py

(Use the raw URL; the github.com blob link returns an HTML page, not the script.) And remember: never just download code blindly and execute it :p Have a look at it first; it basically comments out the filter parts.

ChristopherDrum commented 2 years ago

Yes, the filtering is a little aggressive. "a man showing his palm to the camera, a black crystal is embedded in the middle of the palm, hd photograph" came back with NSFW Rick Roll. "a man showing his" was the trigger, I guess?

CTimmerman commented 10 months ago

In my case it's because the check_safety function receives the following as its x_image parameter, resulting in a single-color image:

(ldm) D:\code\AI\stable-diffusion\stable-diffusion-main>python scripts/txt2img_good.py --prompt "a cat" --n_iter 2 --n_samples 1 --skip_grid --ckpt "models\ldm\stable-diffusion-v1\sd-v1-4.ckpt" --seed 777
Global seed set to 777
Loading model from models\ldm\stable-diffusion-v1\sd-v1-4.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling:   0%|          | 0/2 [00:00<?, ?it/s]
Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 100%|██████████| 50/50 [05:51<00:00,  7.04s/it]
Showing [[[[  nan 0.486   nan]
   [  nan 0.486   nan]
   [  nan 0.486   nan]
   ...
   [  nan 0.486   nan]
   [  nan 0.486   nan]
   [  nan 0.486   nan]]

  ...

  [[  nan 0.486   nan]
   [  nan 0.486   nan]
   [  nan 0.486   nan]
   ...
   [  nan 0.486   nan]
   [  nan 0.486   nan]
   [  nan 0.486   nan]]]] type <class 'numpy.ndarray'>
scripts/txt2img_good.py:92: RuntimeWarning: invalid value encountered in cast
  cv2.imshow('test', x_image.astype(np.uint8))
data:   0%|          | 0/1 [06:04<?, ?it/s]
Sampling:   0%|          | 0/2 [06:04<?, ?it/s]
Traceback (most recent call last):
  File "scripts/txt2img_good.py", line 382, in <module>
    main()
  File "scripts/txt2img_good.py", line 337, in main
    x_checked_image, has_nsfw_concept = check_safety(
  File "scripts/txt2img_good.py", line 92, in check_safety
    cv2.imshow('test', x_image.astype(np.uint8))
cv2.error: OpenCV(4.1.2) C:\projects\opencv-python\opencv\modules\core\src\array.cpp:2492: error: (-206:Bad flag (parameter or structure field)) Unrecognized or unsupported array type in function 'cvGetMat'

The same happens with --plms using the original code: scripts/txt2img.py:43: RuntimeWarning: invalid value encountered in cast
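The NaNs themselves come from the sampler (often reported with half precision on certain GPUs), but independent of the cause, casting NaN floats straight to uint8 is what raises the RuntimeWarning, and the resulting values are garbage. A sketch of sanitizing the array before handing it to cv2.imshow (NumPy only; the shape and values just mirror the dump above):

```python
import numpy as np

# Reproduce the failure mode: an array that is NaN everywhere except the
# green channel, matching the x_image dump above.
x_image = np.full((1, 8, 8, 3), np.nan, dtype=np.float32)
x_image[..., 1] = 0.486

# .astype(np.uint8) on NaN is undefined behaviour and triggers
# "RuntimeWarning: invalid value encountered in cast". Sanitize first:
safe = np.nan_to_num(x_image, nan=0.0)                       # NaN -> 0
safe = (np.clip(safe, 0.0, 1.0) * 255.0).round().astype(np.uint8)

assert safe.dtype == np.uint8
assert safe[..., 0].max() == 0    # NaN channels became black
assert safe[..., 1].max() == 124  # 0.486 * 255 ≈ 123.93 rounds to 124
```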