AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: Black video in txt2video #11753

Closed: deroc1 closed this issue 1 year ago

deroc1 commented 1 year ago

Is there an existing issue for this?

What happened?

Using txt2video produces an entirely black video. (Screenshot attached: изображение_2023-07-12_215032240.)

Steps to reproduce the problem

  1. Go to the txt2video tab
  2. Write a prompt
  3. Generation completes with no errors, but the resulting frames and video are black

What should have happened?

A video of a chip spinning, matching the prompt.

Version or Commit where the problem happens

1.4.0

What Python version are you running on?

Python 3.10.x

What platforms do you use to access the UI?

Windows

What device are you running WebUI on?

Nvidia GPUs (GTX 16 below)

Cross attention optimization

Doggettx

What browsers do you use to access the UI?

Google Chrome

Command Line Arguments

None

List of extensions

None

Console logs

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.4.0
Commit hash: 394ffa7b0a7fff3ec484bcd084e673a8b301ccc8
Installing requirements

Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
2023-07-12 21:20:14,731 - ControlNet - INFO - ControlNet v1.1.232
ControlNet preprocessor location: D:\stabledif\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator\downloads
2023-07-12 21:20:14,868 - ControlNet - INFO - ControlNet v1.1.232
Loading weights [7eb674963a] from D:\stabledif\stable-diffusion-webui\models\Stable-diffusion\hassakuHentaiModel_v13.safetensors
*Deforum ControlNet support: enabled*
Creating model from config: D:\stabledif\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 10.3s (import torch: 2.8s, import gradio: 1.6s, import ldm: 0.7s, other imports: 1.2s, setup codeformer: 0.1s, load scripts: 2.0s, create ui: 1.4s, gradio launch: 0.4s).
preload_extensions_git_metadata for 10 extensions took 0.55s
DiffusionWrapper has 859.52 M params.
Applying attention optimization: Doggettx... done.
Textual inversion embeddings loaded(0):
Model loaded in 10.7s (load weights from disk: 1.6s, create model: 0.9s, apply weights to model: 5.6s, apply half(): 1.5s, move model to device: 1.0s, calculate empty prompt: 0.2s).
text2video — The model selected is: zeroscope_v2_576w (ModelScope-like)
 text2video extension for auto1111 webui
Git commit: 3f4a109a
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]Making a video with the following parameters:
{'prompt': 'chip spinning', 'n_prompt': 'text, watermark, copyright, blurry, nsfw', 'steps': 5, 'frames': 8, 'seed': 2305832716, 'scale': 17, 'width': 256, 'height': 256, 'eta': 0.0, 'cpu_vae': 'GPU (half precision)', 'device': device(type='cuda'), 'skip_steps': 0, 'strength': 1, 'is_vid2vid': 0, 'sampler': 'DDIM_Gaussian'}
Sampling random noise.
Sampling using DDIM_Gaussian for 5 steps.: 100%|█████████████████████████████████████████| 5/5 [01:47<00:00, 21.41s/it]
STARTING VAE ON GPU. 8 CHUNKS TO PROCESS.: 100%|█████████████████████████████████████████| 5/5 [01:47<00:00, 18.39s/it]
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([8, 3, 256, 256])
output/mp4s/20230712_212559624295.mp4
text2video finished, saving frames to D:\stabledif\stable-diffusion-webui\outputs/img2img-images\text2video\20230712212300
Got a request to stitch frames to video using FFmpeg.
Frames:
D:\stabledif\stable-diffusion-webui\outputs/img2img-images\text2video\20230712212300\%06d.png
To Video:
D:\stabledif\stable-diffusion-webui\outputs/img2img-images\text2video\20230712212300\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 1.36 seconds!
t2v complete, result saved at D:\stabledif\stable-diffusion-webui\outputs/img2img-images\text2video\20230712212300
Loading weights [f2769b3f82] from D:\stabledif\stable-diffusion-webui\models\Stable-diffusion\after_sex.safetensors
Creating model from config: D:\stabledif\stable-diffusion-webui\configs\v1-inference.yaml
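
The log shows the VAE decoding the frames on the GPU in half precision and saving them to the output folder above. A minimal check (a sketch, assuming Pillow and numpy from the webui's Python environment, with the frame folder taken from the "saving frames to" line in the log) to confirm the saved frames are genuinely all-black rather than just very dark:

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Frame folder taken from the "saving frames to ..." line in the log above.
frames_dir = Path(r"D:\stabledif\stable-diffusion-webui\outputs\img2img-images\text2video\20230712212300")

for png in sorted(frames_dir.glob("*.png")):
    pixels = np.asarray(Image.open(png).convert("RGB"))
    # max == 0 means the frame is pure black, i.e. the decoder produced no image data;
    # a dark-but-valid frame would still have nonzero pixels somewhere.
    print(f"{png.name}: max={pixels.max()} mean={pixels.mean():.2f}")
```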

Additional information

I am using the default UI settings. I don't think the video settings matter: I tried both a detailed forest and the chip-spinning prompt at 20 frames, and neither worked. Thanks for the help.
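
One plausible explanation, given the GTX 16-series card and the half-precision VAE shown in the log (an assumption, not something confirmed in this thread): activations that overflow float16 become inf/NaN, and NaN pixels end up as pure black once the frames are sanitised and saved. A tiny torch illustration of that failure mode:

```python
import torch

# Values beyond float16's maximum (~65504) overflow to inf when cast down.
x = torch.tensor([70000.0, 1.0]).half()        # -> [inf, 1.0]

# Arithmetic that mixes inf with inf produces NaN (e.g. mean subtraction).
frame = x - x.mean()                           # mean is inf, so inf - inf = NaN

# Typical post-processing clamps to [0, 1] and replaces NaN with 0 before
# converting to uint8, which is exactly a black pixel.
frame = frame.clamp(0, 1).nan_to_num(0.0)
print(frame)                                   # tensor([0., 0.], dtype=torch.float16)
```

If that is what is happening here, the usual workaround on these cards is to avoid half precision for the VAE (the base webui has a --no-half-vae launch flag, and the 'cpu_vae' parameter in the log suggests the extension exposes its own precision setting), but that is only a suggestion, not something tested in this report.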

missionfloyd commented 1 year ago

You should post this on the txt2video repo. https://github.com/kabachuha/sd-webui-text2video

deroc1 commented 1 year ago

thanks