dvlab-research / Video-P2P

Video-P2P: Video Editing with Cross-attention Control
https://video-p2p.github.io/

The problem with rabbit-jump-p2p.yaml #9

Closed jiajiaxiaoskx closed 10 months ago

jiajiaxiaoskx commented 11 months ago

Hi, I have a problem with rabbit-jump-p2p.yaml. After I train the model using rabbit-jump-tune.yaml, the fine-tuned checkpoint is stored in the output folder. When I then use rabbit-jump-tune.yaml to edit the video, which path should I use in the pretrained_model_path config (line 1), since there are two folders (stable-diffusion-v1.5 and output) from which I could load the model? Thanks for answering!

ShaoTengLiu commented 11 months ago

You need to use rabbit-jump-p2p.yaml to edit. Thanks.
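
For reference, pretrained_model_path in that config should point at the fine-tuned checkpoint in your output folder rather than the original stable-diffusion-v1.5 weights. A minimal sketch, assuming the tuning output was saved to ./outputs/rabbit-jump:

```yaml
# First line of rabbit-jump-p2p.yaml (sketch): load the fine-tuned
# checkpoint, not the original stable-diffusion-v1.5 folder
pretrained_model_path: "./outputs/rabbit-jump"
```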

jiajiaxiaoskx commented 11 months ago

Sorry for the mistake I made. I hit another error when running run_videop2p.py following your instructions:

```
DDIM inversion...
Null-text optimization...
Start Video-P2P!
100%|██████████| 50/50 [01:32<00:00,  1.84s/it]
run_videop2p.py:652: RuntimeWarning: invalid value encountered in cast
  inversion.append( Image.fromarray((sequence1[i] * 255).numpy().astype(np.uint8)) )
run_videop2p.py:653: RuntimeWarning: invalid value encountered in cast
  videop2p.append( Image.fromarray((sequence2[i] * 255).numpy().astype(np.uint8)) )
```
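
The same warning can be reproduced by casting NaN floats to uint8, which makes me suspect the decoded frames themselves contain NaNs. A minimal standalone sketch (NumPy only, with a hypothetical all-NaN frame):

```python
import numpy as np

# Stand-in for a decoded frame whose values are all NaN
frame = np.full((64, 64, 3), np.nan, dtype=np.float32)

# Casting NaN floats to uint8 emits the same
# "RuntimeWarning: invalid value encountered in cast"
pixels = (frame * 255).astype(np.uint8)
```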

How can I fix this problem? Thanks a lot!

ShaoTengLiu commented 11 months ago

Hi, in your logs, I only see some warnings. Can you clarify your problem and show me your running script?

jiajiaxiaoskx commented 11 months ago

It seems that the problem may lie in this part of run_videop2p.py:

```python
with torch.no_grad():
    sequence = ldm_stable(
        prompts,
        generator=generator,
        latents=x_t,
        uncond_embeddings_pre=uncond_embeddings,
        controller=controller,
        video_length=video_len,
        fast=fast,
    ).videos
    sequence1 = rearrange(sequence[0], "c t h w -> t h w c")
    sequence2 = rearrange(sequence[1], "c t h w -> t h w c")
    inversion = []
    videop2p = []
```

The pixel values in `sequence` are all NaN, which makes it impossible to generate correct GIF files.
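
A quick check along these lines confirms it (a sketch; `sequence` is the pipeline output from the snippet above):

```python
import torch

# Sanity check on the pipeline output: prints tensor(True) when
# every value that later gets cast to uint8 is already NaN
print(torch.isnan(sequence).all())
```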

The config file I used is as follows:

```yaml
pretrained_model_path: "./outputs/rabbit-jump"
image_path: "./data/rabbit"
prompt: "a rabbit is jumping on the grass"
prompts:
```

Besides, during training the step loss becomes NaN after ~200 steps, and the output pixel values from `validation_pipeline` (line 332 in run_tuning.py) are also NaN, so I think something must be wrong. Thanks a lot!
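
To pin down the step where training degenerates, a fail-fast guard along these lines could be dropped into the training loop (a hypothetical helper, not part of the repo):

```python
import torch

def assert_finite_loss(loss: torch.Tensor, step: int) -> None:
    """Fail fast instead of silently training on a NaN/Inf loss."""
    if not torch.isfinite(loss).all():
        raise RuntimeError(f"Loss became non-finite at step {step}: {loss}")
```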

ShaoTengLiu commented 11 months ago

Does `python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml" --fast` work for you?

ShaoTengLiu commented 10 months ago

I will temporarily close this issue. You are welcome to reopen it if you still have this problem.