cvlab-kaist / 3DGAN-Inversion

Official Implementation of WACV 2023 paper "3D GAN Inversion with Pose Optimization".

generate video #3

Closed luchaoqi closed 1 year ago

luchaoqi commented 1 year ago

Hi, I would like to know how to generate a video given the output of PTI. I saw some pipelines here: https://github.com/NVlabs/eg3d/issues/66, but it seems that PTI only outputs the .pt network here, while gen_videos.py requires the .pkl network format here. I am wondering how you saved the network (.pkl) and used it in gen_videos.py. Thanks!

mlnyang commented 1 year ago

We followed the PTI code and save our tuned generator params with torch.save(G.state_dict(), PATH), but I think we made a mistake in the code, single_id_coach.py line 117 (saving the generator checkpoint: save pickle (x) -> save state_dict (o)). Sorry for the confusion. If you want to use gen_videos.py directly, I recommend saving the network params with torch.save(G.state_dict(), PATH) and loading them with G.load_state_dict(torch.load(PATH)). Or you can generate the video right after the optimization finishes with this code.
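A minimal sketch of that save/load flow, assuming the EG3D-style dnnlib/legacy helpers; the paths below are placeholders, not this repo's exact code:

```python
import copy
import torch
import dnnlib
import legacy  # from the EG3D codebase

PKL_PATH = 'networks/ffhq512-128.pkl'   # original EG3D pickle (placeholder path)
TUNED_PATH = 'checkpoints/tuned_G.pt'   # state_dict saved at the end of PTI (placeholder path)
device = torch.device('cuda')

# Saving side, at the end of PTI tuning:
# torch.save(G.state_dict(), TUNED_PATH)

# Loading side: rebuild G from the original pickle, then overwrite its weights
# with the tuned parameters before rendering the video.
with dnnlib.util.open_url(PKL_PATH) as f:
    G = legacy.load_network_pkl(f)['G_ema'].to(device)
G = copy.deepcopy(G).eval().requires_grad_(False)
G.load_state_dict(torch.load(TUNED_PATH, map_location=device))
```

From there the tuned G can be used in place of the generator that gen_videos.py normally loads from the .pkl.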

luchaoqi commented 1 year ago

Thanks! I find that for the demo images the results are quite stable and good, but that is not the case for custom in-the-wild images. I tested on my own dataset following the EG3D preprocessing code, but the results are not even deterministic across runs.

Is there any way to fix the results, e.g. with seed=0? Sorry for this basic question, as I am new to the PTI area.
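For reference, this is the standard way I know of to pin the seeds in a PyTorch pipeline (a generic sketch, not this repo's code); I'm not sure whether seeding alone explains the run-to-run differences:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # Seed Python, NumPy, and PyTorch (CPU + all GPUs).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for determinism in cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(0)
```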

mlnyang commented 1 year ago

Can I take a look at your results?

luchaoqi commented 1 year ago

> Can I take a look at your results?

Attached (see especially images 3 and 5). I suspect it might have something to do with the PTI settings in hyperparameters.py?
first run: results.zip
second run: results_2.zip

mlnyang commented 1 year ago

I think both cases were caused by an incorrect camera viewpoint, not by the generator hyperparameters at PTI. In some cases the extrinsics collapse (especially when the input image is blurry), and the network then tries to fit an image rendered from an incorrect viewpoint during the PTI stage.

You can try regulating the camera learning-rate hyperparameter, or set visualize_opt_process=True in global_config.py to monitor the optimization process.
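As a small illustration of the two knobs mentioned above: visualize_opt_process comes from global_config.py, while the camera learning-rate variable name below is a placeholder, so check hyperparameters.py for the actual name used in this repo.

```python
# configs/global_config.py
visualize_opt_process = True   # dump intermediate renders to inspect how the pose fit evolves

# configs/hyperparameters.py -- variable name here is hypothetical;
# use whatever the repo actually calls the camera learning rate.
camera_lr = 1e-3               # try lowering this if the estimated extrinsics drift or collapse
```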

luchaoqi commented 1 year ago

I have a follow-up question regarding your implementation here: https://github.com/KU-CVLAB/3DGAN-Inversion/blob/3cfebf9abc0733aae5c5e512f33ce18d016e3e48/gen_videos.py#L84-L128

It seems that you feed the ws directly into G without the mapping network during inversion. I tried the same approach but noticed some artifacts like those shown here. This problem has also been discussed in the original EG3D repo here.

Did you run into similar problems, and how did you solve them?

mlnyang commented 1 year ago

Our implementation directly optimizes ws, so we did not pay much attention to the mapping network. We did try to 'initialize' ws by feeding the GT camera of the input image to the mapping network, but I think the result was similar.
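For reference, a rough sketch of what that initialization looks like in EG3D-style code; G, device, and the extrinsics/intrinsics tensors are assumed to come from your own setup, and the exact calls may differ from this repo's inversion code:

```python
import torch

# Assumed to exist already: G (an EG3D generator), device,
# extrinsics (4x4 cam2world) and intrinsics (3x3) for the input image.
cam_gt = torch.cat([extrinsics.reshape(1, 16), intrinsics.reshape(1, 9)], dim=1).to(device)

# (a) initialize ws through the mapping network, conditioned on the GT camera
z = torch.randn([1, G.z_dim], device=device)
ws = G.mapping(z, cam_gt, truncation_psi=0.7)

# (b) then optimize ws directly and feed it to the synthesis network,
#     skipping the mapping network from here on
ws = ws.detach().clone().requires_grad_(True)
img = G.synthesis(ws, cam_gt)['image']
```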