oneThousand1000 opened this issue 2 years ago
Is it possible to share the code for the PTI inversion?
What about your results for the first step (inversion)? Do you just use the LPIPS loss, following PTI?
Yes, my code is based on the w projector of PTI. It seems that the inversion works best on portraits that look straight ahead. I think I achieved the same performance as the authors'.
Ok, great!
IMG_0604.MOV
Hi,
I tried to invert this portrait, but it seems the inversion can't recover a correct eyeglasses shape, even though it still produces a reasonable result in the input view. I want to ask whether you already get the eyeglasses after the pivot optimization (which I can't achieve), so that the shape is preserved during the generator optimization, or whether you add some other regularization?
Thank you for your time!
@jiaxinxie97 Hi jiaxinxie97, I used the original EG3D checkpoint to generate a video from the latent code, and it seems that the eyeglasses are reconstructed successfully, which indicates that the eyeglasses are already captured by the pivot latent code, before the generator tuning.
I think you can check your projector code; I used this one (both w and w_plus are OK). The zip file I uploaded contains the re-aligned input image and the input camera parameters, so you can check whether they are consistent with yours.
Video generated by the original EG3D checkpoint: https://user-images.githubusercontent.com/32099648/174772757-d316bc1d-de52-49a4-a863-6de166000450.mp4
Input re-aligned image and camera parameters: 01457.zip
Thanks! I also use the PTI repo, but it is strange that I can't reconstruct the eyeglasses using w or w+ space optimization; I will check! Since the original EG3D checkpoint does not have named_buffers(), I removed the reg_loss. Will that affect the results?
Hi, named_buffers() is an attribute of the synthesis network in StyleGAN2; you can find it in the StyleGAN2Backbone (self.backbone) of the TriPlaneGenerator.
Try using G.backbone.synthesis.named_buffers() instead of G.named_buffers(), and add the reg_loss back.
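For reference, a minimal sketch of how the noise regularization from the StyleGAN2/PTI w projector could be applied on top of that call; G is assumed to be the loaded TriPlaneGenerator, and the 'noise_const' buffer name is an assumption carried over from stylegan2-ada-pytorch, not the exact code used here:

```python
import torch
import torch.nn.functional as F

# Collect the per-layer noise buffers from the StyleGAN2 backbone of the TriPlaneGenerator
# ('noise_const' is the buffer name used in stylegan2-ada-pytorch; assumed to carry over to EG3D).
noise_bufs = {name: buf for name, buf in G.backbone.synthesis.named_buffers()
              if 'noise_const' in name}

# Noise regularization as in the original w projector: penalize spatial
# autocorrelation of each noise map at multiple scales.
reg_loss = 0.0
for v in noise_bufs.values():
    noise = v[None, None, :, :]  # (1, 1, H, W)
    while True:
        reg_loss += (noise * torch.roll(noise, shifts=1, dims=3)).mean() ** 2
        reg_loss += (noise * torch.roll(noise, shifts=1, dims=2)).mean() ** 2
        if noise.shape[2] <= 8:
            break
        noise = F.avg_pool2d(noise, kernel_size=2)

# reg_loss is then added to the perceptual term with a large weight (1e5 in the original projector).
```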
Hi, thank you! Using G.backbone.synthesis.named_buffers(), I got a reasonable result for the eyeglasses.
Hi, @oneThousand1000
Did you set both the z and c as trainable parameters during the GAN inversion? I guess fixing the c (which can be obtained from dataset.json) and only inverting the z is more reasonable. What do you think?
I set w or w_plus as trainable parameters and fix the c.
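A rough sketch of that setup, assuming G is the loaded TriPlaneGenerator, camera_params is the 25-dim label read from dataset.json, and lpips / target_image are placeholders for the perceptual loss and the preprocessed input; the step count and learning rate are illustrative, not the exact settings used:

```python
import torch

device = 'cuda'
c = torch.tensor(camera_params, dtype=torch.float32, device=device).reshape(1, 25)  # fixed label

# Initialize w from the average latent computed under the fixed conditioning c.
z_samples = torch.randn(1000, G.z_dim, device=device)
w_samples = G.mapping(z_samples, c.repeat(1000, 1))                           # (N, num_ws, 512)
num_ws = w_samples.shape[1]
w_opt = w_samples[:, :1].mean(0, keepdim=True).clone().requires_grad_(True)   # (1, 1, 512), trainable
# (for w_plus, optimize a full (1, num_ws, 512) tensor instead)

optimizer = torch.optim.Adam([w_opt], lr=0.1)
for step in range(500):
    synth = G.synthesis(w_opt.repeat(1, num_ws, 1), c)['image']  # c stays fixed throughout
    loss = lpips(synth, target_image)                            # perceptual loss, as in the PTI w projector
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```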
Got it, thanks for your reply.
BTW, did you follow the FFHQ preprocessing steps in EG3D (i.e., realign to 1500 from in-the-wild images and then resize into 512), or directly use the well-aligned 1024 FFHQ image and just resize into 512?
Hi @oneThousand1000,
Do you have any out-of-domain results? I tried PTI myself on the FFHQ checkpoint; it works well on the joker image, but fails on the CelebA-HQ dataset.
I followed the FFHQ preprocessing steps in EG3D.
@mlnyang, I got similar results to yours.
I use the well-aligned & cropped FFHQ images (at 1024 resolution), then resize them to 512 for the subsequent PTI inversion. To be more specific, say I choose "00999.png" as the input. Since the camera parameters (25 = 4x4 + 3x3) are provided in dataset.json, I use them directly. The camera parameters are fixed while the w latent code is trainable. The following are my results:
However, when I follow the FFHQ preprocessing steps in EG3D, which basically consist of (1) aligning & cropping the in-the-wild image to 1500; (2) re-aligning to 1024 & center cropping to 700; (3) resizing to 512, the results look good:
I guess the different underlying preprocessing might be the reason. When you tried PTI on the joker, how did you preprocess it?
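(For reference, steps (2)-(3) above amount to a plain center crop and resize; step (1), the landmark-based re-alignment, is the part handled by the EG3D scripts. A rough PIL sketch, with placeholder filenames:)

```python
from PIL import Image

img = Image.open('realigned_1024.png')            # output of the EG3D re-alignment step, 1024x1024
w, h = img.size
left, top = (w - 700) // 2, (h - 700) // 2        # center crop to 700x700
img = img.crop((left, top, left + 700, top + 700))
img = img.resize((512, 512), Image.LANCZOS)       # final 512x512 input for EG3D / PTI
img.save('eg3d_input_512.png')
```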
Hi @bd20222, thanks for sharing your work.
I think that's the main reason. Actually, the joker image originally came from the PTI repo; maybe it was already preprocessed in the FFHQ style.
I took a look at the PTI alignment script; it seems to be the same as the original FFHQ one.
I inspected the EG3D preprocessing and compared it with the original FFHQ. AFAIK, there is no center cropping step in the original FFHQ preprocessing, so you will find that the faces used in EG3D show some vertical translation. I guess the well-trained EG3D model has captured this pattern, resulting in the blurry PTI inversion, with the subsequent synthesized novel views looking like a mixture of two faces.
It's interesting why the joker example works well.
Hi, please follow the "Preparing datasets" section in the README to get the re-aligned images. According to https://github.com/NVlabs/eg3d/issues/16#issuecomment-1151563364, the original FFHQ dataset does not work with the camera parameters in dataset.json; you should predict the camera parameters for the original FFHQ images yourself.
@oneThousand1000
Yeah, I agree. For those who want to directly use the well-aligned 1024 FFHQ images, you have to predict the camera parameters with Deep3DFace_pytorch yourself. But I haven't tested this on the EG3D pre-trained model.
You can email the author and ask for the pose extraction code. Or refer to https://github.com/NVlabs/eg3d/issues/18
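Once you have a pose estimate, the 25-dim label that EG3D expects is just the flattened 4x4 cam2world extrinsic followed by the flattened 3x3 normalized intrinsic matrix (the intrinsics below are the values that appear in the FFHQ dataset.json); a small sketch, where the extrinsic is a placeholder for your own prediction:

```python
import numpy as np

cam2world = np.eye(4, dtype=np.float32)      # placeholder: 4x4 camera-to-world matrix from your pose estimator
intrinsics = np.array([[4.2647, 0.0, 0.5],
                       [0.0, 4.2647, 0.5],
                       [0.0, 0.0, 1.0]], dtype=np.float32)  # normalized intrinsics used for FFHQ

label = np.concatenate([cam2world.reshape(-1), intrinsics.reshape(-1)])  # shape (25,) = 16 + 9
```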
Oh I see, center cropping was the problem. I just tried other examples in PTI and they didn't work. It is strange that the joker image works well. Thanks for your help!! :)
@oneThousand1000 Do you use the noise regularization loss in the first GAN inversion step?
See https://github.com/NVlabs/eg3d/issues/28#issuecomment-1161560077
Thanks
@oneThousand1000 I still have one question. Will the parameters of the tri-plane decoder be tuned? I used another 3D GAN model (StyleSDF), which doesn't have a tri-plane generator, and I found that fine-tuning the MLP parameters harmed the geometry.
Hi, @oneThousand1000,
I tried to use PTI to get the pivot of an image. Then, in gen_video.py, I used the pivot to set zs, which is originally set by random seeds: "zs = torch.from_numpy(np.stack([np.random.RandomState(seed).randn(G.z_dim) for seed in all_seeds])).to(device)". I got the video, but the image is totally different from the previous one. For the connection between PTI and EG3D, did I miss anything? I noticed you said "I optimize the latent code 'w' and use it as the pivot to finetune eg3d." Do you mean we need to generate a dataset and call "train.py" to finetune? If we have already finetuned, why do we need PTI? I thought PTI is used to get the latent as conditioning for EG3D. Thanks for your help.
Hi, you need to feed the zs into the mapping network of EG3D to get the w or ws latent code, then optimize the w or ws. Please refer to https://github.com/danielroich/PTI/tree/main/training/projectors or the StyleGAN paper for the definition of the w/ws latent codes.
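In other words, the random zs in gen_videos.py only exist to reach w-space; once you have an optimized pivot, you bypass the mapping network and feed the ws into synthesis directly. A rough sketch, with illustrative names (ws_pivot, conditioning_params, novel_view_cameras are placeholders):

```python
# During inversion: z is only used to initialize / reach w-space.
z = torch.randn(1, G.z_dim, device=device)
ws = G.mapping(z, conditioning_params)        # (1, num_ws, 512); this is what the projector optimizes

# ... run the PTI w / w_plus projector here to obtain ws_pivot for the target image ...

# During rendering (e.g. in gen_videos.py): skip G.mapping and feed the pivot directly,
# varying only the rendering camera.
for camera_params in novel_view_cameras:
    frame = G.synthesis(ws_pivot, camera_params)['image']
```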
I got it, thanks.
Thanks for your help, I have also got realistic results.
@bd20222, hello! Which dataset.json do you use? I use ffhq-dataset-v2.json, but there are no camera parameters in it.
@lyx0208, hi, you have to preprocess the dataset in advance. The details can be found here. For your question, the camera parameters provided by the author can be downloaded from https://github.com/NVlabs/eg3d/blob/71ef469df0095c609b2b151127774ea74a1bf17c/dataset_preprocessing/ffhq/runme.py#L48-L50
@bd20222, get it, thanks!
FYI We added additional scripts that can preprocess in-the-wild images compatible with the FFHQ checkpoints. Hope that is useful. https://github.com/NVlabs/eg3d/issues/18#issuecomment-1200366872
Hi! I found that all the faces in FFHQ Processed Data (downloaded from the Google Drive link that you provided) are rotated so that the two eyes lie on a horizontal line. But the uploaded scripts seem to do no rotation. Does this matter?
The first image I uploaded is the image in FFHQ Processed Data; I processed the raw image of 00000 using your uploaded scripts and got the second image.
I also find that the uploaded scripts output camera parameters that are different from dataset.json. Maybe this is caused by the missing rotation?
Camera parameters predicted by the uploaded scripts: [ 0.944381833076477, -0.011193417012691498, 0.32866042852401733, -0.828210463398311, -0.010220649652183056, -0.9999367594718933, -0.004687247332185507, 0.005099154064645238, 0.32869213819503784, 0.0010674281511455774, -0.9444364905357361, 2.5698329570120664, 0.0, 0.0, 0.0, 1.0, 4.2647, 0.0, 0.5, 0.0, 4.2647, 0.5, 0.0, 0.0, 1.0 ]
Camera parameters in dataset.json: [ 0.9422833919525146, 0.034289587289094925, 0.3330560326576233, -0.8367999667889383, 0.03984849900007248, -0.9991570711135864, -0.009871904738247395, 0.017018394869192363, 0.33243677020072937, 0.022573914378881454, -0.9428553581237793, 2.566997504832856, 0.0, 0.0, 0.0, 1.0, 4.2647, 0.0, 0.5, 0.0, 4.2647, 0.5, 0.0, 0.0, 1.0 ]
Hi! I think the [image align code](https://github.com/Puzer/stylegan-encoder/blob/master/align_images.py) provided by [stylegan-encoder](https://github.com/Puzer/stylegan-encoder) may be useful to rotate the image, if you want to get a re-aligned image similar to the ones in FFHQ Processed Data.
Maybe for in-the-wild images, the rotation is an unnecessary step.
I modified the code and used it to rotate and crop the image; it seems that after rotation the resulting image is consistent with the one in FFHQ Processed Data.
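For reference, the rotation in the FFHQ-style alignment comes from the oriented crop: the crop rectangle's x-axis follows the eye line, so the roll is removed before cropping. A condensed sketch of that computation (face_landmarks is a placeholder for 68-point dlib landmarks; the landmark detection and the actual warp are omitted):

```python
import numpy as np

lm = np.asarray(face_landmarks, dtype=np.float64)  # (68, 2) dlib landmarks for one face

eye_left   = lm[36:42].mean(axis=0)
eye_right  = lm[42:48].mean(axis=0)
eye_avg    = (eye_left + eye_right) * 0.5
eye_to_eye = eye_right - eye_left
mouth_avg  = (lm[48] + lm[54]) * 0.5               # outer mouth corners
eye_to_mouth = mouth_avg - eye_avg

# Oriented crop rectangle: x follows the eye line (this is what removes the roll), y is perpendicular.
x = eye_to_eye - np.flipud(eye_to_mouth) * [-1, 1]
x /= np.hypot(*x)
x *= max(np.hypot(*eye_to_eye) * 2.0, np.hypot(*eye_to_mouth) * 1.8)
y = np.flipud(x) * [-1, 1]
c = eye_avg + eye_to_mouth * 0.1
quad = np.stack([c - x - y, c - x + y, c + x + y, c + x - y])  # corners of the rotated crop
# The image is then warped so that `quad` maps to an axis-aligned square (PIL Image.transform QUAD).
```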
Hi,
You are right that the alignment script from Deep3DFaceRecon does not do rotation alignment, but it should put the alignment in the right ballpark and should be good enough for downstream tasks like GAN inversion etc.
Please note that for the FFHQ preprocessing you should NOT use this script. Instead, use the FFHQ preprocessing script for a proper reproduction. The roll alignment will be taken care of as part of the FFHQ preprocessing script here. The rest of the processing should be the same as the new "in-the-wild" script.
Thanks for your guidance!
Hi Yiqian, you list two repos for the alignment. I used the process image func in align_multiprocess.py https://github.com/NVlabs/eg3d/blob/67859eb31cb520f4e299cfd1c84d145332e3427c/dataset_preprocessing/ffhq/align_multiprocess.py#L47 to eliminate the effect of roll. Are these operations equivalent?
Hi jiaxin,
I think the image align code outputs the same results as the process image func in align_multiprocess.py.
Please see https://github.com/Puzer/stylegan-encoder/blob/1e7e47f9bbb0ca391cdc250af5ad2468250a803c/ffhq_dataset/face_alignment.py#L7; it seems that they are the same function.
@oneThousand1000 Can you share your PTI inversion code? I tried to revise it but always run out of memory.
Sure, I will add you to my private repo.
Thanks.
Hey oneThousand, would it be too cheeky to ask for an invite too? Would love to try it <3 -- btw awesome hifi3dface repo, going to try it out next week !
Hi guys, I released my EG3D inversion code for your reference; you can find it here: EG3D-projector.
@oneThousand1000 Hi, oneThousand1000. Can you invite me to your private repo? I tried to migrate PTI, but it's hard work for me T.T
See https://github.com/NVlabs/eg3d/issues/28#issuecomment-1207962383.
oneThousand1000... great job... but it would be great if you could implement it in a Colab notebook. Indeed, I think NVIDIA or someone else should implement all this stuff in Colab, as there are many people who do not have the knowledge to set all this up. Thanks for your work.
Hello, I would like to ask a question. I see that the paper mentions using PTI to invert an image to a latent code. I now have a photo of a car and want to get its latent code for the EG3D network, but PTI does not seem to support inversion for vehicles, or at least does not provide a pre-trained model on the cars dataset. So I would like to ask for advice on how to invert a vehicle image. Thanks a lot!
I released my EG3D inversion code for your reference; you can find it here: EG3D-projector.
Thanks for the impressive work!
As you mentioned in the paper, you use Pivotal Tuning Inversion to invert test images. PTI finetunes the EG3D parameters based on the pivot latent code, which is obtained by optimization. The pivot latent code is "w" or "w+"; however, it is correlated with the camera parameters that are fed to the mapping network. Will the novel-view synthesis be affected by this camera-fixed latent code?
I also noticed that you set a hyper-parameter entangle = 'camera' in gen_videos.py; it seems that you have considered this issue when rendering different views for a specific latent code. I tried 'condition' and 'both'; the camera parameters that are input to the mapping network only control some unrelated semantic attributes (expression, clothes, ...). I think the [zs, c] that is fed to the mapping network can be regarded as a latent code with a shape of [1, 512+25]. Does the c influence the camera view of the subsequent synthesis?
Now I have reproduced the PTI inversion of EG3D. Please see the video below. I input the re-aligned 00000.png and its camera parameters (from dataset.json), then I optimize the latent code 'w' and use it as the pivot to finetune EG3D.
The result looks a little strange. I want to know if my implementation is consistent with yours!
I think I've figured out why the camera parameters that are input to the mapping network can't control the camera view; please refer to Generator Pose Conditioning.
https://user-images.githubusercontent.com/32099648/173868809-4cd3fc8b-774b-4068-865e-358f60e19411.mp4
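For completeness, generator pose conditioning means the camera label given to the mapping network only conditions the latent (it is swapped during training, so at test time it is usually held fixed), while the camera label given to the synthesis path is what actually moves the viewpoint. A minimal sketch of that separation, assuming G is the loaded TriPlaneGenerator and the other names are placeholders:

```python
# c_conditioning: a fixed 25-dim label (e.g. a roughly frontal pose + FFHQ intrinsics) for the mapping network.
# c_render:       the label that actually controls the rendered camera view.
ws = G.mapping(z, c_conditioning, truncation_psi=0.7)   # pose conditioning only shapes the latent
frames = [G.synthesis(ws, c_render)['image'] for c_render in camera_trajectory]
```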