facebookresearch / StyleNeRF

This is the open-source implementation of the ICLR 2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis".

inversion/projection #7

Open lelechen63 opened 2 years ago

lelechen63 commented 2 years ago

Thanks for the great work!

I trained the model myself (result image: https://drive.google.com/file/d/15PWiwtTUYJzB86CsxeO7g_0xLpopNmFc/view?usp=sharing). When I use your inversion.py/projector.py in apps to fit w, w does not converge. Here is the video result of projector.py (https://drive.google.com/file/d/1vFLrITdVZ7PuOZDzysZmtxMhZdvT20m2/view?usp=sharing) and the video result of inversion.py (https://drive.google.com/file/d/18WoIEIffdA9dG8sgxMGc3Hhkk0SBAoOD/view?usp=sharing). PS: I did not use an encoder network.

Could you please take a look at it?
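
For reference, both inversion.py and projector.py essentially run gradient descent on a w code against an image reconstruction loss. Below is a minimal sketch of such a loop in the spirit of the StyleGAN2 projector, not the repo's exact code: the LPIPS loss, weights, and step count are illustrative; `G.mapping` and `G.z_dim` are assumed to follow the StyleGAN2-ADA conventions StyleNeRF builds on; only the `G.get_final_output(...)` signature appears verbatim later in this thread; and `camera_matrices` must come from the repo's own camera utilities, which are not reconstructed here.

```python
import torch
import torch.nn.functional as F
import lpips  # pip install lpips -- stand-in for the repo's own perceptual loss

def project_w(G, target, camera_matrices, w_init=None,
              num_steps=500, lr=0.01, device='cuda'):
    """Fit a w latent to one target image of shape [1, 3, H, W] in [-1, 1].

    Sketch only: `G` is the loaded StyleNeRF generator; `camera_matrices`
    must be produced by the repo's own camera utilities.
    """
    percep = lpips.LPIPS(net='vgg').to(device)

    if w_init is None:
        # Standard StyleGAN2 projector trick: start from the average latent.
        with torch.no_grad():
            z = torch.randn(10000, G.z_dim, device=device)
            w_init = G.mapping(z, None).mean(dim=0, keepdim=True)  # [1, num_ws, w_dim]
    w_opt = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w_opt], lr=lr)

    for _ in range(num_steps):
        # This call signature is taken from the traceback quoted later in the thread.
        synth = G.get_final_output(z=None, c=None, styles=w_opt,
                                   camera_matrices=camera_matrices)
        # Resize if the render resolution differs from the target.
        if synth.shape[-1] != target.shape[-1]:
            synth = F.interpolate(synth, size=target.shape[-2:],
                                  mode='bilinear', align_corners=False)
        loss = percep(synth, target).mean() + 0.1 * F.mse_loss(synth, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w_opt.detach()
```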

MultiPath commented 2 years ago

Yes, in our exploration we also noticed that the StyleNeRF checkpoint is much harder to invert than the vanilla 2D StyleGAN. Some examples work OK, some do not. I haven't dug deep enough to explain why that is the case. I also found it works much better to train an encoder first to find an approximate camera and w. More discussion is welcome.
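
A sketch of that encoder-first initialization, under stated assumptions: `E` is any image-to-w encoder (e.g. one trained with apps/train_encoder.py), its interface is hypothetical rather than the repo's, and `G.num_ws` is assumed from the StyleGAN2-ADA codebase StyleNeRF builds on.

```python
import torch

@torch.no_grad()
def encoder_init(E, G, target):
    """Hypothetical encoder-first initialization for inversion.

    `E` is an image -> w encoder; its exact interface is assumed, not taken
    from the repo. The returned w can seed a short optimization run instead
    of starting from w_avg.
    """
    w0 = E(target)                                   # assumed output: [1, w_dim]
    if w0.ndim == 2:
        # Broadcast to the per-layer form [1, num_ws, w_dim] expected by `styles=`.
        w0 = w0.unsqueeze(1).repeat(1, G.num_ws, 1)
    return w0

# Usage with the projector sketch above (ideally E also predicts the camera):
# w0 = encoder_init(E, G, target)
# w  = project_w(G, target, camera_matrices, w_init=w0, num_steps=100)
```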

lelechen63 commented 2 years ago

> Yes, in our exploration we also noticed that the StyleNeRF checkpoint is much harder to invert than the vanilla 2D StyleGAN. […]

Thanks for the quick reply. Are you using https://github.com/lelechen63/StyleNeRF/blob/main/apps/train_encoder.py to train the encoder, and https://github.com/lelechen63/StyleNeRF/blob/main/apps/inversion.py to do the inversion? The inversion result you showed in the video actually looks great.

It would be tricky if w cannot be inverted. That would mean the output image and the latent space are not a one-to-one mapping, and StyleNeRF may not be directly usable as a pretrained model for other tasks...

MultiPath commented 2 years ago

I mean we haven't looked into the inversion heavily. In theory it should work, as we did not change anything in the style part compared to StyleGAN2; I just found it was a bit more difficult, and not all images can be inverted perfectly. Yes, I used them to train, although the code for that part may need some changes to adapt to the current version of the codebase. I made some reorganization of the codebase.

lelechen63 commented 2 years ago

> I mean we haven't looked into the inversion heavily. In theory it should work […] I made some reorganization of the codebase.

Yes, theoretically we should be able to fit w/z from the image. I followed train_encoder.py to train the encoder and used inversion.py to test. The training result looks reasonable (https://drive.google.com/file/d/1pCUzDhRCNbKFzfu6RiHULHXtG1Q2U6yj/view?usp=sharing). But when I apply the encoder and the NeRF generator to inversion.py to fit w, this is what I get (https://drive.google.com/file/d/1IRbBMMk34OlLHLZIvuW5dADWHFcQIF8-/view?usp=sharing, https://drive.google.com/file/d/1vm_CPysKQ2T-N_RFLDf1NLaHiAPMUWHx/view?usp=sharing). Did you encounter a similar problem? Thanks!

Songlin1998 commented 2 years ago

> Yes, theoretically we should be able to fit w/z from the image. […] Did you encounter a similar problem?

Hi! Thank you for your great contributions!

I've been studying the inversion of StyleNeRF. Are the inversion results in the paper based on inversion.py (possibly with the help of a pre-trained encoder)? What should be cleaned up in inversion.py?

xiangjun-xj commented 2 years ago

> Yes, theoretically we should be able to fit w/z from the image. […] Did you encounter a similar problem?

It seems like you have already successfully trained the encoder (while I am not as lucky or skilled as you). When I run train_encoder.py, I hit some errors. One of them happens at `gen_img = G.get_final_output(z=None, c=None, styles=w_samples, camera_matrices=camera_matrices)`:

```
File "apps/train_encoder.py", line 188, in main
    gen_img = G.get_final_output(z=None, c=None, styles=w_samples, camera_matrices=camera_matrices)
File "<string>", line 714, in get_final_output
File "<string>", line 707, in forward
File "/nas/users/xiangjun/anaconda3/envs/py38-1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
File "<string>", line 251, in forward
File "/nas/users/xiangjun/git/StyleNeRF/./torch_utils/misc.py", line 85, in assert_shape
    if tensor.ndim != len(ref_shape):
AttributeError: 'NoneType' object has no attribute 'ndim'
```

Did you meet similar errors before? If not, could you please offer some advice on how to run train_encoder.py (or train_encoder_resnet.py) successfully? Great thanks!
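
Not a confirmed diagnosis, but the traceback says `assert_shape` in torch_utils/misc.py received `None` where it expected a tensor. One plausible cause, stated purely as an assumption: an encoder output of shape `[batch, w_dim]` passed as `styles` where the generator expects the broadcast `[batch, num_ws, w_dim]` form, sending it down a branch that still needs `z`. A defensive sketch (with `G.num_ws` assumed from the StyleGAN2-ADA base code):

```python
# Hypothetical guard, not a confirmed fix: broadcast the encoder's w to the
# per-layer form before calling the generator, so the styles path never
# falls back to the mapping network with z=None.
if w_samples is not None and w_samples.ndim == 2:    # [batch, w_dim]
    w_samples = w_samples.unsqueeze(1).repeat(1, G.num_ws, 1)
gen_img = G.get_final_output(z=None, c=None, styles=w_samples,
                             camera_matrices=camera_matrices)
```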

lelechen63 commented 2 years ago

> When I run train_encoder.py, I hit some errors. […] Could you please offer some advice on how to run train_encoder.py (or train_encoder_resnet.py) successfully?

You can try the code here: https://github.com/lelechen63/StyleNeRF/tree/main/apps

liu-yx17 commented 2 years ago

> You can try the code here: https://github.com/lelechen63/StyleNeRF/tree/main/apps

Hello, I used your code but I still get the same errors as above. Could you tell me how to fix them in detail? Thank you!