Closed flyingshan closed 1 year ago
The first and second ones look like issues with the guidance scales. But I am slightly surprised the results are not so good. I will try to run a few of these myself and see what the difference is. Are you using the standard in2n model?
Yes, the training command is as follows:
ns-train in2n --data /workspace/face --load-dir /workspace/outputs/face/nerfacto/2023-04-17_090548/nerfstudio_models --pipeline.prompt "as a bronze bust" --pipeline.guidance-scale 7.5 --pipeline.image-guidance-scale 1.5 --pipeline.ip2p-device cuda:1 --viewer.websocket-port 7008
Ok, thank you! There might be some issue in the code that I've released; I'm going to take a closer look at it tonight to see what's wrong. Thank you for these results!
Thank you, and I found for some scenes it looks like it is as good as the results in the paper, for example:
https://user-images.githubusercontent.com/42313652/233240681-0b71260f-9ec6-4e74-8605-b2fe0d4c0299.mp4 “Put him in a suit”
I see, that looks pretty good. By the way, I provided the camera paths in those data folders, so you can render the paths that we have on the website. When I was testing the code I was mostly trying it on the farm scene, and it looked good on that end.
Also, may I ask what resolution you are training the face scene at? Are you using the downscale factor of 2?
I just trained it with
ns-train nerfacto --data face
so I guess the input images were not downscaled?
Oh I see, can you re-train both of these at a lower resolution?
Add the following to your nerfacto command, then use it for your in2n run as well, and load the new nerfacto weights.
nerfstudio-data --downscale-factor 2
The only reason I suggest you do that is because the person scene is already downscaled to 512 and the training went quite well for you. Let me know if that changes anything on the face!
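Putting the suggestion above together, the retraining sequence might look roughly like this (paths follow the earlier commands in this thread; the checkpoint timestamp placeholder is an assumption and will differ on your machine):

```shell
# Re-train nerfacto at half resolution by appending the dataparser flag
ns-train nerfacto --data /workspace/face nerfstudio-data --downscale-factor 2

# Then point in2n at the NEW nerfacto checkpoint directory,
# keeping the same downscale flag so both stages see the same resolution
ns-train in2n --data /workspace/face \
  --load-dir /workspace/outputs/face/nerfacto/<new-timestamp>/nerfstudio_models \
  --pipeline.prompt "as a bronze bust" \
  --pipeline.guidance-scale 7.5 --pipeline.image-guidance-scale 1.5 \
  nerfstudio-data --downscale-factor 2
```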
OK, I'll try it!
I'm trying the cowboy hat with this codebase right now, and it definitely doesn't look as bad as your result. Can you try it with guidance scale of 6.5 and image guidance scale of 1.3? These were the parameters used in the paper.
However, there still is a difference between the result I am getting now and the result we have in the paper. I will take a closer look as to what the issue is.
Do you have the latest version of the code pulled?
Ok so I ran the cowboy both on this repo and my research repo. These are rendered at around 40k total iterations (30k from nerfacto, 10k from in2n). The files are labeled "new" and "old" respectively. The results look pretty similar in both cases, and I'd say both look pretty similar to what is in the paper.
Here's the exact command that I ran: ns-train in2n --data data/nerfstudio/frederik/ --load-dir outputs/frederik/nerfacto/2023-04-05_193848/nerfstudio_models --pipeline.prompt "give him a cowboy hat" --pipeline.guidance-scale 6.5 --pipeline.image-guidance-scale 1.3 --pipeline.model.near-plane 0.2 nerfstudio-data --scene-scale 2.0 --downscale-factor 2
I guess the few parameters at the end to adjust scene scales are making a difference? I could also imagine that InstructPix2Pix does not perform well on full-resolution images. Also, maybe your results are over-optimized? Not sure what the delta is here.
I pushed a minor change (change setting on Ip2p to improve performance slightly) which may be the cause for the difference? Go ahead and pull the most recent code and let me know what happens.
Thank you for your effort! I have run an experiment with the downscale factor set to 2, and it looks much better than the full-resolution result. I will try setting the other parameters as you suggested and using the latest repo, and I'll report the results when I get them.
I see, then I suppose it is because InstructPix2Pix doesn't do well on high-resolution images. I'll make a clearer note of that in the readme.
Also, what are your GPU specs? I also forgot to put a note about this, but the model doesn't need to train for more than 10k iters in most cases, so you could render the model at around 40k iters if you would like. Let me know what you find.
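For rendering a checkpoint along one of the camera paths mentioned earlier, a rough sketch is below. This is an assumption, not the exact command from the thread: the flag names match nerfstudio around v0.2, and the camera-path filename is hypothetical, so check `ns-render --help` and the data folder for the real names.

```shell
# Render a trained in2n checkpoint along a provided camera path
# (<timestamp> and camera_path.json are placeholders; verify flags with `ns-render --help`)
ns-render \
  --load-config outputs/frederik/in2n/<timestamp>/config.yml \
  --traj filename \
  --camera-path-filename data/nerfstudio/frederik/camera_path.json \
  --output-path renders/frederik.mp4
```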
Here's what the bronze looks like, I think it looks good. I'll put the command after it.
ns-train in2n --data data/nerfstudio/frederik/ --load-dir outputs/frederik/nerfacto/2023-04-05_193848/nerfstudio_models/ --pipeline.prompt "as a bronze statue" --pipeline.guidance-scale 7.5 --pipeline.image-guidance-scale 1.3 --pipeline.model.near-plane 0.2 nerfstudio-data --scene-scale 2.0 --downscale-factor 2
This definitely leads me to believe that the issue is InstructPix2Pix struggling with high-resolution images. Let me know what your results look like.
I have tried using your commands to train these two prompts with the latest repo, and the results look very similar to yours:
Btw, I notice that the readme gives tips on how to set the guidance-scale hyper-parameters, but I also notice you set other hyper-parameters like "--pipeline.model.near-plane" and "--scene-scale" in your commands. Can you give us some advice on how to set these hyper-parameters?
Great to see your results look good!
In terms of the near-plane and scene scale, those parameters were set because Nerfacto is primarily meant for unbounded, real-world scenes. However, the scene of this face is a front-facing scene, and so nerfacto creates a lot of floaters. I'd recommend going to the main nerfstudio issues section to get more tips on how to train for front-facing scenes!
I tried to generate some NeRF scenes using in2n with prompts described in the paper. Here are some examples:
https://user-images.githubusercontent.com/42313652/233237010-a9afe882-5671-4120-9be4-ba5b0e831a02.mp4 (Give him a cowboy hat)
https://user-images.githubusercontent.com/42313652/233237024-640b2366-ae0b-41c6-9e81-54c58aa926fa.mp4 (Make him bald)
https://user-images.githubusercontent.com/42313652/233237040-e9f2002e-81fb-4a51-819b-d0b0a1734cec.mp4 (as a bronze bust)
I found that the quality of the generated scenes is not as good as what the paper shows. I trained the NeRF using nerfacto and trained in2n with the default guidance scales following the documentation. Can you give me some advice on this? Thank you!