JarrentWu1031 / SC4D

[ECCV 2024] Official code for: SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

Multi-View Generation #3

Open sauradip opened 2 months ago

sauradip commented 2 months ago

Hi,

Thanks for making the work public. I find that your code only works when there is global movement; when the movement is local, the control points don't optimize well.

For the input video below

https://github.com/user-attachments/assets/7a9f28fc-9780-4945-812a-a97339064612

I am getting this output

https://github.com/user-attachments/assets/3ce19bb4-eb7e-43d2-8d0e-cf2432f88140

with the control points shown here:

https://github.com/user-attachments/assets/97131df2-9798-486b-b27c-0aafbc1bee90

which means the local deformation is hard to model with sparse control points. Any idea how to solve this?

sauradip commented 2 months ago

Also, I am getting very blurry results. I feel I need to increase the number of points; can you suggest which parameters in your codebase can help me get better results?

JarrentWu1031 commented 2 months ago

> Hi,
>
> Thanks for making the work public. I find that your code only works when there is global movement; when the movement is local, the control points don't optimize well.
>
> For the input video below
>
> fan.mp4
>
> I am getting this output
>
> fan_0.mp4
>
> with the control points shown here:
>
> fan_cpts_0.mp4
>
> which means the local deformation is hard to model with sparse control points. Any idea how to solve this?

Hi, I think there are two main reasons for the issues you mentioned. On the one hand, the segmentation result in this example is likely incorrect: the entire fan was probably segmented as a whole without removing the gaps between the blades, which would cause those gaps to gradually turn into white 3DGS during optimization. On the other hand, the performance of our method largely depends on the distillation source (Zero123 here), which might fail to produce reasonable novel views for an instance like this. You can randomly select a frame of the video and check whether Zero123 handles it well in novel view synthesis.
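For that single-frame sanity check, here is a minimal sketch using the Zero123++ diffusers pipeline as a stand-in (the paper distills from Zero123-XL, so treat this only as a rough qualitative probe; the frame path is hypothetical):

```python
import torch
from PIL import Image
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

# Load Zero123++ (a stand-in here; SC4D itself distills from Zero123-XL).
pipe = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
pipe.to("cuda")

# A randomly picked, already-segmented frame of the input video (hypothetical path).
cond = Image.open("data/fan/frame_0012_rgba.png")

# Generate a grid of novel views; if these already look degenerate for this frame,
# the distillation-based 4D optimization will struggle as well.
grid = pipe(cond, num_inference_steps=75).images[0]
grid.save("fan_frame_0012_novel_views.png")
```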

To achieve better results on this example, I suggest you first check whether the mask area in the segmentation result is correct. You might need a more powerful segmentor for certain examples, as the mask directly determines the regions where 3DGS and control points exist. Additionally, changing the distillation source might yield better results. However, I have previously tried multi-view generation networks like ImageDream, and their performance on the Consistent4D benchmark is not as good as Zero123-XL's.
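As a concrete way to run that mask check, a minimal sketch (frame and mask paths are hypothetical) that overlays the segmentation on a frame so un-removed regions, such as the gaps between the fan blades, stand out immediately:

```python
import cv2
import numpy as np

# Hypothetical paths: one video frame and the binary mask your segmentor produced for it.
frame = cv2.imread("data/fan/frame_0000.png")
mask = cv2.imread("data/fan/mask_0000.png", cv2.IMREAD_GRAYSCALE)

# Binarize the mask and tint the foreground red; any gap between the blades that was
# not removed will show up as a red region that should have been background.
mask_bin = (mask > 127).astype(np.uint8)
overlay = frame.copy()
red = np.array([0, 0, 255], dtype=np.float32)
overlay[mask_bin == 1] = (0.5 * overlay[mask_bin == 1] + 0.5 * red).astype(np.uint8)

cv2.imwrite("mask_overlay_0000.png", overlay)
print("foreground ratio:", mask_bin.mean())
```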

JarrentWu1031 commented 2 months ago

> Also, I am getting very blurry results. I feel I need to increase the number of points; can you suggest which parameters in your codebase can help me get better results?

Is this the same example you showed above? If not, can you share the blurry results so I can better figure out the cause of the blurriness?