facebookresearch / vfusion3d

[ECCV 2024] Code for VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

[question] Integrate to HuggingFace Hub #1

Closed jadechoghari closed 1 month ago

jadechoghari commented 1 month ago

The model looks really cool! :) Is it possible to add it to the Hugging Face Hub? (I can do the work, with the help of HF folks) It’s a must! 🤗

JunlinHan commented 1 month ago

Hello Jade, thank you so much for your kind words! Yes, it would definitely be nice to have it on the HF Hub. I would be more than happy to provide any help regarding this!

jadechoghari commented 1 month ago

Thank you, Junlin!

Great that you're open to adding it to the Hugging Face Hub. I'll start working on it and will definitely reach out if I need any assistance. Looking forward to making this happen! 🤗

JunlinHan commented 1 month ago

Thanks a lot Jade, that's terrific!

jadechoghari commented 1 month ago

Hey @JunlinHan, I've been working on the models for a few days. Do you have a configs folder, like the one here: https://github.com/3DTopia/OpenLRM?

:)

Also, regarding the inference script `inferrer.py`: after modifying the code to accept a single image, I'm able to load one image. It worked for generating the mesh, but converting it to a video seems to require more GPU memory.

JunlinHan commented 1 month ago

> Hey @JunlinHan, I've been working on the models for a few days. Do you have a configs folder, like the one here: https://github.com/3DTopia/OpenLRM?
>
> :)
>
> Also, regarding the inference script `inferrer.py`: after modifying the code to accept a single image, I'm able to load one image. It worked for generating the mesh, but converting it to a video seems to require more GPU memory.

Hey Jade, thanks a lot for the info!

Config: No, we only provide one model (unlike OpenLRM, which has multiple models). The default settings should give good performance overall, though `--render_size`, `--mesh_size`, and the camera parameters for input images can be changed.

Single image -> video: Yes, a single image would definitely be better for a demo. For GPU memory, maybe try reducing `chunk_size` (line 120, https://github.com/facebookresearch/vfusion3d/blob/main/lrm/inferrer.py#L120) to 1? This should help! If that's still not enough, reducing `--render_size` to 256 should also help!
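To see why `chunk_size` matters: the renderer processes camera poses in batches of `chunk_size`, so peak GPU memory scales with the batch. A minimal sketch of the idea (the variable and method names here are illustrative, not our exact code):

```python
import torch

def render_frames(model, planes, cameras, chunk_size=2):
    # Render `chunk_size` camera poses per forward pass; peak GPU memory
    # grows with the chunk, so chunk_size=1 is the cheapest (but slowest).
    frames = []
    for i in range(0, cameras.shape[0], chunk_size):
        with torch.no_grad():  # inference only, no gradients needed
            frames.append(model.synthesize(planes, cameras[i:i + chunk_size]))  # hypothetical call
    return torch.cat(frames, dim=0)
```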

jadechoghari commented 1 month ago

Yes, sounds good, thanks!

So the model does not contain a config?

JunlinHan commented 1 month ago

Yeah, no config file is included.

jadechoghari commented 1 month ago

Hey @JunlinHan! 👋 Another question: could you clarify a bit more about the config? I saw that the model is created from the `LRMGenerator` class, so is the config embedded in the class or passed during instantiation? If not, could you provide a detailed config file, with details like:

- model architecture (layers, hidden sizes)
- tokenizer settings (vocabulary size, special tokens)
- other model-specific settings (attention mechanisms, activation functions), etc.

Ideally, this should live in a `config.json` or `.yaml` file.

All I see now is that the model is instantiated using `_model_kwargs`.

thanks!

JunlinHan commented 1 month ago

Hey Jade,

The config (though there's no actual YAML or JSON config file) is defined at line 28: https://github.com/facebookresearch/vfusion3d/blob/main/lrm/inferrer.py#L28

Feel free to create one! Hope it helps!
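If it helps, the kwargs on that line could be dumped straight into a `config.json`, roughly like this (the keys and values below are placeholders; copy the real ones from line 28):

```python
import json

# Placeholder values: copy the actual _model_kwargs from
# lrm/inferrer.py line 28 before using this.
model_kwargs = {
    "camera_embed_dim": 1024,
    "transformer_layers": 16,
    "transformer_heads": 16,
    "triplane_low_res": 32,
    "triplane_high_res": 64,
    "triplane_dim": 80,
}

with open("config.json", "w") as f:
    json.dump(model_kwargs, f, indent=2)
```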

jadechoghari commented 1 month ago

yup!

jadechoghari commented 1 month ago

Here’s the corrected and refined version of the model (though not finished):

The model is now ready; it currently outputs only planes, but it's easily usable!

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("jadechoghari/custom-llrm", trust_remote_code=True)

# dummy input
image_tensor = torch.randn(1, 3, 224, 224)
camera_tensor = torch.randn(1, 16)

output_planes = model(image_tensor, camera_tensor)
print(output_planes.shape)
```

I'm working on the next part: outputting meshes and videos, and making it easier for users to input images with a nice Gradio interface.

Apologies for the `llrm` -> `lrm` typo; the model isn't fully ready to be merged with facebook anyway :)

Will keep you updated!

jadechoghari commented 1 month ago

https://huggingface.co/jadechoghari/custom-llrm

You can click the "Use this model" button.

JunlinHan commented 1 month ago

> Here’s the corrected and refined version of the model (though not finished):
>
> The model is now ready; it currently outputs only planes, but it's easily usable!
>
> ```python
> import torch
> from transformers import AutoModel
>
> model = AutoModel.from_pretrained("jadechoghari/custom-llrm", trust_remote_code=True)
>
> # dummy input
> image_tensor = torch.randn(1, 3, 224, 224)
> camera_tensor = torch.randn(1, 16)
>
> output_planes = model(image_tensor, camera_tensor)
> print(output_planes.shape)
> ```
>
> I'm working on the next part: outputting meshes and videos, and making it easier for users to input images with a nice Gradio interface.
>
> Apologies for the `llrm` -> `lrm` typo; the model isn't fully ready to be merged with facebook anyway :)
>
> Will keep you updated!

Wow thanks! That's really cool.

For GPUs I might be able to merge it with facebook's hf account so it will have a100 support there!

jadechoghari commented 1 month ago

great!

jadechoghari commented 1 month ago

The model is nearly complete! For the output when it's called from transformers, would you prefer planes, meshes, or both, or should there be an option to choose?

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("jadechoghari/custom-llrm", trust_remote_code=True)

# dummy input (will be real in the actual implementation)
image_tensor = torch.randn(1, 3, 224, 224)
camera_tensor = torch.randn(1, 16)

# if planes
output_planes = model(image_tensor, camera_tensor)

# if mesh
mesh_obj = model(image_tensor, camera_tensor)

# if both
output_planes, mesh_obj = model(image_tensor, camera_tensor)

# we could also add an output_type option
```

Also, the demo is complete 🎊, but slow on CPU 🥲 right now: https://huggingface.co/spaces/jadechoghari/vfusion3d-app (once we merge with facebook it should work). If you want to test it on a bigger GPU, you can just copy the app.py and run it in a notebook.
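The Space itself is essentially a thin Gradio wrapper around the model; a simplified sketch of what app.py boils down to (the real file has more options):

```python
import gradio as gr

def image_to_mesh(image):
    # Stub: the real app runs VFusion3D on `image` and
    # writes the reconstructed mesh to disk.
    return "output.obj"

demo = gr.Interface(
    fn=image_to_mesh,
    inputs=gr.Image(type="pil"),
    outputs=gr.Model3D(),  # renders the mesh in the browser
    title="VFusion3D",
)

if __name__ == "__main__":
    demo.launch()
```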

JunlinHan commented 1 month ago

That’s really cool!

The planes are generated before the meshes, and planes can be used for both video results and meshes. Therefore, it would be best to output planes first, with options to choose between video or mesh outputs.
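In code, that flow might look something like this (the decoding helpers are purely illustrative, not a final API):

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("jadechoghari/custom-llrm", trust_remote_code=True)

image_tensor = torch.randn(1, 3, 224, 224)
camera_tensor = torch.randn(1, 16)

# Planes always come first...
planes = model(image_tensor, camera_tensor)

# ...then the caller chooses how to decode them (hypothetical helpers):
mesh_obj = model.extract_mesh(planes)  # e.g. marching cubes -> .obj
frames = model.render_video(planes)    # orbit cameras -> video frames
```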

Wow, I will check how this can be integrated into fb’s HF account!

jadechoghari commented 1 month ago

alright, sounds good!

jadechoghari commented 1 month ago

The model is done and accessible to all! You can check it out here: https://huggingface.co/jadechoghari/vfusion3d, and the demo app here: https://huggingface.co/spaces/jadechoghari/vfusion3d-app (it will work on A100s).

Let me know when the merge will happen, and we'll update the README to point to the Facebook repository (instead of my username).

We'll add a note advising users to choose mesh export (video takes a lot of GPU memory). Ready for the merge!

JunlinHan commented 1 month ago

Hey Jade,

Thank you once again for your amazing contribution!

I've created a private demo in Meta's account, and I'm currently testing it.

*(screenshot)*

It looks like the 3D model visualization isn't working in my browser (Chrome). I'm not sure if there's anything I need to do about this (if it is working on your side then it's probably fine!).

Also, since A100 is available now, I think it would be better to add video results back.

One more thing I think would be cool is to provide some default images (we can use 10 images from the 40 prompt images provided in this GitHub demo), so it looks like this one:

*(screenshot)*

If you have the bandwidth, could you please take a look? I can also do it myself!
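Adding the defaults should mostly be the `examples` argument on the Gradio Interface; a rough sketch (the paths are made up):

```python
import gradio as gr

def image_to_mesh(image):
    return "output.obj"  # stub; the demo's real inference function goes here

demo = gr.Interface(
    fn=image_to_mesh,
    inputs=gr.Image(type="pil"),
    outputs=gr.Model3D(),
    # Hypothetical paths: reuse ~10 of the repo's 40 prompt images.
    examples=[["assets/40_prompt_images/01.png"],
              ["assets/40_prompt_images/02.png"]],
)
```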

Thank you again!

jadechoghari commented 1 month ago

That's great, I'll have a look! How long did it take to create the .obj mesh on the A100s?

And yes, we can certainly add the demo images!

jadechoghari commented 1 month ago

And for the mesh-not-showing bug: I tried it on an L4 and it seems fast, but you need to wait an extra 5-7 seconds for the mesh to appear in Gradio :) We can experiment with faster rendering of the .obj file though:

*(screenshot)*

JunlinHan commented 1 month ago

> And for the mesh-not-showing bug: I tried it on an L4 and it seems fast, but you need to wait an extra 5-7 seconds for the mesh to appear in Gradio :) We can experiment with faster rendering of the .obj file though:

Ah I see, that's interesting!

On an A100 it takes less than 15s to create the .obj file (with 512 mesh_size). I then waited for 5 minutes but the 3D model visualization was still not shown. This might be related to my browser, as I tried some other 3D Gradio demos and things were similar.

jadechoghari commented 1 month ago

Yes, it works on my end. Exporting as GLB can also speed it up. We can ping the Gradio maintainers later if we encounter any display issues.
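For example, re-exporting with trimesh is only a couple of lines (assuming the app already writes an .obj):

```python
import trimesh

# Load the generated OBJ and re-export it as binary GLB,
# which the Gradio Model3D viewer tends to load faster.
mesh = trimesh.load("output.obj")
mesh.export("output.glb")
```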

Regarding the export-video feature, all the functionality is complete on my end; it's just a GPU issue (it hits the maximum allocated memory).

I'll also add some optimizations to the current Gradio demo. I think we're good to go after that?

jadechoghari commented 1 month ago

You can also just make the model visible on HF: https://huggingface.co/jadechoghari/vfusion3d and hide the demo until it's ready (it's almost complete though).

JunlinHan commented 1 month ago

> Yes, it works on my end. Exporting as GLB can also speed it up. We can ping the Gradio maintainers later if we encounter any display issues.
>
> Regarding the export-video feature, all the functionality is complete on my end; it's just a GPU issue (it hits the maximum allocated memory).
>
> I'll also add some optimizations to the current Gradio demo. I think we're good to go after that?

Sure! Well, it might be related to something on my end; let's ignore it for now. Definitely good to go!

jadechoghari commented 1 month ago

Sounds good! For the model, you're good to go to make it public on the facebook repo. I'll just do some fixes in Gradio on my end, and the demo should be ready tomorrow.

JunlinHan commented 1 month ago

> Sounds good! For the model, you're good to go to make it public on the facebook repo. I'll just do some fixes in Gradio on my end, and the demo should be ready tomorrow.

Perfect, thanks so much! Will make them public together!

jadechoghari commented 1 month ago

OK, all good and done. It also somehow works on CPU, lol: https://huggingface.co/spaces/jadechoghari/vfusion3d-app

*(screenshot)*

jadechoghari commented 1 month ago

VFusion3D is officially on Hugging Face 🤗! Big thanks to the authors of the paper!

yvrjsharma commented 1 month ago

Hi @JunlinHan and @jadechoghari, great work and congratulations on the brilliant release! It would be great to have this demo available locally in the repository and linked in the README, for example like Stable-fast-3D does here: https://github.com/Stability-AI/stable-fast-3d?tab=readme-ov-file#local-gradio-app

jadechoghari commented 1 month ago

Thanks! Yes, it can easily be added. @JunlinHan, I'll open a PR for it.

yvrjsharma commented 1 month ago

Awesome, thanks for all the support @jadechoghari ! 🙌