Closed jadechoghari closed 1 month ago
Hello Jade, Thank you so much for your kind words! Yes, it would definitely be nice if an HF hub is added. I would be more than happy to provide any help regarding this!
Thank you, Junlin!
Great that you're open to adding it to the Hugging Face Hub. Will start working on it and will definitely reach out if I need any assistance from you. Looking forward to making this happen! 🤗
Thanks a lot Jade, that's terrific!
Hey @JunlinHan, been working on the models for a few days. Do you have a configs folder, like the one here: https://github.com/3DTopia/OpenLRM?
:)
Also, regarding the inference script inferrer.py: after modifying the code to accept a single image, I'm able to load it and generate the mesh, but converting it to a video seems to require more GPU memory somehow.
Hey Jade, thanks a lot for the info! Config: No, we only provide one model (unlike OpenLRM, which has multiple models). The default setting should provide good performance overall, though --render_size, --mesh_size, and camera parameters for input images can be changed.
Single image -> video: Yes, a single image would definitely be better for a demo. For GPU memory, maybe try reducing chunk_size (line 120 https://github.com/facebookresearch/vfusion3d/blob/main/lrm/inferrer.py#L120) to 1? This should help! If that's still not enough, reducing --render_size to 256 should also help!
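For anyone following along, the chunk_size knob works by rendering in smaller batches, trading speed for lower peak GPU memory. A minimal, generic sketch of the idea (illustrative names, not the actual inferrer.py code):

```python
import torch

def render_in_chunks(rays, render_fn, chunk_size=1):
    """Process inputs in small batches so peak GPU memory stays low.

    Smaller chunk_size lowers memory use at the cost of more
    iterations -- the same trade-off as the chunk_size in inferrer.py.
    """
    outputs = []
    for start in range(0, rays.shape[0], chunk_size):
        outputs.append(render_fn(rays[start:start + chunk_size]))
    return torch.cat(outputs, dim=0)
```

With chunk_size=1, only one item is ever resident in the render function at a time, which is why it helps on smaller GPUs.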
Yes, sounds good, thanks!
So the model does not contain a config?
Yeah, no config is included.
Hey @JunlinHan! 👋 Another question regarding the config file: could you clarify it a bit more? I saw that the model is created from the LRMGenerator class, so is the config embedded in the class or passed during instantiation? If not, could you provide a detailed config file, with details like:
- model architecture (layers, hidden sizes)
- tokenizer settings (vocabulary size, special tokens)
- other model-specific settings (attention mechanisms, activation functions), etc.
Ideally, this should be in a config.json or .yaml file.
All I see now is that the model is instantiated using _model_kwargs
thanks!
Hey Jade,
The config (though without a real YAML or JSON config file) is at line 28: https://github.com/facebookresearch/vfusion3d/blob/main/lrm/inferrer.py#L28
Feel free to create one? Hope it helps!
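One straightforward way to create it is to dump the hard-coded kwargs into a config.json. A quick sketch (the keys and values below are hypothetical placeholders, to be substituted with the real _model_kwargs from inferrer.py line 28):

```python
import json

# Hypothetical example values -- substitute the real _model_kwargs
# from lrm/inferrer.py (line 28).
model_kwargs = {
    "camera_embed_dim": 1024,
    "transformer_dim": 1024,
    "transformer_layers": 16,
    "transformer_heads": 16,
}

# Write the kwargs out as a shareable config.json.
with open("config.json", "w") as f:
    json.dump(model_kwargs, f, indent=2)
```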
yup!
Here’s the current version of the model (though not finished):
It currently outputs only planes, but it's already usable!
import torch
from transformers import AutoModel
model = AutoModel.from_pretrained("jadechoghari/custom-llrm", trust_remote_code=True)
# dummy input
image_tensor = torch.randn(1, 3, 224, 224)
camera_tensor = torch.randn(1, 16)
output_planes = model(image_tensor, camera_tensor)
print(output_planes.shape)
I'm working on the next part: outputting meshes and videos, and making it easier for users to input images with a nice gradio interface.
apologies for the llrm -> lrm typo; the model isn't fully ready to be merged into Facebook's account anyway :)
Will keep you updated!
https://huggingface.co/jadechoghari/custom-llrm
Can click on "Use this model" button
Wow thanks! That's really cool.
For GPUs I might be able to merge it with facebook's hf account so it will have a100 support there!
great!
The model is nearly complete!
For the output when it is called from transformers, do you prefer it to output planes, meshes, or both? Or would you prefer an option to choose?
import torch
from transformers import AutoModel
model = AutoModel.from_pretrained("jadechoghari/custom-llrm", trust_remote_code=True)
# dummy input (will be real in implementation)
image_tensor = torch.randn(1, 3, 224, 224)
camera_tensor = torch.randn(1, 16)
# if planes
output_planes = model(image_tensor, camera_tensor)
# if mesh
mesh_obj = model(image_tensor, camera_tensor)
# if both
output_planes, mesh_obj = model(image_tensor, camera_tensor)
# we can also add an option: output_type
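A sketch of how the output_type option could look inside the wrapper (all names here are illustrative — the two injected callables stand in for the real plane generation and mesh extraction components):

```python
class OutputTypeWrapper:
    """Illustrative sketch of the proposed output_type switch.

    Planes are always generated first; the mesh is derived from them.
    generate_planes / planes_to_mesh are hypothetical stand-ins for
    the real model components.
    """

    def __init__(self, generate_planes, planes_to_mesh):
        self._generate_planes = generate_planes
        self._planes_to_mesh = planes_to_mesh

    def forward(self, image, camera, output_type="planes"):
        planes = self._generate_planes(image, camera)  # always computed first
        if output_type == "planes":
            return planes
        mesh = self._planes_to_mesh(planes)  # derived from the planes
        if output_type == "mesh":
            return mesh
        if output_type == "both":
            return planes, mesh
        raise ValueError(f"unknown output_type: {output_type!r}")
```

This matches the plane-first ordering: the mesh path reuses the planes rather than recomputing anything.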
Also, the demo is complete 🎊 but slow on CPU 🥲 right now: https://huggingface.co/spaces/jadechoghari/vfusion3d-app (once we merge with Facebook it should work). If you want to test it on a bigger GPU, you can just copy the app.py and run it in a notebook.
That’s really cool!
The planes should be generated before the meshes, and planes can be used for both video results and meshes. Therefore, it would be best to have planes first, with options to choose between video outputs or mesh outputs.
Wow, I will check how this can be integrated into fb’s HF account!
alright, sounds good!
Model is done and accessible to all! You can check it out here: https://huggingface.co/jadechoghari/vfusion3d And the demo app here: https://huggingface.co/spaces/jadechoghari/vfusion3d-app (will work on a100s)
Let me know when the merge will happen, and we'll update the README to include the Facebook repository (instead of my username).
We will add a note that it is advisable to choose export mesh (video takes a lot of gpu). Ready for the merge!
Hey Jade,
Thank you once again for your amazing contribution!
I've created a private demo in Meta's account, and I'm currently testing it.
It looks like the 3D model visualization isn't working in my browser (Chrome). I'm not sure if there's anything I need to do about this (if it is working on your side then it's probably fine!).
Also, since A100 is available now, I think it would be better to add video results back.
One more thing I think would be cool is to provide some default images (we can use 10 images from the 40 prompt images provided in this GitHub demo), so it looks like this one:
If you have the bandwidth, could you please take a look? I can also do it myself!
Thank you again!
That’s great, I’ll have a look! How long did it take to create the .obj mesh on the A100s?
and yes we can certainly add the demo images!
and for the bug of the mesh not showing: I tried it on an L4 and it seems fast, but you need to wait an extra 5-7 seconds for the mesh to appear on Gradio :) We can experiment with faster rendering of the .obj file, though.
Ah I see, that's interesting!
On an A100 it takes less than 15s to create the .obj file (with 512 mesh_size). I then waited for 5 minutes, but the 3D model visualization is still not shown. This might be related to my browser, as I tried some other 3D Gradio demos where things are similar.
Yes, it works on my end. Exporting as GLB can also speed it up. We can ping the Gradio maintainers later if we encounter any appearance issues.
Regarding the export-video feature, all functionalities are complete on my end; it's just a GPU issue (allocated max memory).
I'll also add some optimizations to the current gradio demo. I think we're good to go after that?
you can also just make the model visible on hf: https://huggingface.co/jadechoghari/vfusion3d and hide the demo until it's ready (almost complete though)
Sure! Well it might be related to something on my end, let's ignore it for now! Definitely good to go!
Sounds good! For the model, you're good to go to make it public on the Facebook repo! I'll just do some fixes in Gradio on my end and the demo should be ready tomorrow.
Perfect, thanks so much! Will make them public together!
ok all good and done. It also somehow worked on cpu lol. https://huggingface.co/spaces/jadechoghari/vfusion3d-app
VFusion3D is officially in Hugging Face 🤗! Big thanks to the authors for the paper!
Hi @JunlinHan and @jadechoghari, great work and congratulations on the brilliant release!! It would be great to have this demo locally available on the repository and linked in the readme. For example, like for Stable-fast-3D here: https://github.com/Stability-AI/stable-fast-3d?tab=readme-ov-file#local-gradio-app
thanks! Yes it can easily be added. @JunlinHan I'll open a PR for it
Awesome, thanks for all the support @jadechoghari ! 🙌
The model looks really cool! :) Is it possible to add it to the Hugging Face Hub? (I can do the work, with the help of HF folks) It’s a must! 🤗