OK, thank you very much. I would like to use the model to test CAD retrieval; the embeddings have been saved on huggingface, right?
Yes, if you only want to retrieve objects based on text or images, the embeddings are available on huggingface. You can look at text_retrieval.py for an idea of how to do this.
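At its core the script is just a cosine-similarity search over the released embeddings. A minimal sketch of the idea (the h5 dataset key and the choice of text encoder are assumptions on my part; check text_retrieval.py for the exact loading code):

```python
import h5py
import numpy as np
import torch
import open_clip

# Released shape embedding database (path from this thread).
# The dataset key "shape_embeddings" is an assumption; inspect f.keys().
with h5py.File("data/objaverse_embeddings/Four_1to6F_bs1600_LT6/shape_emb_objaverse.h5", "r") as f:
    db = np.asarray(f["shape_embeddings"], dtype=np.float32)  # (N, D)

# Encode the text query. DuoduoCLIP distills from the OpenCLIP
# ViT-B-32 laion2b_s34b_b79k model; assuming its text encoder is used here.
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
with torch.no_grad():
    query = model.encode_text(tokenizer(["an office chair"])).float().numpy()[0]

# Cosine similarity: normalize both sides, then take the top-k rows.
db /= np.linalg.norm(db, axis=1, keepdims=True)
query /= np.linalg.norm(query)
top5 = np.argsort(-(db @ query))[:5]
print(top5)  # indices of the 5 best-matching shapes
```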
OK, I have tried the text retrieval and it performs well. However, I notice that text_retrieval.py only covers the Objaverse dataset; the other datasets such as 3D Future and ABO are not taken into account. Could you share the shape_embed files for those datasets, or explain how we could produce them ourselves?
I have another question as well: can the DuoduoCLIP model retrieve 3D models from images, and how could I implement this?
For image-to-shape retrieval you can follow the Example in the readme to generate embeddings for single or multi-view images and use the result as the query. Then you can plug that query into the text_retrieval.py script as the query embedding to search over the objaverse embeddings.
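Putting the two steps together, it looks roughly like this, where encode_multiview is a stand-in for whatever the readme Example does with the DuoduoCLIP checkpoint (I'm not reproducing its exact API here):

```python
import h5py
import numpy as np

def encode_multiview(image_paths):
    """Stand-in: run the DuoduoCLIP multi-view encoder on one or more
    rendered views, as in the readme Example, returning a (D,) vector."""
    raise NotImplementedError("paste the readme Example here")

# Encode the query image(s) instead of a text prompt...
query = encode_multiview(["view_000.png", "view_001.png"])
query = query / np.linalg.norm(query)

# ...then search the objaverse shape embeddings exactly as in the
# text-retrieval sketch above (cosine similarity, top-k).
with h5py.File("data/objaverse_embeddings/Four_1to6F_bs1600_LT6/shape_emb_objaverse.h5", "r") as f:
    db = np.asarray(f["shape_embeddings"], dtype=np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)
print(np.argsort(-(db @ query))[:5])
```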
I'm currently busy with a few deadlines, but I'll also include the 3D Future, ABO, and ShapeNet embeddings in a future release for the retrieval part. In the meantime, the renderings for 3D Future, ABO, and ShapeNet are already in the huggingface repo, so you can take those and compute the embeddings with the Example.
OK, got it. The rendered images for 3D Future, ABO, and ShapeNet are stored in supplement_mv_images.h5, right?
Also, I downloaded the data in "dataset/data/ViT-B-32_laion2b_s34b_b79k/image_embeddings.h5" and "dataset/data/ViT-B-32_laion2b_s34b_b79k/text_embeddings.npy". May I ask how these image and text embeddings can be used to construct the shape embeddings? In other words, text_retrieval.py uses "data/objaverse_embeddings/Four_1to6F_bs1600_LT6/shape_emb_objaverse.h5" as the embedding database; how can we obtain those shape embeddings from the image and text embeddings above?
You should be able to obtain the shape embeddings with just the supplement_mv_images.h5 and supplement_model_to_idx.json (model identifiers to h5 indices) files. The Example shows how to encode the raw multi-view images into a shape embedding.
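Something along these lines should work (the dataset key inside the h5 file is a guess; inspect the file to confirm its layout):

```python
import json
import h5py
import numpy as np

# Map a model identifier to its row in the supplement h5 file.
with open("supplement_model_to_idx.json") as f:
    model_to_idx = json.load(f)
idx = model_to_idx["<some 3D Future / ABO / ShapeNet model id>"]

# Pull that model's multi-view renderings. The dataset key "images"
# is an assumption; inspect f.keys() to confirm the actual layout.
with h5py.File("supplement_mv_images.h5", "r") as f:
    views = np.asarray(f["images"][idx])  # e.g. (num_views, H, W, 3)

# Encode the views into a single shape embedding with DuoduoCLIP,
# following the readme Example (placeholder call):
# shape_emb = encode_multiview(views)
```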
The text and image embeddings under dataset/data/ViT-B-32_laion2b_s34b_b79k are not shape embeddings. They were produced with the pretrained CLIP model ViT-B-32_laion2b_s34b_b79k, which is used as the teacher model for training DuoduoCLIP. The image embeddings there are computed separately by the pretrained CLIP model, not by the DuoduoCLIP model.
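To make the distinction concrete, those teacher embeddings come straight from the off-the-shelf OpenCLIP checkpoint, along these lines (a sketch; the repo's actual preprocessing may differ):

```python
import torch
import open_clip
from PIL import Image

# Off-the-shelf teacher: OpenCLIP ViT-B-32 trained with laion2b_s34b_b79k.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

with torch.no_grad():
    # Plain CLIP embeddings of single images / texts; these are what live in
    # image_embeddings.h5 / text_embeddings.npy, not DuoduoCLIP shape embeddings.
    img_emb = model.encode_image(preprocess(Image.open("render.png")).unsqueeze(0))
    txt_emb = model.encode_text(tokenizer(["a wooden chair"]))
```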
OK, thank you very much. So the shape embeddings are encoded by the DuoduoCLIP image encoder.
Hi,
The images provided on huggingface only contain rendered images for the Objaverse-LVIS split, so if you only plan on running evaluation they are enough. However, if you want to do training, you will need to download the full set of Zero123 images.
Hope that helps!