WEIRDLabUW / urdformer

code release for URDFormer
MIT License

Reproducing Kitchen Scene Image --> URDF Example #3

Open cremebrule opened 1 month ago

cremebrule commented 1 month ago

Hi,

Huge thanks for publishing this work and making the code public -- this is really impressive work and has a lot of potential use cases for scaling up data generation!

I'm trying to reproduce the published results, notably the showcased kitchen scene. However, I'm unable to produce good results. Even after manually drawing bounding boxes around each cabinet, the output is still poor (shown below) and the textures are a bit mismatched. I've run it a few times but can't seem to improve the quality of the output. Any tips / tricks for improving the outputs?

Huge thanks!!

[Image: kitchen_urdformer output]

qiuyuchen14 commented 1 month ago

Hi Josiah, thanks so much for your interest in URDFormer! To help you reproduce the examples shown in our paper, I added a new script (see Evaluation in the README). In particular, I uploaded all the assets used in the URDFormer evaluation. When you run evaluate.py, URDFormer will use the manually labeled bboxes as the input. If you want to compare the results with manually labeled URDFs, please check out Reality Gym.

[Screenshot from 2024-08-08 00-02-20]

For example: the left is the original image, the middle is the URDFormer prediction when running evaluate.py, and the right is the labeled URDF when running python gt_demo.py --scene kitchens --texture.
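For reference, the two sides of that comparison are run roughly as follows (see the Evaluation section of the README for evaluate.py's exact arguments; only the gt_demo.py flags are quoted here):

```
# URDFormer prediction on the evaluation assets, using the manually labeled bboxes
python evaluate.py

# Manually labeled ground-truth URDFs from Reality Gym, with textures
python gt_demo.py --scene kitchens --texture
```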

To answer your question about tips/tricks in general: what I found helpful is reducing the number of object bboxes in the scene by grouping adjacent simple cabinets together. This is because my kitchen training data usually has fewer objects than the real images (I randomly scaled each object by 1-4x, resulting in wider objects in the global scene during training), while my cabinet training set contains pretty diverse configurations. So I usually leave the more complex configuration reasoning to the part URDFormer and reduce the burden on the global URDFormer.
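As a rough illustration of the grouping idea (this snippet is only a sketch, not part of the URDFormer code; it assumes bboxes are given as (x, y, w, h) pixel tuples):

```python
def merge_bboxes(boxes):
    """Union a list of (x, y, w, h) boxes into a single covering box."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)

# e.g. label a row of three adjacent base cabinets as one wide cabinet,
# so the global URDFormer sees fewer, larger objects (closer to its training data)
row_of_cabinets = [(100, 400, 80, 120), (180, 400, 80, 120), (260, 400, 80, 120)]
print(merge_bboxes(row_of_cabinets))  # (100, 400, 240, 120)
```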

In terms of texture, make sure you choose --scene_type kitchen when running get_texture.py. In evaluate.py, you can also comment out the section linked here to reproduce how I saved the kitchen texture map.
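i.e., something like (any other arguments omitted):

```
python get_texture.py --scene_type kitchen
```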

URDFormer definitely still needs a lot of improvement, especially on global scene prediction. This is likely because the resolution of each cropped object image in the scene is much worse when it is fed into the part URDFormer, and also likely because objects in global scenes appear at more diverse angles.

cremebrule commented 1 month ago

Awesome, huge thanks for the rapid response! This is very helpful information. I will try the evaluation script too.

Regarding bboxes, I tried using that same method, grouping cabinets that are symmetric with respect to each other or seem to operate as a semantic "group". I'm not sure why the output doesn't look as good as yours :/

Regarding texture, I made sure to run with the --scene_type kitchen flag. In fact, for all three steps (get_bbox.py, get_texture.py, and demo.py), I ran with --scene_type kitchen.
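i.e., roughly (image paths and any other arguments omitted):

```
python get_bbox.py --scene_type kitchen
python get_texture.py --scene_type kitchen
python demo.py --scene_type kitchen
```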

The limitations also make sense. More objects are spatially out of distribution, and object boundaries can be a bit more unclear.

One follow-up question: do you happen to have the other original high-res internet photos + phone photos you took to generate the results shown on the URDFormer website? I'd also like to run URDFormer on those images to get a better feel for the qualitative outputs!


Huge thanks again!!

qiuyuchen14 commented 1 month ago

The assets should contain all 54 internet kitchen images and all 300 images for the 5 object categories. If you run evaluate.py, it will show the URDFormer prediction (with labeled bboxes) for all 54 internet images one by one, including the images on the website.

cremebrule commented 1 month ago

Got it, huge thanks!