GAP-LAB-CUHK-SZ / Total3DUnderstanding

Implementation of CVPR'20 Oral: Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
MIT License

One Different Image Layout Estimation and Drawing 3D Layout #36

Open emirhanKural opened 3 years ago

emirhanKural commented 3 years ago

Hi, I really appreciate the project and hope it can be developed more :)

Now I'm trying to do only layout_estimation. My goal is to give the network an image and get back a 3D visualization of its layout, like this: [example images]

The first problem is: how can cam_K be estimated? I have checked all the code samples, and I was able to do the layout and cam_R estimation. In all your samples you use the cam_K of the data to draw the 3D layout. How can I predict it, or is there any way to draw the 3D layout without cam_K?

The second problem is that I don't know whether I am doing it correctly, but when I tried to estimate the layouts of the demo data, my results were really bad. I followed the demo.py steps to predict the layout points. For the weights, I first used your pretrained model, then I trained for 100 epochs and tried those weights, but the results were the same.

I used @chengzhag's layout_estimation.yaml here:

# Uses the helpers from demo.py: CONFIG, CheckpointIO, mount_external_config,
# load_device, load_model, load_demo_data, get_layout_bdb_sunrgbd,
# get_rotation_matix_result and format_layout.
def estimate(img_path):
    # Build the config and the network in demo mode.
    cfg = CONFIG("configs/layout_estimation.yaml")
    checkpoint = CheckpointIO(cfg)
    cfg = mount_external_config(cfg)
    device = load_device(cfg)
    cfg.config["mode"] = "demo"
    net = load_model(cfg, device=device)
    checkpoint.register_modules(net=net)

    # Load the demo sample and run a forward pass.
    cfg.config['demo_path'] = img_path
    data = load_demo_data(cfg.config['demo_path'], device)

    with torch.no_grad():
        est_data = net(data)

    # Decode the 3D layout bounding box from the network outputs.
    lo_bdb3D_out = get_layout_bdb_sunrgbd(cfg.bins_tensor, est_data['lo_ori_reg_result'],
                                          torch.argmax(est_data['lo_ori_cls_result'], 1),
                                          est_data['lo_centroid_result'],
                                          est_data['lo_coeffs_result'])
    layout = lo_bdb3D_out[0, :, :].cpu().numpy()

    # Decode the camera rotation from the pitch/roll predictions.
    cam_R_out = get_rotation_matix_result(cfg.bins_tensor,
                                          torch.argmax(est_data['pitch_cls_result'], 1), est_data['pitch_reg_result'],
                                          torch.argmax(est_data['roll_cls_result'], 1), est_data['roll_reg_result'])
    pre_cam_R = cam_R_out[0, :, :].cpu().numpy()

    pre_layout = format_layout(layout)

    return pre_layout, pre_cam_R

To draw the 3D layout (I read the cam_K of the sample separately; it is not shown here, but see the sketch after the code below):

img_path = "./demo/inputs/1"
sequence_id = img_path[-1]

# Load the RGB image and run the layout/camera estimation above.
rgb_image = np.asarray(Image.open(img_path + "/img.jpg").convert('RGB'))
pre_layout, pre_cam_R = estimate(img_path)

# Build the visualization with only the predicted layout and camera rotation;
# the remaining (ground-truth / object) fields are left as None.
scene_box = Box(rgb_image, None, cam_K, None, pre_cam_R, None,
                pre_layout, None, None, 'prediction', None)

scene_box.draw3D(if_save=True, save_path='./demo/sunrgbd/%s_recon.png' % (sequence_id))
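
For reference, this is roughly how I get cam_K. It is only a minimal sketch, assuming the sample folder contains a cam_K.txt with the 3x3 intrinsics as in the repo's demo inputs; load_cam_K is just an illustrative helper, not code from the repo:

import numpy as np

# Sketch: read the 3x3 intrinsic matrix of a demo sample.
# Assumes <sample_dir>/cam_K.txt stores the intrinsics as
# whitespace-separated numbers (as in demo/inputs/1).
def load_cam_K(sample_dir):
    cam_K = np.loadtxt(sample_dir + "/cam_K.txt").reshape(3, 3)
    return cam_K

cam_K = load_cam_K(img_path)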

I got results like this: [attached result image]

It should look like this: [expected result image]

I hope I have expressed myself clearly. Thank you very much ^^

yinyunie commented 3 years ago

Hi,

In the paper, we actually require the camera intrinsics (i.e., cam_K); otherwise the problem would be extremely ambiguous.
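
As a rough illustration of why (a minimal pinhole-projection sketch, not code from this repo; project_to_image is just a hypothetical helper), the 3D layout can only be related to image pixels through cam_K:

import numpy as np

# Illustrative sketch: project 3D layout corners, already expressed in an
# OpenCV-style camera frame (z along the viewing axis), onto the image with
# the pinhole model. Without cam_K this mapping to pixels is undetermined.
def project_to_image(points_cam, cam_K):
    # points_cam: (N, 3) array of 3D points in the camera frame.
    uvw = (cam_K @ points_cam.T).T       # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]      # perspective divide -> (N, 2) pixel coords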

For the layout estimation in our demo, our prediction is here. That figure is produced exactly by our demo code.

Best, Yinyu