SysCV / qd-3dt

Official implementation of Monocular Quasi-Dense 3D Object Tracking, TPAMI 2022
https://eborboihuc.github.io/QD-3DT/
BSD 3-Clause "New" or "Revised" License

Interpreting the Output of the QuasiDense3DSepUncertainty Model #30

Open CWAndersn opened 2 years ago

CWAndersn commented 2 years ago

I have been attempting to use your model for full 3D monocular tracking on custom data, and for that I would like to make use of the inference API. Before moving to custom data, I am currently trying to run and visualize the model on the nuScenes dataset to verify that the API is working correctly. I am using the included monocular 3D detection/tracking result for nuScenes from the model zoo with the corresponding QuasiDense3DSepUncertainty model.

In order to work with the nuScenes configuration of the model, I had to modify the img_meta created in the API's _prepare_data function, as shown below. I believe this is necessary because this API was originally intended for a different model configuration.

def _prepare_data(img, calib, pose, img_transform, cfg, device):
    ori_shape = img.shape
    img, img_shape, pad_shape, scale_factor = img_transform(
        img,
        scale=cfg.data.test.img_scale,
        keep_ratio=cfg.data.test.get('resize_keep_ratio', True))
    img = to_tensor(img).to(device).unsqueeze(0)
    img_meta = [
        dict(
            ori_shape=ori_shape,
            img_shape=img_shape,
            pad_shape=pad_shape,
            scale_factor=scale_factor,
            flip=False,
            calib=calib,
            pose=pose,
            # added so the nuScenes tracking config can find the calibration
            # and ego pose it expects under img_info
            img_info=dict(
                type="TRK",
                cali=calib,
                pose=pose
            )
        )
    ]
    return dict(img=[img], img_meta=[img_meta])
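
For completeness, this is roughly how I construct the calib and pose that I pass into the API for a single nuScenes camera frame (a sketch using the nuscenes devkit; the 3x4 projection matrix and the rotation/position dict are my own assumptions about what the API expects, not something taken from the repo):

    import numpy as np
    from nuscenes.nuscenes import NuScenes
    from pyquaternion import Quaternion

    nusc = NuScenes(version='v1.0-trainval', dataroot='data/nuscenes')

    def load_calib_and_pose(sample_data_token):
        # one camera frame (e.g. CAM_FRONT) and its calibration / ego pose records
        sd = nusc.get('sample_data', sample_data_token)
        cs = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])
        ego = nusc.get('ego_pose', sd['ego_pose_token'])

        # 3x4 projection matrix built from the 3x3 camera intrinsic
        calib = np.hstack([np.array(cs['camera_intrinsic']), np.zeros((3, 1))])

        # camera-to-global transform: ego pose composed with the sensor extrinsics
        ego_rot = Quaternion(ego['rotation']).rotation_matrix
        cam_rot = Quaternion(cs['rotation']).rotation_matrix
        rotation = ego_rot @ cam_rot
        position = ego_rot @ np.array(cs['translation']) + np.array(ego['translation'])

        # the dict layout here is my guess at what the tracker wants
        pose = dict(rotation=rotation, position=position)
        return calib, pose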

I am now attempting to perform a 3D visualization of the model output, basing my visualization approach on the scripts/plot_tracking.py code. However, the resulting model output is not what I would expect it to be.

results, use_3d_center = inference_detector(model, img_path, calib, pose, nuscenes_categories)
print(len(results["depth_results"]))
print(len(results["alpha_results"]))
print(results["track_results"])

A common output of this code would look like this:

30
30
defaultdict(<class 'list'>, {0: {'bbox': array([ 427.682,  518.581,  446.410,  540.689,  0.056], dtype=float32), 'label': 8}})

My main issue stems from the fact that track_results always seems to include only one item, while tools/general_output.py seems to imply that the number of items should match the length of the other results (depth_results, alpha_results, etc.).
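
To make the discrepancy concrete, this is the structure I would expect versus what I observe (the expected shape is only my interpretation of tools/general_output.py, not something I have verified):

    # Structure I would expect, based on my reading of tools/general_output.py:
    # one entry per tracked detection, keyed by track id, roughly matching the
    # length of depth_results / alpha_results (my interpretation only).
    expected_track_results = {
        0: {'bbox': [...], 'label': ...},
        1: {'bbox': [...], 'label': ...},
        # ... around 30 entries for this frame
    }

    # Structure I actually get back from inference_detector: a single entry.
    observed_track_results = {
        0: {'bbox': [427.682, 518.581, 446.410, 540.689, 0.056], 'label': 8},
    }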

I have found that by associating the 3D information (depth_results, dim_results, alpha_results) with the 2D bbox information output by the model, I can get 3D bboxes that seem to work to an extent, but not of the quality seen when using the inference and detection scripts that read from your converted dataset format. See some examples below:

(screenshots of the resulting 3D bounding box visualizations)
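
The association itself looks roughly like this (a simplified sketch of my own code, loosely modeled on scripts/plot_tracking.py; the parallel-list assumption, the 3x4 calib layout, and the yaw = alpha + ray-angle convention are my assumptions, not taken from the repo):

    import numpy as np

    def associate_2d_3d(bboxes_2d, depths, dims, alphas, calib):
        """My own association of per-detection 2D boxes with depth/dim/alpha.

        Assumes the lists are index-aligned, calib is a 3x4 projection matrix,
        and yaw = alpha + ray angle to the back-projected box center."""
        boxes_3d = []
        for bbox, depth, dim, alpha in zip(bboxes_2d, depths, dims, alphas):
            # 2D box center in pixels
            cx = (bbox[0] + bbox[2]) / 2.0
            cy = (bbox[1] + bbox[3]) / 2.0

            # back-project the center ray and scale it by the predicted depth
            fx, fy = calib[0, 0], calib[1, 1]
            px, py = calib[0, 2], calib[1, 2]
            x = (cx - px) * depth / fx
            y = (cy - py) * depth / fy
            center_cam = np.array([x, y, depth])

            # convert the observation angle alpha to a yaw in camera coordinates
            yaw = alpha + np.arctan2(center_cam[0], center_cam[2])

            boxes_3d.append(dict(center=center_cam, dim=dim, yaw=yaw))
        return boxes_3d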

In short, I would appreciate any insight into the direct usage of the QuasiDense3DSepUncertainty model, which does not behave as I expect when used through the API provided in qd3dt/api/inference.py. Based on the inference code in tools/test_eval_video_exp.py and tools/general_output.py, it seems that the returned track_results should contain more items, but it only ever outputs a single item.

Is my assessment of the track_results output correct? What should the track_results output actually look like? Are there any assumptions that this inference API makes that would cause issues when attempting to use it with this model with full 3D tracking?

Thank you for your time and assistance.

prsnkmr commented 1 year ago

Hello @CWAndersn ,

I am trying to run inference with the qd3dt model on the nuScenes dataset. From the issue above, I understand that you have already run inference with this model. Would you please share any notes or a snippet for running the model? It would be very helpful for my inference work.

Thanks in advance for your help.