facebookresearch / NeRF-Det

[ICCV 2023] Code for NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
https://chenfengxu714.github.io/nerfdet/

Depth makes performance worse #13

Open mrsempress opened 11 months ago

mrsempress commented 11 months ago

My reproduction of nerfdet_res50_2x_low_res reaches 52.4, but nerfdet_res50_2x_low_res_depth_sp only reaches 49.58. In the paper, nerfdet_res50_2x_low_res is reported at 52.0 and nerfdet_res50_2x_low_res_depth_sp at 51.8. It seems depth supervision does not help and even hurts performance.
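
To make the size of the discrepancy concrete, here is a minimal sketch of the arithmetic using only the four scores quoted above. The dictionary and function names are illustrative and do not come from the repository.

```python
# Scores quoted in this comment; the variable names are illustrative only.
BASELINE = "nerfdet_res50_2x_low_res"
DEPTH_SP = "nerfdet_res50_2x_low_res_depth_sp"

reproduced = {BASELINE: 52.4, DEPTH_SP: 49.58}
paper = {BASELINE: 52.0, DEPTH_SP: 51.8}


def depth_gap(scores):
    """Baseline score minus depth-supervised score (positive means depth is worse)."""
    return round(scores[BASELINE] - scores[DEPTH_SP], 2)


print("gap in the reproduction:", depth_gap(reproduced))
print("gap in the paper:       ", depth_gap(paper))
```

In the paper the two configs differ by 0.2 points, while in this reproduction the depth-supervised run is 2.82 points behind the baseline.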

chenfengxu714 commented 11 months ago

Did you modify anything? With the config I provided, you should be able to reproduce the results I list on GitHub or in the paper. Did you raise the resolution?
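
One way to answer the "did anything change" question mechanically is to diff the config dumped at the start of a training log against the config shipped with the repository. The sketch below works on plain nested dicts and is not part of NeRF-Det or mmdet. The helper names `flatten` and `diff_configs` are hypothetical, and the two toy configs only borrow a few key names from this thread, with a made-up changed value for illustration.

```python
MISSING = "<missing>"  # sentinel for keys present in only one config


def flatten(cfg, prefix=""):
    """Flatten nested dicts/lists/tuples into {'a.b.0.c': value} pairs.

    Empty containers contribute no entries.
    """
    items = {}
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(cfg, (list, tuple)):
        for index, value in enumerate(cfg):
            items.update(flatten(value, f"{prefix}{index}."))
    else:
        items[prefix.rstrip(".")] = cfg
    return items


def diff_configs(reference, candidate):
    """Return {key: (reference_value, candidate_value)} for every differing leaf."""
    ref, cand = flatten(reference), flatten(candidate)
    return {
        key: (ref.get(key, MISSING), cand.get(key, MISSING))
        for key in sorted(set(ref) | set(cand))
        if ref.get(key, MISSING) != cand.get(key, MISSING)
    }


# Toy example: the changed img_scale is hypothetical, not taken from any real run.
reference = dict(model=dict(depth_supervise=True, N_samples=64), img_scale=(320, 240))
candidate = dict(model=dict(depth_supervise=True, N_samples=64), img_scale=(640, 480))

for key, (old, new) in diff_configs(reference, candidate).items():
    print(f"{key}: {old} -> {new}")
```

An empty result would indicate that the two configs agree on every leaf value. How the two configs are loaded into dicts is deliberately left out of this sketch.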

mrsempress commented 11 months ago

I did not modify anything. The results are as follows. The best mAP_0.25 is at epoch 11:

2023-09-19 16:23:15,981 - mmdet - INFO - Saving checkpoint at 11 epochs
2023-09-19 16:31:29,307 - mmdet - INFO - 
+----------------+---------+---------+---------+---------+
| classes        | AP_0.25 | AR_0.25 | AP_0.50 | AR_0.50 |
+----------------+---------+---------+---------+---------+
| table          | 0.5533  | 0.7229  | 0.3770  | 0.5029  |
| sofa           | 0.7290  | 0.8351  | 0.4794  | 0.5876  |
| chair          | 0.7393  | 0.8158  | 0.4466  | 0.5307  |
| bookshelf      | 0.5025  | 0.7532  | 0.1858  | 0.3506  |
| curtain        | 0.2499  | 0.5075  | 0.0586  | 0.1791  |
| garbagebin     | 0.3039  | 0.5528  | 0.1125  | 0.2434  |
| door           | 0.3325  | 0.5782  | 0.0595  | 0.1884  |
| picture        | 0.0363  | 0.1486  | 0.0116  | 0.0450  |
| cabinet        | 0.3431  | 0.6398  | 0.1251  | 0.2769  |
| window         | 0.2225  | 0.4787  | 0.0258  | 0.1241  |
| bed            | 0.8010  | 0.8519  | 0.6727  | 0.7284  |
| showercurtrain | 0.3742  | 0.7143  | 0.0469  | 0.1429  |
| desk           | 0.7159  | 0.9134  | 0.4474  | 0.6378  |
| counter        | 0.4279  | 0.5769  | 0.0469  | 0.1731  |
| refrigerator   | 0.5198  | 0.6316  | 0.2508  | 0.3860  |
| sink           | 0.5148  | 0.5816  | 0.2507  | 0.3367  |
| bathtub        | 0.6672  | 0.7742  | 0.3987  | 0.4516  |
| toilet         | 0.8918  | 0.9310  | 0.5667  | 0.6207  |
+----------------+---------+---------+---------+---------+
| Overall        | 0.4958  | 0.6671  | 0.2535  | 0.3614  |
+----------------+---------+---------+---------+---------+
2023-09-19 16:31:29,345 - mmdet - INFO - Epoch(val) [11][1802]  table_AP_0.25: 0.5533, sofa_AP_0.25: 0.7290, chair_AP_0.25: 0.7393, bookshelf_AP_0.25: 0.5025, curtain_AP_0.25: 0.2499, garbagebin_AP_0.25: 0.3039, door_AP_0.25: 0.3325, picture_AP_0.25: 0.0363, cabinet_AP_0.25: 0.3431, window_AP_0.25: 0.2225, bed_AP_0.25: 0.8010, showercurtrain_AP_0.25: 0.3742, desk_AP_0.25: 0.7159, counter_AP_0.25: 0.4279, refrigerator_AP_0.25: 0.5198, sink_AP_0.25: 0.5148, bathtub_AP_0.25: 0.6672, toilet_AP_0.25: 0.8918, mAP_0.25: 0.4958, table_rec_0.25: 0.7229, sofa_rec_0.25: 0.8351, chair_rec_0.25: 0.8158, bookshelf_rec_0.25: 0.7532, curtain_rec_0.25: 0.5075, garbagebin_rec_0.25: 0.5528, door_rec_0.25: 0.5782, picture_rec_0.25: 0.1486, cabinet_rec_0.25: 0.6398, window_rec_0.25: 0.4787, bed_rec_0.25: 0.8519, showercurtrain_rec_0.25: 0.7143, desk_rec_0.25: 0.9134, counter_rec_0.25: 0.5769, refrigerator_rec_0.25: 0.6316, sink_rec_0.25: 0.5816, bathtub_rec_0.25: 0.7742, toilet_rec_0.25: 0.9310, mAR_0.25: 0.6671, table_AP_0.50: 0.3770, sofa_AP_0.50: 0.4794, chair_AP_0.50: 0.4466, bookshelf_AP_0.50: 0.1858, curtain_AP_0.50: 0.0586, garbagebin_AP_0.50: 0.1125, door_AP_0.50: 0.0595, picture_AP_0.50: 0.0116, cabinet_AP_0.50: 0.1251, window_AP_0.50: 0.0258, bed_AP_0.50: 0.6727, showercurtrain_AP_0.50: 0.0469, desk_AP_0.50: 0.4474, counter_AP_0.50: 0.0469, refrigerator_AP_0.50: 0.2508, sink_AP_0.50: 0.2507, bathtub_AP_0.50: 0.3987, toilet_AP_0.50: 0.5667, mAP_0.50: 0.2535, table_rec_0.50: 0.5029, sofa_rec_0.50: 0.5876, chair_rec_0.50: 0.5307, bookshelf_rec_0.50: 0.3506, curtain_rec_0.50: 0.1791, garbagebin_rec_0.50: 0.2434, door_rec_0.50: 0.1884, picture_rec_0.50: 0.0450, cabinet_rec_0.50: 0.2769, window_rec_0.50: 0.1241, bed_rec_0.50: 0.7284, showercurtrain_rec_0.50: 0.1429, desk_rec_0.50: 0.6378, counter_rec_0.50: 0.1731, refrigerator_rec_0.50: 0.3860, sink_rec_0.50: 0.3367, bathtub_rec_0.50: 0.4516, toilet_rec_0.50: 0.6207, mAR_0.50: 0.3614
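
As a sanity check on the epoch 11 log above, the per-class AP_0.25 values can be pulled out of the `Epoch(val)` line and averaged to confirm that the reported mAP_0.25 is their plain mean. This is a minimal sketch, assuming the `<class>_AP_<iou>: <value>` format seen in the log. The helper names are hypothetical, and `log_line` holds only the AP_0.25 portion of the line to keep the example short.

```python
import re

# AP_0.25 portion of the epoch 11 "Epoch(val)" line, copied from the log above.
log_line = (
    "table_AP_0.25: 0.5533, sofa_AP_0.25: 0.7290, chair_AP_0.25: 0.7393, "
    "bookshelf_AP_0.25: 0.5025, curtain_AP_0.25: 0.2499, garbagebin_AP_0.25: 0.3039, "
    "door_AP_0.25: 0.3325, picture_AP_0.25: 0.0363, cabinet_AP_0.25: 0.3431, "
    "window_AP_0.25: 0.2225, bed_AP_0.25: 0.8010, showercurtrain_AP_0.25: 0.3742, "
    "desk_AP_0.25: 0.7159, counter_AP_0.25: 0.4279, refrigerator_AP_0.25: 0.5198, "
    "sink_AP_0.25: 0.5148, bathtub_AP_0.25: 0.6672, toilet_AP_0.25: 0.8918, "
    "mAP_0.25: 0.4958"
)


def per_class_ap(line, iou="0.25"):
    """Map class name -> AP at the given IoU threshold (skips the mAP entry)."""
    pattern = re.compile(r"(\w+)_AP_" + re.escape(iou) + r": ([0-9.]+)")
    return {name: float(value) for name, value in pattern.findall(line)}


def reported_map(line, iou="0.25"):
    """Read the overall mAP that the evaluator printed for this IoU threshold."""
    match = re.search(r"\bmAP_" + re.escape(iou) + r": ([0-9.]+)", line)
    return float(match.group(1))


ap = per_class_ap(log_line)
mean_ap = round(sum(ap.values()) / len(ap), 4)
worst_class = min(ap, key=ap.get)

print(len(ap), "classes, mean AP_0.25 =", mean_ap, "reported =", reported_map(log_line))
print("lowest AP_0.25:", worst_class, ap[worst_class])
```

The mean over the 18 classes comes out at 0.4958, matching the `Overall` row, and `picture` is the weakest class at 0.0363.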

And epoch 12 is:

2023-09-19 17:55:36,242 - mmdet - INFO - Saving checkpoint at 12 epochs
2023-09-19 18:03:43,898 - mmdet - INFO - 
+----------------+---------+---------+---------+---------+
| classes        | AP_0.25 | AR_0.25 | AP_0.50 | AR_0.50 |
+----------------+---------+---------+---------+---------+
| table          | 0.5501  | 0.7114  | 0.3603  | 0.4714  |
| chair          | 0.7272  | 0.8070  | 0.4297  | 0.5161  |
| sofa           | 0.7351  | 0.8247  | 0.4211  | 0.5361  |
| bookshelf      | 0.4756  | 0.7273  | 0.1782  | 0.3377  |
| curtain        | 0.2039  | 0.4328  | 0.0366  | 0.1493  |
| garbagebin     | 0.3135  | 0.5302  | 0.1044  | 0.2170  |
| door           | 0.3273  | 0.5546  | 0.0738  | 0.1906  |
| picture        | 0.0253  | 0.1216  | 0.0028  | 0.0315  |
| cabinet        | 0.3308  | 0.6022  | 0.1158  | 0.2446  |
| window         | 0.2102  | 0.4539  | 0.0117  | 0.1028  |
| bed            | 0.8080  | 0.8765  | 0.6820  | 0.7407  |
| showercurtrain | 0.3415  | 0.6786  | 0.0661  | 0.1429  |
| desk           | 0.7097  | 0.9213  | 0.4788  | 0.6614  |
| counter        | 0.4099  | 0.6346  | 0.0321  | 0.1346  |
| refrigerator   | 0.5390  | 0.6667  | 0.2349  | 0.3860  |
| sink           | 0.5386  | 0.6224  | 0.2465  | 0.3163  |
| toilet         | 0.8885  | 0.9310  | 0.5742  | 0.6379  |
| bathtub        | 0.6791  | 0.7419  | 0.4183  | 0.4839  |
+----------------+---------+---------+---------+---------+
| Overall        | 0.4896  | 0.6577  | 0.2482  | 0.3500  |
+----------------+---------+---------+---------+---------+
2023-09-19 18:03:43,924 - mmdet - INFO - Epoch(val) [12][1802]  table_AP_0.25: 0.5501, chair_AP_0.25: 0.7272, sofa_AP_0.25: 0.7351, bookshelf_AP_0.25: 0.4756, curtain_AP_0.25: 0.2039, garbagebin_AP_0.25: 0.3135, door_AP_0.25: 0.3273, picture_AP_0.25: 0.0253, cabinet_AP_0.25: 0.3308, window_AP_0.25: 0.2102, bed_AP_0.25: 0.8080, showercurtrain_AP_0.25: 0.3415, desk_AP_0.25: 0.7097, counter_AP_0.25: 0.4099, refrigerator_AP_0.25: 0.5390, sink_AP_0.25: 0.5386, toilet_AP_0.25: 0.8885, bathtub_AP_0.25: 0.6791, mAP_0.25: 0.4896, table_rec_0.25: 0.7114, chair_rec_0.25: 0.8070, sofa_rec_0.25: 0.8247, bookshelf_rec_0.25: 0.7273, curtain_rec_0.25: 0.4328, garbagebin_rec_0.25: 0.5302, door_rec_0.25: 0.5546, picture_rec_0.25: 0.1216, cabinet_rec_0.25: 0.6022, window_rec_0.25: 0.4539, bed_rec_0.25: 0.8765, showercurtrain_rec_0.25: 0.6786, desk_rec_0.25: 0.9213, counter_rec_0.25: 0.6346, refrigerator_rec_0.25: 0.6667, sink_rec_0.25: 0.6224, toilet_rec_0.25: 0.9310, bathtub_rec_0.25: 0.7419, mAR_0.25: 0.6577, table_AP_0.50: 0.3603, chair_AP_0.50: 0.4297, sofa_AP_0.50: 0.4211, bookshelf_AP_0.50: 0.1782, curtain_AP_0.50: 0.0366, garbagebin_AP_0.50: 0.1044, door_AP_0.50: 0.0738, picture_AP_0.50: 0.0028, cabinet_AP_0.50: 0.1158, window_AP_0.50: 0.0117, bed_AP_0.50: 0.6820, showercurtrain_AP_0.50: 0.0661, desk_AP_0.50: 0.4788, counter_AP_0.50: 0.0321, refrigerator_AP_0.50: 0.2349, sink_AP_0.50: 0.2465, toilet_AP_0.50: 0.5742, bathtub_AP_0.50: 0.4183, mAP_0.50: 0.2482, table_rec_0.50: 0.4714, chair_rec_0.50: 0.5161, sofa_rec_0.50: 0.5361, bookshelf_rec_0.50: 0.3377, curtain_rec_0.50: 0.1493, garbagebin_rec_0.50: 0.2170, door_rec_0.50: 0.1906, picture_rec_0.50: 0.0315, cabinet_rec_0.50: 0.2446, window_rec_0.50: 0.1028, bed_rec_0.50: 0.7407, showercurtrain_rec_0.50: 0.1429, desk_rec_0.50: 0.6614, counter_rec_0.50: 0.1346, refrigerator_rec_0.50: 0.3860, sink_rec_0.50: 0.3163, toilet_rec_0.50: 0.6379, bathtub_rec_0.50: 0.4839, mAR_0.50: 0.3500

mrsempress commented 11 months ago

And the environment info and config from the log are:

2023-09-18 18:52:09,574 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /mnt/lustre/share/cuda-11.0
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (GCC) 5.4.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.0
OpenCV: 4.8.0
MMCV: 1.3.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.0
MMDetection: 2.10.0
MMDetection3D: 0.8.0+8684e1f
------------------------------------------------------------

2023-09-18 18:52:09,574 - mmdet - INFO - Distributed training: True
2023-09-18 18:52:10,436 - mmdet - INFO - Config:
model = dict(
    type='nerfdet',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=4),
    neck_3d=dict(
        type='FastIndoorImVoxelNeck',
        in_channels=256,
        out_channels=128,
        n_blocks=[1, 1, 1]),
    bbox_head=dict(
        type='ScanNetImVoxelHeadV2',
        loss_bbox=dict(type='AxisAlignedIoULoss', loss_weight=1.0),
        n_classes=18,
        n_channels=128,
        n_reg_outs=6,
        n_scales=3,
        limit=27,
        centerness_topk=18),
    voxel_size=(0.16, 0.16, 0.2),
    n_voxels=(40, 40, 16),
    aabb=([-2.7, -2.7, -0.78], [3.7, 3.7, 1.78]),
    near_far_range=[0.2, 8.0],
    N_samples=64,
    N_rand=2048,
    nerf_mode='image',
    depth_supervise=True,
    use_nerf_mask=True,
    nerf_sample_view=20,
    squeeze_scale=4,
    nerf_density=True)
train_cfg = dict()
test_cfg = dict(nms_pre=1000, iou_thr=0.25, score_thr=0.01)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
input_modality = dict(
    use_image=True,
    use_depth=True,
    use_lidar=False,
    use_neuralrecon_depth=False,
    use_ray=True)
train_collect_keys = [
    'img', 'gt_bboxes_3d', 'gt_labels_3d', 'depth', 'lightpos', 'nerf_sizes',
    'raydirs', 'gt_images', 'gt_depths', 'denorm_images'
]
test_collect_keys = [
    'img', 'depth', 'lightpos', 'nerf_sizes', 'raydirs', 'gt_images',
    'gt_depths', 'denorm_images'
]
key = 'denorm_images'
dataset_type = 'ScanNetMultiViewDataset'
data_root = 'data/scannet/'
class_names = ('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window',
               'bookshelf', 'picture', 'counter', 'desk', 'curtain',
               'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub',
               'garbagebin')
train_pipeline = [
    dict(type='LoadAnnotations3D'),
    dict(
        type='MultiViewPipeline',
        n_images=50,
        transforms=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', img_scale=(320, 240), keep_ratio=True),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(240, 320))
        ],
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        margin=10,
        depth_range=[0.5, 5.5],
        loading='random',
        nerf_target_views=10),
    dict(type='RandomShiftOrigin', std=(0.7, 0.7, 0.0)),
    dict(
        type='DefaultFormatBundle3D',
        class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door',
                     'window', 'bookshelf', 'picture', 'counter', 'desk',
                     'curtain', 'refrigerator', 'showercurtrain', 'toilet',
                     'sink', 'bathtub', 'garbagebin')),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes_3d', 'gt_labels_3d', 'depth', 'lightpos',
            'nerf_sizes', 'raydirs', 'gt_images', 'gt_depths', 'denorm_images'
        ])
]
test_pipeline = [
    dict(
        type='MultiViewPipeline',
        n_images=101,
        transforms=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', img_scale=(320, 240), keep_ratio=True),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(240, 320))
        ],
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        margin=10,
        depth_range=[0.5, 5.5],
        loading='random',
        nerf_target_views=1),
    dict(
        type='DefaultFormatBundle3D',
        class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door',
                     'window', 'bookshelf', 'picture', 'counter', 'desk',
                     'curtain', 'refrigerator', 'showercurtrain', 'toilet',
                     'sink', 'bathtub', 'garbagebin'),
        with_label=False),
    dict(
        type='Collect3D',
        keys=[
            'img', 'depth', 'lightpos', 'nerf_sizes', 'raydirs', 'gt_images',
            'gt_depths', 'denorm_images'
        ])
]
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(
        type='RepeatDataset',
        times=6,
        dataset=dict(
            type='ScanNetMultiViewDataset',
            data_root='data/scannet/',
            ann_file='data/scannet/scannet_infos_train_depth.pkl',
            pipeline=[
                dict(type='LoadAnnotations3D'),
                dict(
                    type='MultiViewPipeline',
                    n_images=50,
                    transforms=[
                        dict(type='LoadImageFromFile'),
                        dict(
                            type='Resize',
                            img_scale=(320, 240),
                            keep_ratio=True),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size=(240, 320))
                    ],
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    margin=10,
                    depth_range=[0.5, 5.5],
                    loading='random',
                    nerf_target_views=10),
                dict(type='RandomShiftOrigin', std=(0.7, 0.7, 0.0)),
                dict(
                    type='DefaultFormatBundle3D',
                    class_names=('cabinet', 'bed', 'chair', 'sofa', 'table',
                                 'door', 'window', 'bookshelf', 'picture',
                                 'counter', 'desk', 'curtain', 'refrigerator',
                                 'showercurtrain', 'toilet', 'sink', 'bathtub',
                                 'garbagebin')),
                dict(
                    type='Collect3D',
                    keys=[
                        'img', 'gt_bboxes_3d', 'gt_labels_3d', 'depth',
                        'lightpos', 'nerf_sizes', 'raydirs', 'gt_images',
                        'gt_depths', 'denorm_images'
                    ])
            ],
            modality=dict(
                use_image=True,
                use_depth=True,
                use_lidar=False,
                use_neuralrecon_depth=False,
                use_ray=True),
            classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door',
                     'window', 'bookshelf', 'picture', 'counter', 'desk',
                     'curtain', 'refrigerator', 'showercurtrain', 'toilet',
                     'sink', 'bathtub', 'garbagebin'),
            filter_empty_gt=True,
            box_type_3d='Depth')),
    val=dict(
        type='ScanNetMultiViewDataset',
        data_root='data/scannet/',
        ann_file='data/scannet/scannet_infos_val_depth.pkl',
        pipeline=[
            dict(
                type='MultiViewPipeline',
                n_images=101,
                transforms=[
                    dict(type='LoadImageFromFile'),
                    dict(type='Resize', img_scale=(320, 240), keep_ratio=True),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size=(240, 320))
                ],
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                margin=10,
                depth_range=[0.5, 5.5],
                loading='random',
                nerf_target_views=1),
            dict(
                type='DefaultFormatBundle3D',
                class_names=('cabinet', 'bed', 'chair', 'sofa', 'table',
                             'door', 'window', 'bookshelf', 'picture',
                             'counter', 'desk', 'curtain', 'refrigerator',
                             'showercurtrain', 'toilet', 'sink', 'bathtub',
                             'garbagebin'),
                with_label=False),
            dict(
                type='Collect3D',
                keys=[
                    'img', 'depth', 'lightpos', 'nerf_sizes', 'raydirs',
                    'gt_images', 'gt_depths', 'denorm_images'
                ])
        ],
        modality=dict(
            use_image=True,
            use_depth=True,
            use_lidar=False,
            use_neuralrecon_depth=False,
            use_ray=True),
        classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window',
                 'bookshelf', 'picture', 'counter', 'desk', 'curtain',
                 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub',
                 'garbagebin'),
        test_mode=True,
        box_type_3d='Depth'),
    test=dict(
        type='ScanNetMultiViewDataset',
        data_root='data/scannet/',
        ann_file='data/scannet/scannet_infos_val_depth.pkl',
        pipeline=[
            dict(
                type='MultiViewPipeline',
                n_images=101,
                transforms=[
                    dict(type='LoadImageFromFile'),
                    dict(type='Resize', img_scale=(320, 240), keep_ratio=True),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size=(240, 320))
                ],
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                margin=10,
                depth_range=[0.5, 5.5],
                loading='random',
                nerf_target_views=1),
            dict(
                type='DefaultFormatBundle3D',
                class_names=('cabinet', 'bed', 'chair', 'sofa', 'table',
                             'door', 'window', 'bookshelf', 'picture',
                             'counter', 'desk', 'curtain', 'refrigerator',
                             'showercurtrain', 'toilet', 'sink', 'bathtub',
                             'garbagebin'),
                with_label=False),
            dict(
                type='Collect3D',
                keys=[
                    'img', 'depth', 'lightpos', 'nerf_sizes', 'raydirs',
                    'gt_images', 'gt_depths', 'denorm_images'
                ])
        ],
        modality=dict(
            use_image=True,
            use_depth=True,
            use_lidar=False,
            use_neuralrecon_depth=False,
            use_ray=True),
        classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window',
                 'bookshelf', 'picture', 'counter', 'desk', 'curtain',
                 'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub',
                 'garbagebin'),
        test_mode=True,
        box_type_3d='Depth'))
optimizer = dict(
    type='AdamW',
    lr=0.0002,
    weight_decay=0.0001,
    paramwise_cfg=dict(
        custom_keys=dict(backbone=dict(lr_mult=0.1, decay_mult=1.0))))
optimizer_config = dict(grad_clip=dict(max_norm=35.0, norm_type=2))
lr_config = dict(policy='step', step=[8, 11])
total_epochs = 12
checkpoint_config = dict(interval=1, max_keep_ckpts=-1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
evaluation = dict(interval=1)
dist_params = dict(backend='nccl')
find_unused_parameters = True
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/nerfdet_res50_2x_low_res_depth_sp'
gpu_ids = range(0, 1)

2023-09-18 18:52:10,436 - mmdet - INFO - Set random seed to 0, deterministic: False
2023-09-18 18:52:11,049 - mmdet - INFO - load model from: torchvision://resnet50
2023-09-18 18:52:11,049 - mmdet - INFO - Use load_from_torchvision loader
2023-09-18 18:52:11,351 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

2023-09-18 18:52:11,389 - mmdet - INFO - Model:
nerfdet(
  (backbone): ResNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer2): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (3): Bottleneck(
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer3): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (3): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (4): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (5): Bottleneck(
        (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
    (layer4): ResLayer(
      (0): Bottleneck(
        (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
          (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
      (2): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
      )
    )
  )
  (neck): FPN(
    (lateral_convs): ModuleList(
      (0): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (1): ConvModule(
        (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (2): ConvModule(
        (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (3): ConvModule(
        (conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (fpn_convs): ModuleList(
      (0): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (1): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (2): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (3): ConvModule(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
  )
  (neck_3d): FastIndoorImVoxelNeck(
    (down_layer_0): Sequential(
      (0): BasicBlock3dV2(
        (conv1): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (norm1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (norm2): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (out_block_0): Sequential(
      (0): Conv3d(256, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (down_layer_1): Sequential(
      (0): BasicBlock3dV2(
        (conv1): Conv3d(256, 512, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1), bias=False)
        (norm1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (norm2): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv3d(256, 512, kernel_size=(1, 1, 1), stride=(2, 2, 2), bias=False)
          (1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
    (up_block_1): Sequential(
      (0): ConvTranspose3d(512, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
      (1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (4): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (out_block_1): Sequential(
      (0): Conv3d(512, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (down_layer_2): Sequential(
      (0): BasicBlock3dV2(
        (conv1): Conv3d(512, 1024, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1), bias=False)
        (norm1): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv3d(1024, 1024, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (norm2): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv3d(512, 1024, kernel_size=(1, 1, 1), stride=(2, 2, 2), bias=False)
          (1): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
    (up_block_2): Sequential(
      (0): ConvTranspose3d(1024, 512, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
      (1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (4): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (out_block_2): Sequential(
      (0): Conv3d(1024, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
      (1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (bbox_head): ScanNetImVoxelHeadV2(
    (loss_centerness): CrossEntropyLoss()
    (loss_bbox): AxisAlignedIoULoss()
    (loss_cls): FocalLoss()
    (centerness_conv): Conv3d(128, 1, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
    (reg_conv): Conv3d(128, 6, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
    (cls_conv): Conv3d(128, 18, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (scales): ModuleList(
      (0): Scale()
      (1): Scale()
      (2): Scale()
    )
  )
  (nerf_mlp): VanillaNeRFRadianceField(
    (posi_encoder): SinusoidalEncoder()
    (view_encoder): SinusoidalEncoder()
    (mlp): NerfMLP(
      (base): MLP(
        (hidden_activation): ReLU()
        (output_activation): Identity()
        (hidden_layers): ModuleList(
          (0): Linear(in_features=133, out_features=256, bias=True)
          (1): Linear(in_features=256, out_features=256, bias=True)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): Linear(in_features=256, out_features=256, bias=True)
        )
      )
      (sigma_layer): DenseLayer(
        (hidden_activation): ReLU()
        (output_activation): Identity()
        (hidden_layers): ModuleList()
        (output_layer): Linear(in_features=389, out_features=1, bias=True)
      )
      (bottleneck_layer): DenseLayer(
        (hidden_activation): ReLU()
        (output_activation): Identity()
        (hidden_layers): ModuleList()
        (output_layer): Linear(in_features=389, out_features=256, bias=True)
      )
      (rgb_layer): MLP(
        (hidden_activation): ReLU()
        (output_activation): Identity()
        (hidden_layers): ModuleList(
          (0): Linear(in_features=283, out_features=128, bias=True)
        )
        (output_layer): Linear(in_features=128, out_features=3, bias=True)
      )
    )
  )
  (cov): Sequential(
    (0): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (1): ReLU(inplace=True)
    (2): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (3): ReLU(inplace=True)
    (4): Conv3d(256, 1, kernel_size=(1, 1, 1), stride=(1, 1, 1))
  )
  (mean_mapping): Sequential(
    (0): Conv3d(256, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1))
  )
  (cov_mapping): Sequential(
    (0): Conv3d(256, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1))
  )
  (mapping): Sequential(
    (0): Linear(in_features=256, out_features=32, bias=True)
  )
  (mapping_2d): Sequential(
    (0): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
  )
)
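A quick sanity check on the `nerf_mlp` widths printed above, as a sketch only. The `in_features` values are copied from the printout. How they decompose is an assumption, not something read from the repository code: a skip concatenation of the base input with the base output, and a sinusoidal view encoding with 4 frequencies plus the raw direction. The helper `sinusoidal_dim` is hypothetical.

```python
# Widths copied from the printed nerf_mlp module above.
BASE_IN = 133         # base.hidden_layers[0].in_features
HIDDEN = 256          # width of every base hidden layer
SIGMA_IN = 389        # sigma_layer / bottleneck_layer in_features
BOTTLENECK_OUT = 256  # bottleneck_layer out_features
RGB_IN = 283          # rgb_layer.hidden_layers[0].in_features


def sinusoidal_dim(x_dim, n_freqs, use_identity=True):
    """Width of a sin/cos encoding of an x_dim vector (assumed layout)."""
    return (int(use_identity) + 2 * n_freqs) * x_dim


# Assumption: the base output is concatenated with the base input (skip connection).
skip_width = HIDDEN + BASE_IN

# Assumption: rgb_layer sees the bottleneck output plus the encoded view direction.
view_width = RGB_IN - BOTTLENECK_OUT

print(skip_width == SIGMA_IN)                # True: 256 + 133 = 389
print(view_width, sinusoidal_dim(3, 4))      # 27 27
```

If these assumptions hold, the printed widths are internally consistent, so a mismatch in layer sizes is unlikely to explain the gap between the two configs.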
chenfengxu714 commented 11 months ago

Got it. I will retrain everything on my lab machines soon to double-check the consistency. Thanks for pointing this out.

mrsempress commented 11 months ago

Thanks. Looking forward to your reply.

Yanyirong commented 11 months ago

Hi @mrsempress, I am also trying to train the model, but I'm having some difficulties. Could you share your training logs? I would appreciate it a lot.

mrsempress commented 11 months ago

@Yanyirong If possible, could you explain what the difficulties are? Because the training logs are long, could I send them to your email? Please provide your email address, thanks.

Yanyirong commented 11 months ago

Hi @mrsempress, you can find my email address in my profile.

Pixie8888 commented 7 months ago

Hi, do you know how to prepare the dataset? I didn't find any instructions. @mrsempress @chenfengxu714 @Yanyirong

Yanyirong commented 7 months ago

You can refer to this page: https://github.com/SamsungLabs/imvoxelnet/tree/master/data/scannet @Pixie8888

Pixie8888 commented 7 months ago

Thank you!!