Open mrsempress opened 11 months ago
Did you modify anything? If you use the config I provided, it should reproduce the results I list on GitHub and in the paper. Did you raise the resolution?
I did not modify anything. The results are as follows. The best mAP_0.25 is at epoch 11:
2023-09-19 16:23:15,981 - mmdet - INFO - Saving checkpoint at 11 epochs
2023-09-19 16:31:29,307 - mmdet - INFO -
+----------------+---------+---------+---------+---------+
| classes | AP_0.25 | AR_0.25 | AP_0.50 | AR_0.50 |
+----------------+---------+---------+---------+---------+
| table | 0.5533 | 0.7229 | 0.3770 | 0.5029 |
| sofa | 0.7290 | 0.8351 | 0.4794 | 0.5876 |
| chair | 0.7393 | 0.8158 | 0.4466 | 0.5307 |
| bookshelf | 0.5025 | 0.7532 | 0.1858 | 0.3506 |
| curtain | 0.2499 | 0.5075 | 0.0586 | 0.1791 |
| garbagebin | 0.3039 | 0.5528 | 0.1125 | 0.2434 |
| door | 0.3325 | 0.5782 | 0.0595 | 0.1884 |
| picture | 0.0363 | 0.1486 | 0.0116 | 0.0450 |
| cabinet | 0.3431 | 0.6398 | 0.1251 | 0.2769 |
| window | 0.2225 | 0.4787 | 0.0258 | 0.1241 |
| bed | 0.8010 | 0.8519 | 0.6727 | 0.7284 |
| showercurtrain | 0.3742 | 0.7143 | 0.0469 | 0.1429 |
| desk | 0.7159 | 0.9134 | 0.4474 | 0.6378 |
| counter | 0.4279 | 0.5769 | 0.0469 | 0.1731 |
| refrigerator | 0.5198 | 0.6316 | 0.2508 | 0.3860 |
| sink | 0.5148 | 0.5816 | 0.2507 | 0.3367 |
| bathtub | 0.6672 | 0.7742 | 0.3987 | 0.4516 |
| toilet | 0.8918 | 0.9310 | 0.5667 | 0.6207 |
+----------------+---------+---------+---------+---------+
| Overall | 0.4958 | 0.6671 | 0.2535 | 0.3614 |
+----------------+---------+---------+---------+---------+
2023-09-19 16:31:29,345 - mmdet - INFO - Epoch(val) [11][1802] table_AP_0.25: 0.5533, sofa_AP_0.25: 0.7290, chair_AP_0.25: 0.7393, bookshelf_AP_0.25: 0.5025, curtain_AP_0.25: 0.2499, garbagebin_AP_0.25: 0.3039, door_AP_0.25: 0.3325, picture_AP_0.25: 0.0363, cabinet_AP_0.25: 0.3431, window_AP_0.25: 0.2225, bed_AP_0.25: 0.8010, showercurtrain_AP_0.25: 0.3742, desk_AP_0.25: 0.7159, counter_AP_0.25: 0.4279, refrigerator_AP_0.25: 0.5198, sink_AP_0.25: 0.5148, bathtub_AP_0.25: 0.6672, toilet_AP_0.25: 0.8918, mAP_0.25: 0.4958, table_rec_0.25: 0.7229, sofa_rec_0.25: 0.8351, chair_rec_0.25: 0.8158, bookshelf_rec_0.25: 0.7532, curtain_rec_0.25: 0.5075, garbagebin_rec_0.25: 0.5528, door_rec_0.25: 0.5782, picture_rec_0.25: 0.1486, cabinet_rec_0.25: 0.6398, window_rec_0.25: 0.4787, bed_rec_0.25: 0.8519, showercurtrain_rec_0.25: 0.7143, desk_rec_0.25: 0.9134, counter_rec_0.25: 0.5769, refrigerator_rec_0.25: 0.6316, sink_rec_0.25: 0.5816, bathtub_rec_0.25: 0.7742, toilet_rec_0.25: 0.9310, mAR_0.25: 0.6671, table_AP_0.50: 0.3770, sofa_AP_0.50: 0.4794, chair_AP_0.50: 0.4466, bookshelf_AP_0.50: 0.1858, curtain_AP_0.50: 0.0586, garbagebin_AP_0.50: 0.1125, door_AP_0.50: 0.0595, picture_AP_0.50: 0.0116, cabinet_AP_0.50: 0.1251, window_AP_0.50: 0.0258, bed_AP_0.50: 0.6727, showercurtrain_AP_0.50: 0.0469, desk_AP_0.50: 0.4474, counter_AP_0.50: 0.0469, refrigerator_AP_0.50: 0.2508, sink_AP_0.50: 0.2507, bathtub_AP_0.50: 0.3987, toilet_AP_0.50: 0.5667, mAP_0.50: 0.2535, table_rec_0.50: 0.5029, sofa_rec_0.50: 0.5876, chair_rec_0.50: 0.5307, bookshelf_rec_0.50: 0.3506, curtain_rec_0.50: 0.1791, garbagebin_rec_0.50: 0.2434, door_rec_0.50: 0.1884, picture_rec_0.50: 0.0450, cabinet_rec_0.50: 0.2769, window_rec_0.50: 0.1241, bed_rec_0.50: 0.7284, showercurtrain_rec_0.50: 0.1429, desk_rec_0.50: 0.6378, counter_rec_0.50: 0.1731, refrigerator_rec_0.50: 0.3860, sink_rec_0.50: 0.3367, bathtub_rec_0.50: 0.4516, toilet_rec_0.50: 0.6207, mAR_0.50: 0.3614
And epoch 12 is:
2023-09-19 17:55:36,242 - mmdet - INFO - Saving checkpoint at 12 epochs
2023-09-19 18:03:43,898 - mmdet - INFO -
+----------------+---------+---------+---------+---------+
| classes | AP_0.25 | AR_0.25 | AP_0.50 | AR_0.50 |
+----------------+---------+---------+---------+---------+
| table | 0.5501 | 0.7114 | 0.3603 | 0.4714 |
| chair | 0.7272 | 0.8070 | 0.4297 | 0.5161 |
| sofa | 0.7351 | 0.8247 | 0.4211 | 0.5361 |
| bookshelf | 0.4756 | 0.7273 | 0.1782 | 0.3377 |
| curtain | 0.2039 | 0.4328 | 0.0366 | 0.1493 |
| garbagebin | 0.3135 | 0.5302 | 0.1044 | 0.2170 |
| door | 0.3273 | 0.5546 | 0.0738 | 0.1906 |
| picture | 0.0253 | 0.1216 | 0.0028 | 0.0315 |
| cabinet | 0.3308 | 0.6022 | 0.1158 | 0.2446 |
| window | 0.2102 | 0.4539 | 0.0117 | 0.1028 |
| bed | 0.8080 | 0.8765 | 0.6820 | 0.7407 |
| showercurtrain | 0.3415 | 0.6786 | 0.0661 | 0.1429 |
| desk | 0.7097 | 0.9213 | 0.4788 | 0.6614 |
| counter | 0.4099 | 0.6346 | 0.0321 | 0.1346 |
| refrigerator | 0.5390 | 0.6667 | 0.2349 | 0.3860 |
| sink | 0.5386 | 0.6224 | 0.2465 | 0.3163 |
| toilet | 0.8885 | 0.9310 | 0.5742 | 0.6379 |
| bathtub | 0.6791 | 0.7419 | 0.4183 | 0.4839 |
+----------------+---------+---------+---------+---------+
| Overall | 0.4896 | 0.6577 | 0.2482 | 0.3500 |
+----------------+---------+---------+---------+---------+
2023-09-19 18:03:43,924 - mmdet - INFO - Epoch(val) [12][1802] table_AP_0.25: 0.5501, chair_AP_0.25: 0.7272, sofa_AP_0.25: 0.7351, bookshelf_AP_0.25: 0.4756, curtain_AP_0.25: 0.2039, garbagebin_AP_0.25: 0.3135, door_AP_0.25: 0.3273, picture_AP_0.25: 0.0253, cabinet_AP_0.25: 0.3308, window_AP_0.25: 0.2102, bed_AP_0.25: 0.8080, showercurtrain_AP_0.25: 0.3415, desk_AP_0.25: 0.7097, counter_AP_0.25: 0.4099, refrigerator_AP_0.25: 0.5390, sink_AP_0.25: 0.5386, toilet_AP_0.25: 0.8885, bathtub_AP_0.25: 0.6791, mAP_0.25: 0.4896, table_rec_0.25: 0.7114, chair_rec_0.25: 0.8070, sofa_rec_0.25: 0.8247, bookshelf_rec_0.25: 0.7273, curtain_rec_0.25: 0.4328, garbagebin_rec_0.25: 0.5302, door_rec_0.25: 0.5546, picture_rec_0.25: 0.1216, cabinet_rec_0.25: 0.6022, window_rec_0.25: 0.4539, bed_rec_0.25: 0.8765, showercurtrain_rec_0.25: 0.6786, desk_rec_0.25: 0.9213, counter_rec_0.25: 0.6346, refrigerator_rec_0.25: 0.6667, sink_rec_0.25: 0.6224, toilet_rec_0.25: 0.9310, bathtub_rec_0.25: 0.7419, mAR_0.25: 0.6577, table_AP_0.50: 0.3603, chair_AP_0.50: 0.4297, sofa_AP_0.50: 0.4211, bookshelf_AP_0.50: 0.1782, curtain_AP_0.50: 0.0366, garbagebin_AP_0.50: 0.1044, door_AP_0.50: 0.0738, picture_AP_0.50: 0.0028, cabinet_AP_0.50: 0.1158, window_AP_0.50: 0.0117, bed_AP_0.50: 0.6820, showercurtrain_AP_0.50: 0.0661, desk_AP_0.50: 0.4788, counter_AP_0.50: 0.0321, refrigerator_AP_0.50: 0.2349, sink_AP_0.50: 0.2465, toilet_AP_0.50: 0.5742, bathtub_AP_0.50: 0.4183, mAP_0.50: 0.2482, table_rec_0.50: 0.4714, chair_rec_0.50: 0.5161, sofa_rec_0.50: 0.5361, bookshelf_rec_0.50: 0.3377, curtain_rec_0.50: 0.1493, garbagebin_rec_0.50: 0.2170, door_rec_0.50: 0.1906, picture_rec_0.50: 0.0315, cabinet_rec_0.50: 0.2446, window_rec_0.50: 0.1028, bed_rec_0.50: 0.7407, showercurtrain_rec_0.50: 0.1429, desk_rec_0.50: 0.6614, counter_rec_0.50: 0.1346, refrigerator_rec_0.50: 0.3860, sink_rec_0.50: 0.3163, toilet_rec_0.50: 0.6379, bathtub_rec_0.50: 0.4839, mAR_0.50: 0.3500
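To see which classes move between the two epochs, the `Epoch(val)` lines can be diffed directly. This is only a rough sketch: `parse_val_metrics` is a hypothetical helper (not part of mmdet), and the two strings are shortened copies of the log lines above with just a few keys kept.

```python
import re


def parse_val_metrics(line: str) -> dict:
    """Extract `name: value` pairs from an mmdet-style `Epoch(val)` log line."""
    # Keys such as `sofa_AP_0.50` contain a dot, so allow dots in the key.
    return {k: float(v) for k, v in re.findall(r"([\w.]+): (\d+\.\d+)", line)}


# Shortened copies of the epoch 11 and epoch 12 lines above.
epoch11 = ("Epoch(val) [11][1802] curtain_AP_0.25: 0.2499, sofa_AP_0.50: 0.4794, "
           "mAP_0.25: 0.4958, mAP_0.50: 0.2535")
epoch12 = ("Epoch(val) [12][1802] curtain_AP_0.25: 0.2039, sofa_AP_0.50: 0.4211, "
           "mAP_0.25: 0.4896, mAP_0.50: 0.2482")

m11 = parse_val_metrics(epoch11)
m12 = parse_val_metrics(epoch12)

# Per-metric change from epoch 11 to epoch 12, largest drop first.
deltas = {k: round(m12[k] - m11[k], 4) for k in m11 if k in m12}
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{name:>16}: {delta:+.4f}")
```

Pasting the full log lines instead of the shortened ones should work the same way, since the timestamp prefix has no `key: value` pairs that match the pattern.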
And the environment and config part of the log is:
2023-09-18 18:52:09,574 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA A100-SXM4-80GB
CUDA_HOME: /mnt/lustre/share/cuda-11.0
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (GCC) 5.4.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.3
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.8.0
OpenCV: 4.8.0
MMCV: 1.3.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.0
MMDetection: 2.10.0
MMDetection3D: 0.8.0+8684e1f
------------------------------------------------------------
2023-09-18 18:52:09,574 - mmdet - INFO - Distributed training: True
2023-09-18 18:52:10,436 - mmdet - INFO - Config:
model = dict(
type='nerfdet',
pretrained='torchvision://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=False),
norm_eval=True,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=4),
neck_3d=dict(
type='FastIndoorImVoxelNeck',
in_channels=256,
out_channels=128,
n_blocks=[1, 1, 1]),
bbox_head=dict(
type='ScanNetImVoxelHeadV2',
loss_bbox=dict(type='AxisAlignedIoULoss', loss_weight=1.0),
n_classes=18,
n_channels=128,
n_reg_outs=6,
n_scales=3,
limit=27,
centerness_topk=18),
voxel_size=(0.16, 0.16, 0.2),
n_voxels=(40, 40, 16),
aabb=([-2.7, -2.7, -0.78], [3.7, 3.7, 1.78]),
near_far_range=[0.2, 8.0],
N_samples=64,
N_rand=2048,
nerf_mode='image',
depth_supervise=True,
use_nerf_mask=True,
nerf_sample_view=20,
squeeze_scale=4,
nerf_density=True)
train_cfg = dict()
test_cfg = dict(nms_pre=1000, iou_thr=0.25, score_thr=0.01)
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
input_modality = dict(
use_image=True,
use_depth=True,
use_lidar=False,
use_neuralrecon_depth=False,
use_ray=True)
train_collect_keys = [
'img', 'gt_bboxes_3d', 'gt_labels_3d', 'depth', 'lightpos', 'nerf_sizes',
'raydirs', 'gt_images', 'gt_depths', 'denorm_images'
]
test_collect_keys = [
'img', 'depth', 'lightpos', 'nerf_sizes', 'raydirs', 'gt_images',
'gt_depths', 'denorm_images'
]
key = 'denorm_images'
dataset_type = 'ScanNetMultiViewDataset'
data_root = 'data/scannet/'
class_names = ('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window',
'bookshelf', 'picture', 'counter', 'desk', 'curtain',
'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub',
'garbagebin')
train_pipeline = [
dict(type='LoadAnnotations3D'),
dict(
type='MultiViewPipeline',
n_images=50,
transforms=[
dict(type='LoadImageFromFile'),
dict(type='Resize', img_scale=(320, 240), keep_ratio=True),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size=(240, 320))
],
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
margin=10,
depth_range=[0.5, 5.5],
loading='random',
nerf_target_views=10),
dict(type='RandomShiftOrigin', std=(0.7, 0.7, 0.0)),
dict(
type='DefaultFormatBundle3D',
class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door',
'window', 'bookshelf', 'picture', 'counter', 'desk',
'curtain', 'refrigerator', 'showercurtrain', 'toilet',
'sink', 'bathtub', 'garbagebin')),
dict(
type='Collect3D',
keys=[
'img', 'gt_bboxes_3d', 'gt_labels_3d', 'depth', 'lightpos',
'nerf_sizes', 'raydirs', 'gt_images', 'gt_depths', 'denorm_images'
])
]
test_pipeline = [
dict(
type='MultiViewPipeline',
n_images=101,
transforms=[
dict(type='LoadImageFromFile'),
dict(type='Resize', img_scale=(320, 240), keep_ratio=True),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size=(240, 320))
],
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
margin=10,
depth_range=[0.5, 5.5],
loading='random',
nerf_target_views=1),
dict(
type='DefaultFormatBundle3D',
class_names=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door',
'window', 'bookshelf', 'picture', 'counter', 'desk',
'curtain', 'refrigerator', 'showercurtrain', 'toilet',
'sink', 'bathtub', 'garbagebin'),
with_label=False),
dict(
type='Collect3D',
keys=[
'img', 'depth', 'lightpos', 'nerf_sizes', 'raydirs', 'gt_images',
'gt_depths', 'denorm_images'
])
]
data = dict(
samples_per_gpu=1,
workers_per_gpu=1,
train=dict(
type='RepeatDataset',
times=6,
dataset=dict(
type='ScanNetMultiViewDataset',
data_root='data/scannet/',
ann_file='data/scannet/scannet_infos_train_depth.pkl',
pipeline=[
dict(type='LoadAnnotations3D'),
dict(
type='MultiViewPipeline',
n_images=50,
transforms=[
dict(type='LoadImageFromFile'),
dict(
type='Resize',
img_scale=(320, 240),
keep_ratio=True),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size=(240, 320))
],
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
margin=10,
depth_range=[0.5, 5.5],
loading='random',
nerf_target_views=10),
dict(type='RandomShiftOrigin', std=(0.7, 0.7, 0.0)),
dict(
type='DefaultFormatBundle3D',
class_names=('cabinet', 'bed', 'chair', 'sofa', 'table',
'door', 'window', 'bookshelf', 'picture',
'counter', 'desk', 'curtain', 'refrigerator',
'showercurtrain', 'toilet', 'sink', 'bathtub',
'garbagebin')),
dict(
type='Collect3D',
keys=[
'img', 'gt_bboxes_3d', 'gt_labels_3d', 'depth',
'lightpos', 'nerf_sizes', 'raydirs', 'gt_images',
'gt_depths', 'denorm_images'
])
],
modality=dict(
use_image=True,
use_depth=True,
use_lidar=False,
use_neuralrecon_depth=False,
use_ray=True),
classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door',
'window', 'bookshelf', 'picture', 'counter', 'desk',
'curtain', 'refrigerator', 'showercurtrain', 'toilet',
'sink', 'bathtub', 'garbagebin'),
filter_empty_gt=True,
box_type_3d='Depth')),
val=dict(
type='ScanNetMultiViewDataset',
data_root='data/scannet/',
ann_file='data/scannet/scannet_infos_val_depth.pkl',
pipeline=[
dict(
type='MultiViewPipeline',
n_images=101,
transforms=[
dict(type='LoadImageFromFile'),
dict(type='Resize', img_scale=(320, 240), keep_ratio=True),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size=(240, 320))
],
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
margin=10,
depth_range=[0.5, 5.5],
loading='random',
nerf_target_views=1),
dict(
type='DefaultFormatBundle3D',
class_names=('cabinet', 'bed', 'chair', 'sofa', 'table',
'door', 'window', 'bookshelf', 'picture',
'counter', 'desk', 'curtain', 'refrigerator',
'showercurtrain', 'toilet', 'sink', 'bathtub',
'garbagebin'),
with_label=False),
dict(
type='Collect3D',
keys=[
'img', 'depth', 'lightpos', 'nerf_sizes', 'raydirs',
'gt_images', 'gt_depths', 'denorm_images'
])
],
modality=dict(
use_image=True,
use_depth=True,
use_lidar=False,
use_neuralrecon_depth=False,
use_ray=True),
classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window',
'bookshelf', 'picture', 'counter', 'desk', 'curtain',
'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub',
'garbagebin'),
test_mode=True,
box_type_3d='Depth'),
test=dict(
type='ScanNetMultiViewDataset',
data_root='data/scannet/',
ann_file='data/scannet/scannet_infos_val_depth.pkl',
pipeline=[
dict(
type='MultiViewPipeline',
n_images=101,
transforms=[
dict(type='LoadImageFromFile'),
dict(type='Resize', img_scale=(320, 240), keep_ratio=True),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size=(240, 320))
],
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
margin=10,
depth_range=[0.5, 5.5],
loading='random',
nerf_target_views=1),
dict(
type='DefaultFormatBundle3D',
class_names=('cabinet', 'bed', 'chair', 'sofa', 'table',
'door', 'window', 'bookshelf', 'picture',
'counter', 'desk', 'curtain', 'refrigerator',
'showercurtrain', 'toilet', 'sink', 'bathtub',
'garbagebin'),
with_label=False),
dict(
type='Collect3D',
keys=[
'img', 'depth', 'lightpos', 'nerf_sizes', 'raydirs',
'gt_images', 'gt_depths', 'denorm_images'
])
],
modality=dict(
use_image=True,
use_depth=True,
use_lidar=False,
use_neuralrecon_depth=False,
use_ray=True),
classes=('cabinet', 'bed', 'chair', 'sofa', 'table', 'door', 'window',
'bookshelf', 'picture', 'counter', 'desk', 'curtain',
'refrigerator', 'showercurtrain', 'toilet', 'sink', 'bathtub',
'garbagebin'),
test_mode=True,
box_type_3d='Depth'))
optimizer = dict(
type='AdamW',
lr=0.0002,
weight_decay=0.0001,
paramwise_cfg=dict(
custom_keys=dict(backbone=dict(lr_mult=0.1, decay_mult=1.0))))
optimizer_config = dict(grad_clip=dict(max_norm=35.0, norm_type=2))
lr_config = dict(policy='step', step=[8, 11])
total_epochs = 12
checkpoint_config = dict(interval=1, max_keep_ckpts=-1)
log_config = dict(
interval=50,
hooks=[dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook')])
evaluation = dict(interval=1)
dist_params = dict(backend='nccl')
find_unused_parameters = True
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/nerfdet_res50_2x_low_res_depth_sp'
gpu_ids = range(0, 1)
2023-09-18 18:52:10,436 - mmdet - INFO - Set random seed to 0, deterministic: False
2023-09-18 18:52:11,049 - mmdet - INFO - load model from: torchvision://resnet50
2023-09-18 18:52:11,049 - mmdet - INFO - Use load_from_torchvision loader
2023-09-18 18:52:11,351 - mmdet - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
2023-09-18 18:52:11,389 - mmdet - INFO - Model:
nerfdet(
(backbone): ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): ResLayer(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer2): ResLayer(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer3): ResLayer(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer4): ResLayer(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
)
(neck): FPN(
(lateral_convs): ModuleList(
(0): ConvModule(
(conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
)
(1): ConvModule(
(conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
)
(2): ConvModule(
(conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
)
(3): ConvModule(
(conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
)
)
(fpn_convs): ModuleList(
(0): ConvModule(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(1): ConvModule(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(2): ConvModule(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(3): ConvModule(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
)
(neck_3d): FastIndoorImVoxelNeck(
(down_layer_0): Sequential(
(0): BasicBlock3dV2(
(conv1): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(norm1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(norm2): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(out_block_0): Sequential(
(0): Conv3d(256, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(down_layer_1): Sequential(
(0): BasicBlock3dV2(
(conv1): Conv3d(256, 512, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1), bias=False)
(norm1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(norm2): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv3d(256, 512, kernel_size=(1, 1, 1), stride=(2, 2, 2), bias=False)
(1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(up_block_1): Sequential(
(0): ConvTranspose3d(512, 256, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(4): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
(out_block_1): Sequential(
(0): Conv3d(512, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
(down_layer_2): Sequential(
(0): BasicBlock3dV2(
(conv1): Conv3d(512, 1024, kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1), bias=False)
(norm1): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(1024, 1024, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(norm2): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv3d(512, 1024, kernel_size=(1, 1, 1), stride=(2, 2, 2), bias=False)
(1): BatchNorm3d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(up_block_2): Sequential(
(0): ConvTranspose3d(1024, 512, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False)
(1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(4): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
(out_block_2): Sequential(
(0): Conv3d(1024, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
)
)
(bbox_head): ScanNetImVoxelHeadV2(
(loss_centerness): CrossEntropyLoss()
(loss_bbox): AxisAlignedIoULoss()
(loss_cls): FocalLoss()
(centerness_conv): Conv3d(128, 1, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(reg_conv): Conv3d(128, 6, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(cls_conv): Conv3d(128, 18, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(scales): ModuleList(
(0): Scale()
(1): Scale()
(2): Scale()
)
)
(nerf_mlp): VanillaNeRFRadianceField(
(posi_encoder): SinusoidalEncoder()
(view_encoder): SinusoidalEncoder()
(mlp): NerfMLP(
(base): MLP(
(hidden_activation): ReLU()
(output_activation): Identity()
(hidden_layers): ModuleList(
(0): Linear(in_features=133, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=256, bias=True)
(3): Linear(in_features=256, out_features=256, bias=True)
)
)
(sigma_layer): DenseLayer(
(hidden_activation): ReLU()
(output_activation): Identity()
(hidden_layers): ModuleList()
(output_layer): Linear(in_features=389, out_features=1, bias=True)
)
(bottleneck_layer): DenseLayer(
(hidden_activation): ReLU()
(output_activation): Identity()
(hidden_layers): ModuleList()
(output_layer): Linear(in_features=389, out_features=256, bias=True)
)
(rgb_layer): MLP(
(hidden_activation): ReLU()
(output_activation): Identity()
(hidden_layers): ModuleList(
(0): Linear(in_features=283, out_features=128, bias=True)
)
(output_layer): Linear(in_features=128, out_features=3, bias=True)
)
)
)
(cov): Sequential(
(0): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(1): ReLU(inplace=True)
(2): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
(3): ReLU(inplace=True)
(4): Conv3d(256, 1, kernel_size=(1, 1, 1), stride=(1, 1, 1))
)
(mean_mapping): Sequential(
(0): Conv3d(256, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1))
)
(cov_mapping): Sequential(
(0): Conv3d(256, 32, kernel_size=(1, 1, 1), stride=(1, 1, 1))
)
(mapping): Sequential(
(0): Linear(in_features=256, out_features=32, bias=True)
)
(mapping_2d): Sequential(
(0): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
)
)
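For context on the epoch 11 vs. epoch 12 numbers, here is a small sketch of the learning rate implied by `lr_config = dict(policy='step', step=[8, 11])` and `lr=0.0002` in the config above. It assumes a decay factor of 0.1 (which I believe is the default when `gamma` is not set) and that the steps are compared against a zero-based epoch index; `step_lr` is a hypothetical helper, not the library's own function.

```python
def step_lr(base_lr: float, epoch_idx: int, steps=(8, 11), gamma: float = 0.1) -> float:
    """Learning rate for a zero-based epoch index under a step policy.

    Assumption: the rate is multiplied by `gamma` once for every step
    boundary that `epoch_idx` has reached.
    """
    n_decays = sum(epoch_idx >= s for s in steps)
    return base_lr * gamma ** n_decays


base_lr = 2e-4        # `lr` in the optimizer config
backbone_mult = 0.1   # `lr_mult` for the backbone in `paramwise_cfg`

for epoch_idx in range(12):  # total_epochs = 12
    lr = step_lr(base_lr, epoch_idx)
    print(f"epoch {epoch_idx + 1:2d}: lr={lr:.1e}, backbone lr={lr * backbone_mult:.1e}")
```

Under these assumptions, the logged epochs 9 to 11 run at 2e-5 and only epoch 12 runs at 2e-6, so the last two checkpoints differ by a single low-rate epoch.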
Got it. I will retrain everything on my lab machines soon to double-check the consistency. Thanks for pointing this out.
Thanks. Looking forward to your reply.
Hi @mrsempress, I am also trying to train the model, but I'm having some difficulties. Could you share your training logs? I would appreciate it a lot.
@Yanyirong If possible, could you explain what the difficulties are? Because the training logs are long, could I send them to your email? Please provide your email address, thanks.
Hi @mrsempress, you can find my email address in my profile.
Hi, do you know how to prepare the dataset? I didn't find any instructions. @mrsempress @chenfengxu714 @Yanyirong
You can refer to this page: https://github.com/SamsungLabs/imvoxelnet/tree/master/data/scannet @Pixie8888
Thank you!!
My reproduction of nerfdet_res50_2x_low_res reaches 52.4, but nerfdet_res50_2x_low_res_depth_sp reaches only 49.58. The paper reports 52.0 for nerfdet_res50_2x_low_res and 51.8 for nerfdet_res50_2x_low_res_depth_sp. It seems that depth supervision does not help and even hurts performance.
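To make the size of the gap explicit, here is the arithmetic on the four numbers quoted above. It is a sketch only: the values are copied from this comment, and the dictionary layout is just for illustration.

```python
# (reproduced, paper) scores in percent, as quoted in the comment above.
results = {
    "nerfdet_res50_2x_low_res": (52.4, 52.0),
    "nerfdet_res50_2x_low_res_depth_sp": (49.58, 51.8),
}

# Reproduced minus paper, per config.
gaps = {name: round(repro - paper, 2) for name, (repro, paper) in results.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f}")

# Change from adding depth supervision, in the reproduction and in the paper.
base, depth = results["nerfdet_res50_2x_low_res"], results["nerfdet_res50_2x_low_res_depth_sp"]
depth_effect_repro = round(depth[0] - base[0], 2)
depth_effect_paper = round(depth[1] - base[1], 2)
print(f"depth_sp vs. baseline: reproduced {depth_effect_repro:+.2f}, paper {depth_effect_paper:+.2f}")
```

So the baseline reproduces within 0.4 points, while the depth-supervised config is 2.22 points below the paper, and the drop from adding depth is 2.82 points here against 0.2 in the paper.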