Chi-Zaozao closed this issue 3 years ago
The log output provides no useful information about the cause of the crash.
The full log is as follows:
2020-07-23 08:49:02,306 - INFO - Distributed training: False
2020-07-23 08:49:02,306 - INFO - torch.backends.cudnn.benchmark: False
2020-07-23 08:49:02,306 - INFO - building model...
2020-07-23 08:49:02,355 - INFO - Finish RPN Initialization
2020-07-23 08:49:02,355 - INFO - num_classes: [4], num_preds: [56], num_dirs: [16]
2020-07-23 08:49:02,356 - INFO - Finish MultiGroupHead Initialization
2020-07-23 08:49:02,356 - INFO - model already been built
2020-07-23 08:49:02,356 - INFO - building datasets...
2020-07-23 08:49:02,372 - INFO - {'concealed1': 4, 'concealed2': 4, 'concealed3': 4, 'concealed4': 4}
2020-07-23 08:49:02,373 - INFO - [-1]
2020-07-23 08:49:02,379 - INFO - load 428 concealed1 database infos
2020-07-23 08:49:02,379 - INFO - load 117 concealed2 database infos
2020-07-23 08:49:02,379 - INFO - load 450 concealed3 database infos
2020-07-23 08:49:02,379 - INFO - load 208 concealed4 database infos
2020-07-23 08:49:02,382 - INFO - After filter database:
2020-07-23 08:49:02,382 - INFO - load 420 concealed1 database infos
2020-07-23 08:49:02,382 - INFO - load 116 concealed2 database infos
2020-07-23 08:49:02,382 - INFO - load 447 concealed3 database infos
2020-07-23 08:49:02,382 - INFO - load 208 concealed4 database infos
2020-07-23 08:49:02,382 - INFO - datasets already been built
2020-07-23 08:49:02,387 - INFO - starting train detector...
total_steps: 250000
len(data_loaders[0]): 500
2020-07-23 08:49:03,917 - INFO - model structure: PointPillars(
(reader): PillarFeatureNet(
(pfn_layers): ModuleList(
(0): PFNLayer(
(linear): Linear(in_features=9, out_features=64, bias=False)
(norm): BatchNorm1d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
)
)
)
(backbone): PointPillarsScatter()
(neck): RPN(
(blocks): ModuleList(
(0): Sequential(
(0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
(1): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), bias=False)
(2): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(3): ReLU()
(4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(5): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(6): ReLU()
(7): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(9): ReLU()
(10): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): BatchNorm2d(64, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(12): ReLU()
)
(1): Sequential(
(0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
(1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), bias=False)
(2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(3): ReLU()
(4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(6): ReLU()
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(9): ReLU()
(10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(12): ReLU()
(13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(15): ReLU()
(16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(18): ReLU()
)
(2): Sequential(
(0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
(1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
(2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(3): ReLU()
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(6): ReLU()
(7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(9): ReLU()
(10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(12): ReLU()
(13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(15): ReLU()
(16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(18): ReLU()
)
)
(deblocks): ModuleList(
(0): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(1): Sequential(
(0): ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
(2): Sequential(
(0): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(4, 4), bias=False)
(1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
(2): ReLU()
)
)
)
(bbox_head): MultiGroupHead(
(loss_cls): SigmoidFocalLoss()
(loss_reg): WeightedSmoothL1Loss()
(loss_aux): WeightedSoftmaxClassificationLoss()
(tasks): ModuleList(
(0): Head(
(conv_box): Conv2d(384, 56, kernel_size=(1, 1), stride=(1, 1))
(conv_cls): Conv2d(384, 32, kernel_size=(1, 1), stride=(1, 1))
(conv_dir): Conv2d(384, 16, kernel_size=(1, 1), stride=(1, 1))
)
)
)
)
2020-07-23 08:49:03,918 - INFO - building trainer...
2020-07-23 08:49:03,918 - INFO - trainer already been built
2020-07-23 08:49:03,918 - INFO - trainer registering hooks...
2020-07-23 08:49:03,918 - INFO - hooks already been built
2020-07-23 08:49:03,918 - INFO - start running trainer...
2020-07-23 08:49:03,919 - INFO - Start running, host: root@5ad3f45114dc, work_dir: /workspace/det3d_requirement/Research/deep_km500/outputs/SIMIT_KM500_PILLAR_lucky_lr3e-4_v444_wd1e-3_neg1_20200723-084854
2020-07-23 08:49:03,919 - INFO - workflow: [('train', 1), ('val', 1)], max: 500 epochs
/workspace/det3d_requirement/Det3D/det3d/core/sampler/preprocess.py:464: NumbaPerformanceWarning: '@' is faster on contiguous arrays, called on (array(float32, 2d, A), array(float32, 2d, C))
points[i : i + 1, :3] = points[i : i + 1, :3] @ rot_mat_T[j]
/opt/conda/lib/python3.6/site-packages/numba-0.48.0-py3.6-linux-x86_64.egg/numba/typing/npydecl.py:958: NumbaPerformanceWarning: '@' is faster on contiguous arrays, called on (array(float32, 2d, A), array(float32, 2d, C))
warnings.warn(NumbaPerformanceWarning(msg))
/workspace/det3d_requirement/Det3D/det3d/core/bbox/geometry.py:387: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: Invalid use of Function(<built-in function getitem>) with argument(s) of type(s): (array(float32, 3d, C), Tuple(slice<a:b>, list(int64), slice<a:b>))
* parameterized
In definition 0:
All templates rejected with literals.
In definition 1:
All templates rejected without literals.
In definition 2:
All templates rejected with literals.
In definition 3:
All templates rejected without literals.
In definition 4:
All templates rejected with literals.
In definition 5:
All templates rejected without literals.
In definition 6:
All templates rejected with literals.
In definition 7:
All templates rejected without literals.
In definition 8:
All templates rejected with literals.
In definition 9:
All templates rejected without literals.
In definition 10:
All templates rejected with literals.
In definition 11:
All templates rejected without literals.
In definition 12:
TypeError: unsupported array index type list(int64) in Tuple(slice<a:b>, list(int64), slice<a:b>)
raised from /opt/conda/lib/python3.6/site-packages/numba-0.48.0-py3.6-linux-x86_64.egg/numba/typing/arraydecl.py:71
In definition 13:
TypeError: unsupported array index type list(int64) in Tuple(slice<a:b>, list(int64), slice<a:b>)
raised from /opt/conda/lib/python3.6/site-packages/numba-0.48.0-py3.6-linux-x86_64.egg/numba/typing/arraydecl.py:71
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: typing of intrinsic-call at /workspace/det3d_requirement/Det3D/det3d/core/bbox/geometry.py (406)
File "../../../Det3D/det3d/core/bbox/geometry.py", line 406:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
<source elided>
:,
[num_points_of_polygon - 1] + list(range(num_points_of_polygon - 1)),
^
@numba.jit
2020-07-23 08:49:13,659 - INFO - finding looplift candidates
/workspace/det3d_requirement/Det3D/det3d/core/bbox/geometry.py:387: NumbaWarning:
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "points_in_convex_polygon_jit" failed type inference due to: cannot determine Numba type of <class 'numba.dispatcher.LiftedLoop'>
File "../../../Det3D/det3d/core/bbox/geometry.py", line 423:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
<source elided>
cross = 0.0
for i in range(num_points):
^
@numba.jit
/opt/conda/lib/python3.6/site-packages/numba-0.48.0-py3.6-linux-x86_64.egg/numba/object_mode_passes.py:178: NumbaWarning: Function "points_in_convex_polygon_jit" was compiled in object mode without forceobj=True, but has lifted loops.
File "../../../Det3D/det3d/core/bbox/geometry.py", line 398:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
<source elided>
# first convert polygon to directed lines
num_points_of_polygon = polygon.shape[1]
^
state.func_ir.loc))
/opt/conda/lib/python3.6/site-packages/numba-0.48.0-py3.6-linux-x86_64.egg/numba/object_mode_passes.py:188: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.
For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit
File "../../../Det3D/det3d/core/bbox/geometry.py", line 398:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
<source elided>
# first convert polygon to directed lines
num_points_of_polygon = polygon.shape[1]
^
state.func_ir.loc))
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
main()
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', '/workspace/det3d_requirement/Det3D/tools/train.py', '--local_rank=0', '/workspace/det3d_requirement/Research/deep_km500/codes/simit_km500_pillar.py', '--work_dir=/workspace/det3d_requirement/Research/deep_km500/outputs/SIMIT_KM500_PILLAR_lucky_lr3e-4_v444_wd1e-3_neg1_20200723-084854']' died with <Signals.SIGFPE: 8>.
You can replace the '@' operation flagged by these warnings:
/workspace/det3d_requirement/Det3D/det3d/core/sampler/preprocess.py:464: NumbaPerformanceWarning: '@' is faster on contiguous arrays, called on (array(float32, 2d, A), array(float32, 2d, C))
points[i : i + 1, :3] = points[i : i + 1, :3] @ rot_mat_T[j]
/opt/conda/lib/python3.6/site-packages/numba-0.48.0-py3.6-linux-x86_64.egg/numba/typing/npydecl.py:958: NumbaPerformanceWarning: '@' is faster on contiguous arrays, called on (array(float32, 2d, A), array(float32, 2d, C))
with standard matrix multiplication.
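As a minimal sketch of that suggestion, the per-row rotation can be written with explicit multiply-adds instead of '@'. The function name rotate_points_inplace, the array shapes, and the sample values are assumptions for illustration, not Det3D's actual code. The sketch uses plain NumPy so that it runs without Numba.

```python
import numpy as np


def rotate_points_inplace(points, rot_mat_T):
    """Rotate the xyz columns of `points` by `rot_mat_T` without using `@`.

    points:    (N, >=3) float array; columns 0..2 are x, y, z.
    rot_mat_T: (3, 3) transposed rotation matrix, so that row @ rot_mat_T
               gives the rotated row (assumed convention).
    """
    for i in range(points.shape[0]):
        # Read the original coordinates first; they are overwritten below.
        x, y, z = points[i, 0], points[i, 1], points[i, 2]
        for k in range(3):
            points[i, k] = (
                x * rot_mat_T[0, k] + y * rot_mat_T[1, k] + z * rot_mat_T[2, k]
            )
    return points


# Compare against the `@` version computed on a contiguous copy.
rng = np.random.default_rng(0)
pts = rng.random((5, 4)).astype(np.float32)  # 4th column is an extra feature
angle = 0.3
rot = np.array(
    [[np.cos(angle), -np.sin(angle), 0.0],
     [np.sin(angle), np.cos(angle), 0.0],
     [0.0, 0.0, 1.0]],
    dtype=np.float32,
)
expected = pts.copy()
expected[:, :3] = np.ascontiguousarray(pts[:, :3]) @ rot.T
result = rotate_points_inplace(pts.copy(), rot.T)
print(np.allclose(result, expected, atol=1e-5))
```

Another option is to pass both operands through np.ascontiguousarray before applying '@', since the warning is about non-contiguous inputs. Either way, NumbaPerformanceWarning is only a performance hint, so this change may not affect the crash itself.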
I commented out @numba.jit,
but that didn't help.
I found that the error occurs here (det3d/models/readers/pillar_enconder.py, line 47):
x = self.norm(x.permute(0, 2, 1).contiguous()).permute(0, 2, 1).contiguous()
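The same code works well on the KITTI dataset.
It seems the error occurs when this line processes a tensor with 0 elements.
Det3D flips the input point cloud randomly, and I hadn't set the right point cloud range, so presumably the flipped points fell outside the range and none were left.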
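To illustrate how a flip combined with a mismatched range can leave zero points, here is a small NumPy sketch. The helper names (flip_y, filter_by_range), the range layout [x_min, y_min, z_min, x_max, y_max, z_max], the flip axis, and the sample values are all assumptions for illustration, not Det3D's API.

```python
import numpy as np


def flip_y(points):
    """Mirror the cloud across the x-z plane by negating y (assumed flip axis)."""
    flipped = points.copy()
    flipped[:, 1] = -flipped[:, 1]
    return flipped


def filter_by_range(points, pc_range):
    """Keep points inside [x_min, y_min, z_min, x_max, y_max, z_max]."""
    lo = np.asarray(pc_range[:3])
    hi = np.asarray(pc_range[3:])
    mask = np.all((points[:, :3] >= lo) & (points[:, :3] < hi), axis=1)
    return points[mask]


# Hypothetical data: every point has positive y.
points = np.array([[1.0, 2.0, 0.5], [3.0, 4.0, 0.2]], dtype=np.float32)

# A range that only covers y >= 0 is not symmetric under the flip.
one_sided_range = [0.0, 0.0, -1.0, 10.0, 10.0, 1.0]
symmetric_range = [0.0, -10.0, -1.0, 10.0, 10.0, 1.0]

kept_one_sided = filter_by_range(flip_y(points), one_sided_range)
kept_symmetric = filter_by_range(flip_y(points), symmetric_range)

print(kept_one_sided.shape)  # (0, 3): nothing survives the filter
print(kept_symmetric.shape)  # (2, 3)

# Guard that a data pipeline could apply before the sample reaches the network.
if kept_one_sided.shape[0] == 0:
    print("empty sample: check the point cloud range against the augmentation")
```

A check like the final guard, placed in the data loading code, would surface the empty sample with a readable message instead of a SIGFPE inside the normalization layer.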
I ran into this problem when training PointPillars on my own dataset, and I can't figure out how to fix it.
I would really appreciate any suggestions.