Perception module reports the following error when launching a self-trained CENTER_POINT_DETECTION model

cherishTMYY commented 1 year ago

We appreciate you going through the Apollo documentation and searching previous issues before creating a new one. If neither source helped with your issue, please report it using the following form. Please note that missing information can delay the response time.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 18.04): Ubuntu 18.04
Apollo installed from (source or binary): binary
Apollo version (3.5, 5.0, 5.5, 6.0): 8.0
Output of apollo.sh config if on master branch:

Steps to reproduce the issue:

Please use bullet points and include as much detail as possible:

- Edit the configuration in lidar_detection_pipeline.pb.txt: vim /apollo/modules/perception/pipeline/config/lidar_detection_pipeline.pb.txt
- Replace stage_type with CENTER_POINT_DETECTION: stage_type: CENTER_POINT_DETECTION
- Update the configuration of the corresponding stage: stage_config: { stage_type: CENTER_POINT_DETECTION enabled: true }
- Launch the module: mainboard -d /apollo/modules/perception/production/dag/dag_streaming_perception_lidar.dag

Supporting materials (screenshots, command lines, code/script snippets):

I0320 15:48:59.299026 17084 memory_optimize_pass.cc:219] Cluster name : conv2d_67.tmp_1 size: 214272
I0320 15:48:59.299033 17084 memory_optimize_pass.cc:219] Cluster name : relu_30.tmp_0 size: 13713408
I0320 15:48:59.299036 17084 memory_optimize_pass.cc:219] Cluster name : relu_21.tmp_0 size: 13713408
I0320 15:48:59.299038 17084 memory_optimize_pass.cc:219] Cluster name : relu_25.tmp_0 size: 13713408
I0320 15:48:59.299039 17084 memory_optimize_pass.cc:219] Cluster name : relu_24.tmp_0 size: 13713408
I0320 15:48:59.299041 17084 memory_optimize_pass.cc:219] Cluster name : batch_norm_31.tmp_2 size: 13713408
I0320 15:48:59.299041 17084 memory_optimize_pass.cc:219] Cluster name : concat_2.tmp_0 size: 82280448
I0320 15:48:59.299042 17084 memory_optimize_pass.cc:219] Cluster name : relu_26.tmp_0 size: 13713408
I0320 15:48:59.299044 17084 memory_optimize_pass.cc:219] Cluster name : relu_20.tmp_0 size: 27426816
I0320 15:48:59.299046 17084 memory_optimize_pass.cc:219] Cluster name : batch_norm_4.tmp_2 size: 54853632
I0320 15:48:59.299046 17084 memory_optimize_pass.cc:219] Cluster name : relu_29.tmp_0 size: 13713408
I0320 15:48:59.299047 17084 memory_optimize_pass.cc:219] Cluster name : relu_2.tmp_0 size: 54853632
I0320 15:48:59.299049 17084 memory_optimize_pass.cc:219] Cluster name : batch_norm_9.tmp_2 size: 27426816
--- Running analysis [ir_graph_to_program_pass]
I0320 15:48:59.317131 17084 analysis_predictor.cc:1318] ======= optimize end =======
I0320 15:48:59.318094 17084 naive_executor.cc:110] --- skip [feed], feed -> data
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
what(): (NotFound) Operator (hard_voxelize) is not registered. [Hint: op_info_ptr should not be null.] (at /apollo/data/Paddle/paddle/fluid/framework/op_info.h:156)
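To pull the failing operator name out of a long perception log like the one above, a small helper can scan for Paddle's "Operator (...) is not registered" message. This is a minimal sketch that assumes the message keeps the exact wording shown in this log; the function name `missing_operators` is hypothetical and is not part of Apollo or Paddle.

```python
import re
from typing import List

# Matches Paddle's error text from the log above, e.g.
# "(NotFound) Operator (hard_voxelize) is not registered."
# The export-time message "Operator (hard_voxelize) has been registered." does not match.
_MISSING_OP = re.compile(r"Operator \((\w+)\) is not registered")


def missing_operators(log_text: str) -> List[str]:
    """Return unregistered operator names, de-duplicated, in order of first appearance."""
    names: List[str] = []
    for name in _MISSING_OP.findall(log_text):
        if name not in names:
            names.append(name)
    return names
```

For the log above, `missing_operators(open("perception.log").read())` would return `['hard_voxelize']` (the log file name is hypothetical).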

daohu527 commented 1 year ago

What framework did you use for training? Can you provide more detailed information?

cherishTMYY commented 1 year ago

Training container: paddle:2.4.1-gpu-cuda10.2-cudnn7.6-trt7.0, using the latest Paddle3D code.

circleyr commented 1 year ago

I ran into the same problem. Using the center_point model files from 20220926-beta in Apollo 8.0 also produces the following error:

--- Running analysis [ir_graph_to_program_pass]
I0328 18:01:38.070505 19982 analysis_predictor.cc:1318] ======= optimize end =======
I0328 18:01:38.075392 19982 naive_executor.cc:110] --- skip [feed], feed -> data
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
what(): (NotFound) Operator (hard_voxelize) is not registered. [Hint: op_info_ptr should not be null.] (at /apollo/data/Paddle/paddle/fluid/framework/op_info.h:156)

Aborted (core dumped)

cherishTMYY commented 1 year ago

Have you found a solution?

kathy-lee commented 1 year ago

I get the same error with a CenterPoint model downloaded from https://github.com/PaddlePaddle/Paddle3D/tree/develop/docs/models/centerpoint.

When exporting the model from Paddle by running python tools/export.py --config ./myfolder/centerpoint_voxels_0075voxel_nuscenes_10sweep.yml --model ./myfolder/model-centerpoint-3d-voxels.pdparams --save_dir ./myfolder/ --export_for_apollo, the logs did show:
W0420 10:04:24.836481 888 custom_operator.cc:723] Operator (hard_voxelize) has been registered.
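The export log shows hard_voxelize being registered as a custom operator on the Python side, while the Apollo runtime log says the Paddle inference library it links against has no such operator registered. A rough way to check whether the exported graph still contains the operator (even with --export_for_apollo) is to search the serialized .pdmodel file for the op name. This is a crude, dependency-free sketch, not a Paddle API: `ops_referenced` is a hypothetical helper, and the path must point at whichever .pdmodel file export.py wrote to --save_dir.

```python
from pathlib import Path
from typing import Dict, Iterable


def ops_referenced(model_path: str, op_names: Iterable[str]) -> Dict[str, bool]:
    """Report which operator names appear as raw bytes in a serialized Paddle model file.

    Operator types are stored as plain UTF-8 strings inside the protobuf
    program description, so a miss should mean the op is absent from the
    graph. A hit can be a false positive if the same text also appears in,
    for example, a variable name.
    """
    data = Path(model_path).read_bytes()
    return {name: name.encode("utf-8") in data for name in op_names}
```

If `ops_referenced(path, ["hard_voxelize"])` reports `True` for the exported file, the inference runtime will need that custom operator available when the model is loaded.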

daohu527 commented 1 year ago

We will check and get back to you.

wayyeah commented 1 year ago

@daohu527, I have also encountered this problem. How can I solve it?