Display "Planner Solution Error" When run simulation with last openpilot docker image

Xxfore commented 3 years ago

Describe the bug

How to reproduce or log data

1. Run ./start_carla.sh
2. Run ./start_openpilot_docker.sh
3. Comment out the relevant line described in issue https://github.com/commaai/openpilot/issues/2501
4. Press "1" in the bridge.py terminal

Expected behavior

The simulation runs normally.

Additional context

NA

Operating system: Ubuntu 20.04

[Screenshot: 2020-12-02 21-16-57]

Xxfore commented 3 years ago

Checking the openpilot log, it seems that modeld died:

starting process modeld
thermald logmessaged ui uploader deleter controlsd plannerd loggerd radard calibrationd paramsd camerad proclogd locationd clocksd modeld <===== modeld is currently running
selfdrive/loggerd/loggerd.cc: logging to /data/media/0/realdata/2020-12-02--13-24-39--0
selfdrive/loggerd/loggerd.cc: rotated to /data/media/0/realdata/2020-12-02--13-24-39--0
model: Reader was evicted, reconnecting
cameraOdometry: Reader was evicted, reconnecting
modelV2: Reader was evicted, reconnecting
vin query retry (1) ...
cameraOdometry: Reader was evicted, reconnecting
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
platform[0] CL_PLATFORM_NAME: NVIDIA CUDA
vendor: 'NVIDIA Corporation'
platform version: 'OpenCL 1.2 CUDA 11.0.228'
profile: 'FULL_PROFILE'
extensions: 'cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics'
name: 'GeForce MX330'
device version: 'OpenCL 1.2 CUDA'
max work group size: 1024
type = 0x0004 = CL_DEVICE_TYPE_GPU
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
platform[0] CL_PLATFORM_NAME: NVIDIA CUDA
platform[1] CL_PLATFORM_NAME: Intel(R) OpenCL
vendor: 'Intel(R) Corporation'
platform version: 'OpenCL 1.2 LINUX'
profile: 'FULL_PROFILE'
extensions: 'cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 '
name: 'Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz'
device version: 'OpenCL 1.2 (Build 25)'
max work group size: 8192
type = 0x0002 = CL_DEVICE_TYPE_CPU
OpenGL version: OpenGL ES 3.2 NVIDIA 450.80.02
OpenGL vendor: NVIDIA Corporation
OpenGL renderer: GeForce MX330/PCIe/SSE2
OpenGL language version: OpenGL ES GLSL ES 3.20
vin query retry (2) ...
resize 1920x1080
vin query retry (3) ...
vin query retry (4) ...
thermald logmessaged ui uploader deleter controlsd plannerd loggerd radard calibrationd paramsd camerad proclogd locationd clocksd modeld <===== modeld has died (shown in red)

Could you give me some advice? Thanks a lot.

pd0wm commented 3 years ago

For me modeld crashes when trying to get an openCL context. Hopefully we can get rid of the intel openCL stuff soon and only rely on the nvidia openCL runtime (https://github.com/commaai/openpilot/pull/2675).

Xxfore commented 3 years ago

After adding logs to modeld.cc, I found that the problem seems to be related to OpenCL. The /etc/OpenCL/vendors folder contains the following three ICD files:

nvidia.icd
intel-beignet-x86_64-linux-gnu.icd
intel.icd
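
For anyone repeating this check, here is a minimal sketch that lists the ICD files and the library each one names. It assumes the usual Linux ICD loader layout, where every `*.icd` file under `/etc/OpenCL/vendors` is a short text file containing the vendor library name. The function name `list_opencl_icds` is made up for illustration and is not part of openpilot.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_named_inside} for each *.icd file.

    Hypothetical helper for illustration; not part of openpilot.
    """
    result = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        # No vendors directory means no ICDs are registered at this path.
        return result
    for icd in sorted(directory.glob("*.icd")):
        # Each .icd file is a small text file naming the vendor's OpenCL library.
        result[icd.name] = icd.read_text(errors="replace").strip()
    return result


if __name__ == "__main__":
    for name, library in list_opencl_icds().items():
        print(f"{name}: {library}")
```

Running it both on the host and inside the container makes it easy to compare which OpenCL platforms each environment registers.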

Xxfore commented 3 years ago

First, thank you very much for your kind reply. After I changed the OpenCL device type from CL_DEVICE_TYPE_CPU to CL_DEVICE_TYPE_DEFAULT, the "Planner Solution Error" is no longer shown, but the path and lane lines do not seem to update immediately.

Additionally, the following is the clinfo output from my PC.

Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 11.0.228
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
Platform Extensions function suffix NV

Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce MX330

....
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) NVIDIA CUDA
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [NV]
clCreateContext(NULL, ...) [default] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) Invalid device type for platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform

Is there any solution for the behavior described above?

Thanks a lot.
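
The device-type question in this comment can be illustrated with a small, pure-Python sketch. It models OpenCL device types as the bit flags defined by the OpenCL specification and picks a device by preference, GPU first and then CPU. The device list is copied from the log earlier in this thread. The `pick_device` helper is hypothetical and is not how modeld selects its device; it only shows why a CPU-only request finds nothing on a machine whose sole platform exposes a GPU.

```python
# Device-type bit flags as defined by the OpenCL specification.
CL_DEVICE_TYPE_DEFAULT = 1 << 0
CL_DEVICE_TYPE_CPU = 1 << 1
CL_DEVICE_TYPE_GPU = 1 << 2
CL_DEVICE_TYPE_ACCELERATOR = 1 << 3

# Devices as reported in the openpilot log above (inside the container).
DEVICES_FROM_LOG = [
    {"platform": "NVIDIA CUDA", "name": "GeForce MX330", "type": 0x0004},
    {
        "platform": "Intel(R) OpenCL",
        "name": "Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz",
        "type": 0x0002,
    },
]


def pick_device(devices, preference=(CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_CPU)):
    """Return the first device matching the earliest preferred type, else None.

    Hypothetical helper for illustration only.
    """
    for wanted in preference:
        for device in devices:
            if device["type"] & wanted:
                return device
    return None


if __name__ == "__main__":
    chosen = pick_device(DEVICES_FROM_LOG)
    print(chosen["name"] if chosen else "no matching device")
```

With only the NVIDIA entry in the list, as in the host clinfo output, `pick_device(devices, preference=(CL_DEVICE_TYPE_CPU,))` returns `None`, which mirrors the "No devices found in platform" result for CL_DEVICE_TYPE_CPU.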

Xxfore commented 3 years ago

I installed clinfo inside the Docker container and checked:

beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
Number of platforms 3
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 11.0.228
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
Platform Extensions function suffix NV

Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2 LINUX
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
Platform Extensions function suffix INTEL

Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 2.0 beignet 1.3
Platform Profile FULL_PROFILE
...
Platform Name Intel(R) OpenCL
Number of devices 1
Device Name Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 (Build 25)
Driver Version 1.2.0.25
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 8
Max clock frequency 1600MHz
Device Partition (core)
Max number of sub-devices 8
Supported partition types by counts, equally, by names (Intel)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Segmentation fault (core dumped)

Xxfore commented 3 years ago

After applying the changes you mentioned in #2675, two kinds of issues occur:

Intermittent issue: "Planner Solution Error" occurs.
Video: https://drive.google.com/file/d/1UXCeaKAf-fFSWz5lHyjGNCnBNDohNTiU/view?usp=sharing
Log:
selfdrive/modeld/modeld.cc: models loaded, modeld starting [48/3663]
selfdrive/modeld/modeld.cc: visionstream connect failed
Assertion failed: ok (src/mailbox.cpp:99)
width 1164, height 874, rgb_stride 3492
vin query retry (5) ...
selfdrive/modeld/modeld.cc: visionstream connect failed
selfdrive/modeld/modeld.cc: visionstream connect failed
selfdrive/modeld/modeld.cc: visionstream connect failed
selfdrive/modeld/modeld.cc: visionstream connect failed
cameraOdometry: Reader was evicted, reconnecting
selfdrive/modeld/modeld.cc: visionstream connect failed
width 816, height 612, rgb_stride 2448
selfdrive/camerad/main.cc: client start fd 155
selfdrive/camerad/main.cc: client start fd 156
selfdrive/modeld/modeld.cc: connected with buffer size: 1526004
_modeld: selfdrive/modeld/runners/onnxmodel.cc:62: void ONNXModel::pwrite(float *, int): Assertion `err >= 0' failed.
thermald logmessaged ui uploader deleter controlsd plannerd loggerd radard calibrationd paramsd camerad proclogd locationd clocksd modeld
vipc_recv err: Connection reset by peer
selfdrive/camerad/main.cc: client end fd 156

Persistent issue: the path and lane lines do not update correctly.
Video: https://drive.google.com/file/d/1ta-eTRbyL8dRcfrAYU8lo97RNuJjzzdN/view?usp=sharing

I hope for your kind reply. Thanks.
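
When triaging a log like the one in this comment, it can help to count how often each failure message appears. The sketch below does that for a saved copy of the log text. The marker strings are taken from the log above. The function name `count_failure_markers` is hypothetical, and which messages matter is an assumption rather than an official list of modeld failure signatures.

```python
from collections import Counter

# Substrings copied from the log in this thread; treating them as the
# interesting failure markers is an assumption, not an official list.
FAILURE_MARKERS = (
    "visionstream connect failed",
    "Assertion",
    "Connection reset by peer",
)


def count_failure_markers(log_text, markers=FAILURE_MARKERS):
    """Count how many log lines contain each marker substring."""
    counts = Counter()
    for line in log_text.splitlines():
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join(
        [
            "selfdrive/modeld/modeld.cc: visionstream connect failed",
            "Assertion failed: ok (src/mailbox.cpp:99)",
            "selfdrive/modeld/modeld.cc: visionstream connect failed",
            "vipc_recv err: Connection reset by peer",
        ]
    )
    for marker, count in count_failure_markers(sample).items():
        print(f"{count:3d}  {marker}")
```

Many repeated "visionstream connect failed" lines followed by a single assertion suggest looking at the first assertion rather than the connection retries.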

dorkmo commented 3 years ago

It looks like you are having some of the same issues we are seeing on the Discord server in the #openpilot-simulation channel. It has been suggested that modeld needs to be moved from the CPU to the GPU.

Xxfore commented 3 years ago

Dear dorkmo,

Thanks a lot for your suggestion.
Could you share the modification behind "modeld needs to be moved from the CPU to the GPU"?
I have changed the OpenCL device type from CL_DEVICE_TYPE_CPU to CL_DEVICE_TYPE_DEFAULT, and the issue still occurs.
Do you think my modification is correct?

Thx.

Xxfore commented 3 years ago

By the way, could you invite me to the Discord channel #openpilot-simulation? My ID is xxfore#6759.

Thank you so much.

LoganxDev commented 3 years ago

@Xxfore You don't need an invite, you just have to click on the green check mark in #join-development.

dorkmo commented 3 years ago

Here's a link: https://discord.com/channels/469524606043160576/728701508820140103. I sent you a friend invite too. Hope to see you there.

Xxfore commented 3 years ago

Ah, I have joined. Thank you so much for your guidance.

Xxfore commented 3 years ago

OK, see you there.

MayDGT commented 3 years ago

Hello, Xxfore! Have you fixed the issue? If so, how? Thanks a lot.

Xxfore commented 3 years ago

You can get the latest code and run it on the CPU. If you need the higher performance of the GPU, install onnxruntime-gpu and change the provider specified for the ONNX InferenceSession to "CUDAExecutionProvider". Thanks a lot.
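
The provider change described above might look roughly like the following sketch. It assumes the `onnxruntime` Python package (or `onnxruntime-gpu` for CUDA support) and uses its `get_available_providers()` function and the `providers` argument of `InferenceSession`. The helper names `choose_providers` and `make_session` and the `model_path` parameter are hypothetical, and this is not the actual openpilot runner code.

```python
def choose_providers(available, prefer_gpu=True):
    """Return an ordered provider list, CUDA first when present and wanted.

    Hypothetical helper; the CPU provider is always kept as a fallback.
    """
    providers = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers


def make_session(model_path):
    """Create an ONNX Runtime session for model_path (a hypothetical path)."""
    # Imported lazily so choose_providers() works without onnxruntime installed.
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


if __name__ == "__main__":
    # Example with a provider list such as onnxruntime-gpu might report.
    print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

Keeping "CPUExecutionProvider" at the end of the list means the same code still runs on machines without a usable CUDA setup, only more slowly.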

commaai / openpilot

Display "Planner Solution Error" When run simulation with last openpilot docker image #2674