Closed Xxfore closed 3 years ago
Checking the openpilot log, it seems modeld died:
starting process modeld
thermald logmessaged ui uploader deleter controlsd plannerd loggerd radard calibrationd paramsd camerad proclogd locationd clocksd modeld
<===== modeld is currently running
selfdrive/loggerd/loggerd.cc: logging to /data/media/0/realdata/2020-12-02--13-24-39--0
selfdrive/loggerd/loggerd.cc: rotated to /data/media/0/realdata/2020-12-02--13-24-39--0
model: Reader was evicted, reconnecting
cameraOdometry: Reader was evicted, reconnecting
modelV2: Reader was evicted, reconnecting
vin query retry (1) ...
cameraOdometry: Reader was evicted, reconnecting
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
platform[0] CL_PLATFORM_NAME: NVIDIA CUDA
vendor: 'NVIDIA Corporation'
platform version: 'OpenCL 1.2 CUDA 11.0.228'
profile: 'FULL_PROFILE'
extensions: 'cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics'
name: 'GeForce MX330'
device version: 'OpenCL 1.2 CUDA'
max work group size: 1024
type = 0x0004 = CL_DEVICE_TYPE_GPU
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
platform[0] CL_PLATFORM_NAME: NVIDIA CUDA
platform[1] CL_PLATFORM_NAME: Intel(R) OpenCL
vendor: 'Intel(R) Corporation'
platform version: 'OpenCL 1.2 LINUX'
profile: 'FULL_PROFILE'
extensions: 'cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 '
name: 'Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz'
device version: 'OpenCL 1.2 (Build 25)'
max work group size: 8192
type = 0x0002 = CL_DEVICE_TYPE_CPU
OpenGL version: OpenGL ES 3.2 NVIDIA 450.80.02
OpenGL vendor: NVIDIA Corporation
OpenGL renderer: GeForce MX330/PCIe/SSE2
OpenGL language version: OpenGL ES GLSL ES 3.20
vin query retry (2) ...
resize 1920x1080
vin query retry (3) ...
vin query retry (4) ...
thermald logmessaged ui uploader deleter controlsd plannerd loggerd radard calibrationd paramsd camerad proclogd locationd clocksd modeld
<===== modeld has died (shown in red)
Could you give me some advice? Thanks a lot.
For me, modeld crashes when trying to get an OpenCL context. Hopefully we can get rid of the Intel OpenCL stuff soon and rely only on the NVIDIA OpenCL runtime (https://github.com/commaai/openpilot/pull/2675).
After adding logs to modeld.cc, I found that the problem seems to be related to OpenCL. The /etc/OpenCL/vendors folder contains the following three ICD files:
nvidia.icd intel-beignet-x86_64-linux-gnu.icd intel.icd
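For anyone who wants to repeat this check, here is a minimal Python sketch that lists the ICD files in a vendors directory together with the driver library each one names. The default directory is the one mentioned above. The helper name list_icds is made up for illustration, and the sketch assumes each .icd file carries the library name on its first non-empty line, which is the usual layout.

```python
from pathlib import Path


def list_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd file name: library named inside it} for one vendors directory."""
    result = {}
    root = Path(vendors_dir)
    if not root.is_dir():
        return result  # no ICD configuration found at this path
    for icd in sorted(root.glob("*.icd")):
        # Keep only non-empty lines; the first one is assumed to name the library.
        lines = [ln.strip() for ln in icd.read_text(errors="replace").splitlines()]
        lines = [ln for ln in lines if ln]
        result[icd.name] = lines[0] if lines else ""
    return result


if __name__ == "__main__":
    for name, library in list_icds().items():
        print(f"{name}: {library}")
```

Each listed file corresponds to one OpenCL platform that the loader will try to initialize, so three files here match the three platforms clinfo reports inside the container.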
First, thank you so much for your kind reply. After I changed the OpenCL device type from CL_DEVICE_TYPE_CPU to CL_DEVICE_TYPE_DEFAULT, "Planner Solution Error" no longer appears, but the path and lane lines do not seem to update immediately.
Additionally, the following is the clinfo result on my PC.
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 11.0.228
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
Platform Extensions function suffix NV
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce MX330
....
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) NVIDIA CUDA
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [NV]
clCreateContext(NULL, ...) [default] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) Invalid device type for platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
Is there a solution for the behavior above?
Thanks a lot.
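The clinfo result above reports "No platform" for clCreateContextFromType(NULL, ...) while clCreateContext succeeds on the NVIDIA device, so one option is to enumerate the platforms and pass an explicit device instead of relying on a default. Below is a rough Python sketch of that selection logic. The names pick_device and explicit_context are illustrative only, the second function assumes the third-party pyopencl package is installed, and modeld itself is C++, so this only demonstrates the idea rather than the actual fix.

```python
def pick_device(platforms, preferred=("GPU", "CPU")):
    """Pick the first device matching the preferred types, in order.

    `platforms` is a list of (platform_name, [(device_name, device_type), ...]).
    Returns (platform_name, device_name, device_type) or None.
    """
    for wanted in preferred:
        for platform_name, devices in platforms:
            for device_name, device_type in devices:
                if device_type == wanted:
                    return platform_name, device_name, device_type
    return None


def explicit_context(preferred=("GPU", "CPU")):
    """Build an OpenCL context on an explicitly chosen device (needs pyopencl)."""
    import pyopencl as cl  # third-party; imported lazily so pick_device works without it

    type_flags = {"GPU": cl.device_type.GPU, "CPU": cl.device_type.CPU}
    for wanted in preferred:
        for platform in cl.get_platforms():
            try:
                devices = platform.get_devices(device_type=type_flags[wanted])
            except cl.Error:
                devices = []  # treat a failing platform as having no such device
            if devices:
                return cl.Context(devices=[devices[0]])
    raise RuntimeError("no OpenCL device of the requested types was found")
```

With the two platforms from the log (NVIDIA CUDA with a GeForce MX330 GPU, Intel(R) OpenCL with the i5-10210U CPU), pick_device returns the NVIDIA GPU under the default preference order.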
I installed clinfo in the Docker container and ran it:
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
Number of platforms 3
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 11.0.228
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
Platform Extensions function suffix NV
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2 LINUX
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
Platform Extensions function suffix INTEL
Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 2.0 beignet 1.3
Platform Profile FULL_PROFILE
...
Platform Name Intel(R) OpenCL
Number of devices 1
Device Name Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 (Build 25)
Driver Version 1.2.0.25
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 8
Max clock frequency 1600MHz
Device Partition (core)
Max number of sub-devices 8
Supported partition types by counts, equally, by names (Intel)
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Segmentation fault (core dumped)
After I applied the changes you mentioned in #2675, two kinds of issue occur:
Intermittent issue: "Planner Solution Error" appears.
Video:
https://drive.google.com/file/d/1UXCeaKAf-fFSWz5lHyjGNCnBNDohNTiU/view?usp=sharing
Log:
selfdrive/modeld/modeld.cc: models loaded, modeld starting [48/3663]
selfdrive/modeld/modeld.cc: visionstream connect failed
Assertion failed: ok (src/mailbox.cpp:99)
width 1164, height 874, rgb_stride 3492
vin query retry (5) ...
selfdrive/modeld/modeld.cc: visionstream connect failed
selfdrive/modeld/modeld.cc: visionstream connect failed
selfdrive/modeld/modeld.cc: visionstream connect failed
selfdrive/modeld/modeld.cc: visionstream connect failed
cameraOdometry: Reader was evicted, reconnecting
selfdrive/modeld/modeld.cc: visionstream connect failed
width 816, height 612, rgb_stride 2448
selfdrive/camerad/main.cc: client start fd 155
selfdrive/camerad/main.cc: client start fd 156
selfdrive/modeld/modeld.cc: connected with buffer size: 1526004
_modeld: selfdrive/modeld/runners/onnxmodel.cc:62: void ONNXModel::pwrite(float *, int): Assertion `err >= 0' failed.
thermald logmessaged ui uploader deleter controlsd plannerd loggerd radard calibrationd paramsd camerad proclogd locationd clocksd modeld
vipc_recv err: Connection reset by peer
selfdrive/camerad/main.cc: client end fd 156
Persistent issue: the path and lane lines do not update correctly. Video: https://drive.google.com/file/d/1ta-eTRbyL8dRcfrAYU8lo97RNuJjzzdN/view?usp=sharing
I hope for your kind reply. Thanks.
It looks like you are having some of the same issues we are seeing on the Discord server in the #openpilot-simulation channel. It has been suggested that modeld needs to be moved from the CPU to the GPU.
Dear dorkmo,
Thanks a lot for your suggestion.
Could you share the modification for "modeld needs to be moved from the CPU to the GPU"?
I have changed the OpenCL device type from CL_DEVICE_TYPE_CPU to CL_DEVICE_TYPE_DEFAULT, and the issue still occurs.
Do you think my modification is correct?
Thx.
By the way, could you invite me to the Discord channel #openpilot-simulation? My ID is xxfore#6759.
Thank you so much.
@Xxfore You don't need an invite, you just have to click on the green check mark in #join-development.
Here's a link: https://discord.com/channels/469524606043160576/728701508820140103 I sent you a friend invite too. Hope to see you there.
Ah, I have joined. Thank you so much for your guidance.
OK, see you there.
Hello, Xxfore! Have you fixed the issue? If so, how? Thanks a lot.
You can get the latest code and run it on the CPU. If you need higher performance from the GPU, you should install onnxruntime-gpu and change the provider specified for the ONNX InferenceSession to "CUDAExecutionProvider". Thanks a lot.
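As a hedged illustration of that suggestion, the sketch below picks the CUDA provider when ONNX Runtime reports it and keeps the CPU provider as a fallback. The helper names choose_providers and make_session are invented for this example and are not taken from openpilot. make_session assumes onnxruntime (or onnxruntime-gpu) is installed.

```python
def choose_providers(available, want_gpu=True):
    """Order ONNX Runtime execution providers, GPU first when it is available."""
    providers = []
    if want_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")  # always keep a CPU fallback
    return providers


def make_session(model_path, want_gpu=True):
    """Create an InferenceSession (requires onnxruntime or onnxruntime-gpu)."""
    import onnxruntime as ort  # third-party; imported lazily

    providers = choose_providers(ort.get_available_providers(), want_gpu)
    return ort.InferenceSession(model_path, providers=providers)
```

If only the CPU build of onnxruntime is installed, get_available_providers() will not list "CUDAExecutionProvider", and the session falls back to the CPU.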
Describe the bug
"Planner Solution Error" is displayed when running the simulation with the latest openpilot Docker image.
How to reproduce or log data
1. ./start_carla.sh
2. ./start_openpilot_docker.sh
3. Comment out the related line in issue https://github.com/commaai/openpilot/issues/2501
4. Press "1" in the bridge.py terminal
Expected behavior
The simulation runs normally.
Additional context
NA
Operating system: Ubuntu 20.04