SICKAG / sick_scan_xd

Based on the sick_scan drivers for ROS1, sick_scan_xd merges sick_scan, sick_scan2 and sick_scan_base repositories. The driver supports both Linux (native, ROS1, ROS2) and Windows (native and ROS2).
Apache License 2.0

Segmentation fault (core dumped) Error #376

Open jashangills opened 1 month ago

jashangills commented 1 month ago

Hi,

We have developed a Python script to use the Sick Scan API with the Lidar LMS4xxx. All API methods seem to be working fine except when running the following method:

def query_device_state(self):
    """
    This method queries the device state.
    """
    try:
        sopas_response = ss.SickScanApiSendSOPAS(
            self.sick_scan_library, self.api_handle, "sRN SCdevicestate"
        )
        self.logger.info(
            'sick_scan_xd_api_test.py: sopas_request="sRN SCdevicestate", sopas_response="%s"',
            sopas_response,
        )
        return sopas_response
    except Exception as e:
        self.logger.info("Failed to query device state: %s", e)

Additionally, sending any other SOPAS command such as:

sopas_response = ss.SickScanApiSendSOPAS(
    self.sick_scan_library, self.api_handle, "sMN LMCstandby"
)

results in a segmentation fault on Machine 1.

Machine 1 Specs:

Machine 2 Specs:

On Machine 2, there is no segmentation fault, and everything works as expected.

Steps Taken:

Issue:

Could you please guide us on where to start looking to resolve this issue?

Thank you in advance for your assistance.

rostest commented 1 month ago

Thank you for reporting this error. Unfortunately, we cannot reproduce the error on a similar system (NVIDIA Jetson Orin Nano, 6 cores, ARMv8, Linux, 64-bit). Can you compile with debug information (-g) and post the core dump for further analysis? Did you get any error messages or warnings from the sick_scan_xd library during the run?

To narrow down possible errors, use the python script https://github.com/SICKAG/sick_scan_xd/blob/develop/test/python/sick_scan_xd_api/sick_scan_xd_api_test.py and run python3 sick_scan_xd_api_test.py ./launch/sick_lms_4xxx.launch hostname:=<lms4xxx_ip_address>. Adjust the paths to sick_scan_xd_api_test.py and sick_lms_4xxx.launch if necessary.
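When comparing Machine 1 and Machine 2, it can help to run the test script in a child process and record how it ended, so that a crash is logged rather than just taking down the shell session. The following is a minimal sketch, not part of sick_scan_xd; run_and_report is a hypothetical helper name, and the interpretation of negative return codes applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd in a child process and describe how it ended.

    Hypothetical helper for illustration; not part of sick_scan_xd.
    """
    result = subprocess.run(cmd)
    code = result.returncode
    if code < 0:
        # On POSIX, a negative return code means the child was
        # terminated by signal number -code (e.g. SIGSEGV).
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with code {code}"


if __name__ == "__main__":
    # Pass the script and its arguments on the command line, e.g. the
    # sick_scan_xd_api_test.py invocation mentioned above.
    print(run_and_report([sys.executable] + sys.argv[1:]))
```

A segmentation fault in the child should then show up as "killed by SIGSEGV" in the output of the wrapper.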

Alternatively or in addition, you can start the python script in gdb to examine the backtrace when running into the segmentation fault. Compile the sick_scan_xd library with debug information (compile flag -g), run gdb python3 on the NVIDIA Jetson Xavier and enter run <script.py> at the gdb prompt. If the error occurs, you can examine the backtrace with bt and up or view variables with print <variable_name>. Build without optimizations (compile flag -O0) if the critical part is optimized out in gdb. Please post a screenshot or logfile if you can observe the error in gdb.
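As a lighter-weight complement to gdb (not a replacement for the native backtrace), Python's standard faulthandler module can print the Python-level stack when the interpreter receives SIGSEGV. This shows which Python line made the failing library call, but not the C++ frames inside sick_scan_xd. A minimal sketch follows; enable_crash_traceback and log_path are illustrative names, not part of the sick_scan_xd API.

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Dump the Python stack of all threads on fatal signals such as SIGSEGV.

    log_path is a hypothetical parameter: when None, the dump goes to
    stderr; otherwise it is written to the given file. The returned file
    object must stay open for as long as faulthandler is enabled.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    enable_crash_traceback()
    # ... load the sick_scan_xd library and send SOPAS commands here ...
```

The same effect is available without code changes by starting the interpreter with python3 -X faulthandler <script.py>.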

spark-res commented 1 month ago

Hi @rostest,

Confirming the segmentation fault does occur with the sick_scan_xd_api_test.py script.

See screenshot below:

image

There are a few ?? entries in the backtrace. I'm not sure if I'm compiling correctly for debug. Could you please elaborate on where the -g and -O0 flags should be set? Are they arguments to cmake?

rostest commented 1 month ago

Thank you very much for following up. Variable "sopas_command" in function SickScanApiSendSOPAS of the gdb backtrace looks corrupted. It should be a pointer to the SOPAS command string "sRN SCdevicestate" created by ctypes.create_string_buffer(str.encode(sopas_command)) in sick_scan_api.py:1240. sopas_command=0xffffffffdf48 "\036" is not expected.

This might be caused by a previous stack corruption (which is unlikely since response_buffer_size=1024 is still correct), by a memory allocation failure in ctypes.create_string_buffer, or possibly by the garbage collector freeing memory too early due to an optimization issue.

To avoid possible memory allocation failures, please replace python/api/sick_scan_api.py with the attached modified file sick_scan_api.zip and retry. It allocates the buffer once and reuses that memory for the SOPAS command strings.
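The idea of allocating the command buffer once and reusing it can be sketched as follows. This is only an illustration of the technique, not the content of the attached file; SopasCommandBuffer, fill, and the capacity of 1024 bytes are hypothetical names and values.

```python
import ctypes


class SopasCommandBuffer:
    """Hypothetical helper: one ctypes buffer, allocated once and reused.

    Keeping the buffer as an attribute holds a Python reference to it,
    so it cannot be garbage-collected while the object is alive.
    """

    def __init__(self, capacity=1024):
        self._capacity = capacity
        self._buffer = ctypes.create_string_buffer(capacity)

    def fill(self, command):
        """Copy the command into the buffer and return the buffer."""
        data = command.encode()
        # Reserve one byte for the terminating NUL expected by C code.
        if len(data) + 1 > self._capacity:
            raise ValueError("SOPAS command does not fit into the buffer")
        # Assigning .value copies the bytes and appends a NUL terminator.
        self._buffer.value = data
        return self._buffer


if __name__ == "__main__":
    commands = SopasCommandBuffer()
    first = commands.fill("sRN SCdevicestate")
    second = commands.fill("sMN LMCstandby")
    # Both calls hand out the same memory, now holding the second command.
    print(ctypes.addressof(first) == ctypes.addressof(second), second.value)
```

The returned buffer can be passed wherever a char pointer argument is expected; the caller should keep the SopasCommandBuffer object alive for the duration of the native call.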

image

spark-res commented 4 weeks ago

Hi @rostest,

Thanks for the response and next steps.

Unfortunately experiencing the same result with the updated sick_scan_api script.

seg_fault_new_script

We are going to try it out on another platform which is identical hardware wise to see if the issue persists there. Short of that, open to suggestions on next troubleshooting steps.

rostest commented 4 weeks ago

Thank you for your reply. To avoid potential optimization issues, please build the sick_scan_xd library with compiler flags -g -O0 (debug information, not optimized). These flags can be set in file CMakeLists.txt:54: add_compile_options(-g -O0) # debug flags, for releases: add_compile_options(-O3)

Make sure the library is completely rebuilt after modifying CMakeLists.txt; remove the build folder with rm -rf ./build before rebuilding.

spark-res commented 4 weeks ago

Thanks @rostest,

A small win, I think. Rebuilt with the -g -O0 flags, and the SOPAS commands now seem to function correctly.

It's late in the day here, I'll do longer testing with sick_scan_xd_api_test.py tomorrow to check for TcpRecvThread Issue (https://github.com/SICKAG/sick_scan_xd/issues/361). This one is much less frequent.

Thank you for all the help so far!

spark-res commented 3 weeks ago

@rostest Thanks for your help with this. Haven't seen a segmentation fault since rebuilding with the above flags, I think we can close this issue.

rostest commented 3 weeks ago

Thank you very much! We will make these flags configurable by an additional cmake option. If you have the time, please try optimization levels -O1 and -O2 to generate faster code.