LinuxCNC / linuxcnc

LinuxCNC controls CNC machines. It can drive milling machines, lathes, 3d printers, laser cutters, plasma cutters, robot arms, hexapods, and more.
http://linuxcnc.org/
GNU General Public License v2.0

Stack smash with mesa-modbus #2609

Closed. pcw-mesa closed this issue 1 year ago

pcw-mesa commented 1 year ago

With the right kind of RX serial data, it's possible to crash LinuxCNC + mesa_modbus with a stack error. While trying to duplicate a forum user's error and experimenting with mesa_modbus robustness to various communication faults, I found that by powering down the remote modbus device and then powering it back up, I could crash LinuxCNC:

hm2/hm2_7i96s.0: IO Pin 049 (P1-23/DB25-12): IOPort
hm2/hm2_7i96s.0: IO Pin 050 (P1-25/DB25-13): IOPort
hm2/hm2_7i96s.0: registered
note: MAXV max: 30.000 units/sec 1800.000 units/min
note: LJOG max: 30.000 units/sec 1800.000 units/min
note: LJOG default: 30.000 units/sec 1800.000 units/min
note: jog_order='XYZ'
note: jog_invert=set()
call/response function number mismatch
call/response function number mismatch
*** stack smashing detected ***: terminated
call/response function number mismatch

My guess is that more RX data is present than expected (generated by bogus RX characters caused by marginal serial RX levels during power cycles), so the receive buffer is overrun.

It's possible to work around this by setting the max characters constant larger in the .mod file, but it would probably be better to validate the amount of RX data before using it.
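As an illustration of what validating the amount of RX data could look like, here is a minimal sketch in C. It is not the code from mesa_modbus or from the eventual fix: `MAX_MSG_LEN` and `copy_rx_frame` are hypothetical names, with `MAX_MSG_LEN` standing in for the max characters constant mentioned above. The idea is simply to compare the received byte count against the buffer size and discard an oversized frame rather than copy it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit standing in for the "max characters" constant;
   the real name and value in mesa_modbus may differ. */
#define MAX_MSG_LEN 16

/* Copy a received frame into a fixed-size buffer only if it fits.
   Returns the number of bytes copied, or -1 if the frame is oversized
   (or the source pointer is NULL) and nothing was copied. */
static int copy_rx_frame(uint8_t dst[MAX_MSG_LEN], const uint8_t *rx, size_t rx_len)
{
    if (rx == NULL || rx_len > MAX_MSG_LEN) {
        return -1;  /* reject instead of writing past the end of dst */
    }
    memcpy(dst, rx, rx_len);
    return (int)rx_len;
}
```

With a check like this, garbage characters received during a power cycle would show up as a rejected frame (an ordinary communication error) instead of overwriting the stack.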

petterreinholdtsen commented 1 year ago

[pcw-mesa]

With the right kind of RX serial data, it's possible to crash LinuxCNC + mesa_modbus with a stack error. While trying to duplicate a forum users error and experimenting with mesa_modbus robustness to various communication faults, I found that by powering down the remote modbus device and then powering it back up I could crash LinuxCNC:

Are you able to run linuxcnc with valgrind to get more information about the crash? It might pinpoint exactly where in the code the problem is.

-- Happy hacking Petter Reinholdtsen

pcw-mesa commented 1 year ago

I can try, but since you can work around the problem by increasing the max characters constant, locating the problem in the source is likely pretty easy.

petterreinholdtsen commented 1 year ago

[pcw-mesa]

I can try but since you can work around the problem by increasing the max characters constant, locating of the problem in the source is likely pretty easy.

Could be, for someone who knows the code well. I do not. But I do know valgrind. :)

-- Happy hacking Petter Reinholdtsen

andypugh commented 1 year ago

Could be, for someone that know the code well. I do not. But I do know valgrind.

How would you run valgrind with LinuxCNC? It needs to be passed the name of an executable, and so I have always fallen at the first hurdle.

Would you expect valgrind halrun -I test.hal to work?

petterreinholdtsen commented 1 year ago

[Andy Pugh]

How would you run valgrind with LinuxCNC? It needs to be passed the name of an executable, and so I have always fallen at the first hurdle.

I would try using --trace-children=yes.

Would you expect valgrind halrun -I test.hal to work?

I guess so.

-- Happy hacking Petter Reinholdtsen

andypugh commented 1 year ago

Would you expect valgrind halrun -I test.hal to work?

Actually, it sort of does, except that it reports 52,000 errors in standard libraries.

andypugh commented 1 year ago

Hopefully fixed by 1062304c91d050e39e6ff7ecca73095d00118a2c. Are you able to test?

pcw-mesa commented 1 year ago

I can, tomorrow or Monday.

pcw-mesa commented 1 year ago

Yes, seems to be fixed (cycled power ~50 times, got lots of other errors as expected, but no longer get stack errors).