cminyard / ser2net

Serial to network interface, allows TCP/UDP to serial port connections
GNU General Public License v2.0

Segmentation fault when trying to write to serial port #48

Closed brucemiranda closed 3 years ago

brucemiranda commented 3 years ago

I had ser2net v2.x installed on Ubuntu 18.04, and it was working fine with the following line in ser2net.conf:

5001:raw:0:/dev/ttyUSB0:115200

I upgraded to the latest version of ser2net, v4.3, and now I get a segmentation fault with the same configuration file. Reading the port works fine; it only fails when I try to write to the port.

admin@ubuntu1804:/etc/ser2net$ ser2net -c ser2net.conf -p 5000 -d
ser2net:WARNING: Using old config file format, this will go away
soon.  Please switch to the yaml-based format.
Segmentation fault

I thought it could be something about the old conf file so I upgraded to the yaml style.

So I made one with these lines:

connection: &con00
  accepter: tcp,5001
  timeout: 0
  connector: serialdev,/dev/ttyUSB-HGI80,115200n81,local

However it's the same issue.

admin@ubuntu1804:/etc/ser2net$ ser2net -c ser2net.yaml -p 5000 -d
ser2net[7434]: Admin port already configured on line 303 column 0
Admin port already configured on line 303 column 0
Segmentation fault

I can see the write message arrive on the admin interface monitor, but the moment the line appears there, the segmentation fault shows up in the other window, where ser2net is running on the command line.

What might be going on?

brucemiranda commented 3 years ago

This turned out to be a serial device/OS issue rather than a ser2net issue. Direct write access to the serial device wasn't working either. I had to unplug and power cycle the serial device for local read and write to work. Once that was confirmed, ser2net started working.

brucemiranda commented 3 years ago

Some more testing later, and this is definitely related to ser2net v4.3.3. I am now using the max-connections option, and the moment I open another session to the same ser2net port, writing to the serial port stops. The first connection does reads and writes; the second only reads. Even so, it seems to cause a problem with the serial device that isn't resolved until I power cycle the device.

brucemiranda commented 3 years ago

More information. The first time I power cycle the USB serial device, reading and writing work fine. If I stop ser2net and restart it, writing to the serial device seems to lock up. Once that happens, even direct access to the serial port does not allow writing, although reading works fine. I have also tried power cycling the USB serial device and starting ser2net, and everything works. But if I open another admin connection monitor session, ser2net locks the writes again, and that isn't resolved until another power cycle of the actual device.

cminyard commented 3 years ago

On Mon, Mar 15, 2021 at 10:45:45AM -0700, brucemiranda wrote:

More information. The first time I power cycle the USB serial device, reading and writing work fine. If I stop ser2net and restart it, writing to the serial device seems to lock up. Once that happens, even direct access to the serial port does not allow writing, although reading works fine. I have also tried power cycling the USB serial device and starting ser2net, and everything works. But if I open another admin connection monitor session, ser2net locks the writes again, and that isn't resolved until another power cycle of the actual device.

Try adding ,local to the connector line. Modem control lines often cause issues, but you should have had the same issues with older versions of ser2net. Well, older versions didn't restore the serial port parameters to their previous values on exit and newer versions do, so maybe that's it?

It looks like some change in ser2net behavior is triggering some bug in the hardware or the driver. Power-cycling the hardware and everything working ok means it's probably not an issue in ser2net itself. But the fact that the change from 2.2 caused this means something ser2net is doing is triggering it.

If the ,local thing doesn't work, contact me again.

-corey
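One way to check the theory above, that restoring port parameters on exit is involved, is to record the port's termios settings before starting ser2net and again after stopping it, then compare the two. The following is a minimal sketch, assuming GNU `stty` (for the `-F` option) and `diff` are available. The function names `snapshot_tty` and `compare_snapshots` are hypothetical and not part of ser2net. Note that `stty -a` reports termios settings only, not the live state of modem control lines such as DTR or RTS.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ser2net or gensio.

# Write the current termios settings of device $1 to file $2.
snapshot_tty() {
    [ -c "$1" ] || { echo "$1 is not a character device" >&2; return 2; }
    stty -F "$1" -a > "$2"
}

# Show the differences between two snapshots.
# Exit status is 0 when the snapshots are identical.
compare_snapshots() {
    diff -u "$1" "$2"
}
```

A possible sequence would be `snapshot_tty /dev/ttyUSB0 before.txt` right after a power cycle, then a ser2net start and stop, then `snapshot_tty /dev/ttyUSB0 after.txt` and `compare_snapshots before.txt after.txt`. An empty diff would suggest the termios settings are not what changed.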

brucemiranda commented 3 years ago

If you look at my yaml file, I already have, and always had, local in the connector parameters: connector: serialdev,/dev/ttyUSB-HGI80,115200n81,local

cminyard commented 3 years ago

On Mon, Mar 15, 2021 at 12:15:01PM -0700, brucemiranda wrote:

If you look at my yaml file, I already have, and always had, local in the connector parameters: connector: serialdev,/dev/ttyUSB-HGI80,115200n81,local

Ah, I missed that, yes. I was just looking at your first config line, I guess.

The HGI80 appears to be a pretty specialized device, so it doesn't look like you could replace it with another serial device, even for testing. Nobody else has reported this issue, and there are plenty of users. So it's either:

The HGI80 is doing something that causes ser2net to get messed up.

ser2net is doing something that causes the HGI80 to get messed up.

Just to be sure, if you replace ser2net with version 2.x (I think you mentioned that was the old version) or perhaps the end of the 3.5 branch, leaving everything else the same, it doesn't have an issue, right?

I'm not coming up with an easy way to debug this. My suspicion is that ser2net is doing something with the modem lines at shutdown that is confusing the device. But that's hard to tell.

-corey

cminyard commented 3 years ago

I forgot, are you still seeing a core dump, or is it a hang now? I think it's a hang now, but I wanted to be sure.

brucemiranda commented 3 years ago

I just get a segmentation fault. How come it works with the ser2net v2.x package that comes with Ubuntu? My other devices will not have the same USB-to-serial driver as the HGI80.

brucemiranda commented 3 years ago

The bizarre thing is that even if I do not get a segmentation fault, writing to the serial device stops after ser2net has been started and then stopped. Reading from the port continues to work. This behaviour is seen even with direct port access rather than via ser2net, but it only happens if I connect to the port using ser2net after a power cycle. So on exit, ser2net is leaving the port in a state where nothing can write to it again, regardless of whether ser2net is running.
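The direct write check described above, which bypasses ser2net entirely, can be sketched as a small shell helper. This is a minimal sketch, assuming GNU coreutils (`stty -F` and `timeout`). The name `probe_write` is hypothetical, the 115200 baud rate comes from the configuration earlier in this thread, and the bytes sent are only a placeholder, not a valid HGI80 command.

```shell
#!/bin/sh
# Sketch only: probe_write is a hypothetical helper, not part of ser2net.

# Try to write a couple of bytes straight to serial device $1.
# Returns 2 if $1 is not a character device, 1 if stty fails,
# 124 if the write blocks for more than 5 seconds, 0 on success.
probe_write() {
    dev=$1
    if [ ! -c "$dev" ]; then
        echo "$dev is not a character device" >&2
        return 2
    fi
    # Raw mode, no echo, ignore modem control lines (the same intent as
    # the "local" option on the ser2net connector line).
    stty -F "$dev" 115200 raw -echo clocal || return 1
    # Placeholder payload; timeout keeps a blocked write from hanging the shell.
    timeout 5 sh -c 'printf "\r\n" > "$1"' sh "$dev"
}
```

Running `probe_write /dev/ttyUSB0; echo $?` once after a power cycle and again after ser2net has been started and stopped would show whether the write path itself is blocked. A successful return only means the kernel accepted the bytes, not that the device acted on them.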

cminyard commented 3 years ago

On Tue, Mar 16, 2021 at 01:05:04AM -0700, brucemiranda wrote:

The bizarre thing is that even if I do not get a segmentation fault, writing to the serial device stops after ser2net has been started and then stopped. Reading from the port continues to work. This behaviour is seen even with direct port access rather than via ser2net, but it only happens if I connect to the port using ser2net after a power cycle. So on exit, ser2net is leaving the port in a state where nothing can write to it again, regardless of whether ser2net is running.

Yeah, this is pretty bizarre. Is it possible to get a backtrace from the segfault?

-corey

brucemiranda commented 3 years ago

How do I do that? I am quite happy to provide any dump or logs.

cminyard commented 3 years ago

On Tue, Mar 16, 2021 at 11:50:18AM -0700, brucemiranda wrote:

How do I do that? I am quite happy to provide any dump or logs.

It's a little complicated, but not too bad:

Make sure the development libraries for gensio and ser2net are installed. If you are compiling them yourself instead, rebuild everything with "make CFLAGS=-g" and re-install.

Actually, I never asked if you were compiling it yourself or installing it from the distro.

Run: gdb --args ser2net

When you get the gdb prompt, enter the "run" command.

When it crashes, it should stop immediately and you should get the gdb prompt again. At that point, enter the "bt" command.

Grab the output, then use "exit" to leave.
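The steps above can also be run without typing at the gdb prompt. The following is a minimal sketch, assuming gdb is installed; `collect_backtrace` is a hypothetical helper name, and the `GDB` variable is only there so the debugger can be swapped out. The ser2net arguments in the usage note are taken from the commands shown earlier in this thread.

```shell
#!/bin/sh
# Sketch only: collect_backtrace is a hypothetical helper.

# Run the given program under gdb in batch mode: start it ("run"), and
# when it stops on a crash or signal, print the stack ("bt") and exit.
# Set GDB to use a different debugger binary.
collect_backtrace() {
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@"
}
```

For example, `collect_backtrace ser2net -c ser2net.yaml -d 2>&1 | tee backtrace.txt` would save the output to a file for posting. Building with `make CFLAGS=-g` first, as described above, keeps the function names and line numbers in the trace.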

brucemiranda commented 3 years ago

I have compiled this myself. But without making any changes whatsoever. Just literally pulled the gits and compiled it. So now will try what you suggested.

cminyard commented 3 years ago

On Tue, Mar 16, 2021 at 12:18:06PM -0700, brucemiranda wrote:

I have compiled this myself. But without making any changes whatsoever. Just literally pulled the gits and compiled it. So now will try what you suggested.

Oh, if you are on the master branch of gensio, there may be issues. In fact, I just pushed a change that might fix it. If you are using the 4.3.3 tag it should be ok.

-corey

brucemiranda commented 3 years ago

Unless there is a distro for Ubuntu 18.04.5 LTS somewhere I can try?

brucemiranda commented 3 years ago

Pulled the latest versions of gensio and recompiled it. ser2net was already up to date. Did what you said. Now if I start ser2net on one window and try to access the port from another window I get the following.

Program received signal SIGUSR1, User defined signal 1.
__pthread_kill (threadid=<optimised out>, signo=10) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
57      ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.

And the backtrace gives me

#0  __pthread_kill (threadid=<optimised out>, signo=10) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x00007ffff7997630 in i_wake_sel_thread () from /usr/local/lib/libgensio.so.0
#2  0x00007ffff79976c6 in wake_timer_sel_thread () from /usr/local/lib/libgensio.so.0
#3  0x00007ffff799897e in sel_start_timer () from /usr/local/lib/libgensio.so.0
#4  0x00007ffff799b1da in gensio_unix_start_timer_abs () from /usr/local/lib/libgensio.so.0
#5  0x000055555555cdd5 in handle_dev_read (buflen=<optimised out>, buf=<optimised out>, err=<optimised out>, port=0x55555578fd60) at dataxfer.c:344
#6  handle_dev_event (io=<optimised out>, user_data=0x55555578fd60, event=<optimised out>, err=<optimised out>, buf=<optimised out>, buflen=0x7fffffffdee0,
    auxdata=0x0) at dataxfer.c:377
#7  0x00007ffff7958b7c in gensio_cb () from /usr/local/lib/libgensio.so.0
#8  0x00007ffff794f4b4 in basen_read_data_handler () from /usr/local/lib/libgensio.so.0
#9  0x00007ffff794ec5b in filter_ll_write () from /usr/local/lib/libgensio.so.0
#10 0x00007ffff7950fb0 in basen_ll_read () from /usr/local/lib/libgensio.so.0
#11 0x00007ffff795140c in gensio_ll_base_cb () from /usr/local/lib/libgensio.so.0
#12 0x00007ffff7952cd2 in gensio_fd_ll_callback () from /usr/local/lib/libgensio.so.0
#13 0x00007ffff7952ddf in fd_deliver_read_data () from /usr/local/lib/libgensio.so.0
#14 0x00007ffff795342e in fd_handle_incoming () from /usr/local/lib/libgensio.so.0
#15 0x00007ffff79535ef in gensio_fd_ll_handle_incoming () from /usr/local/lib/libgensio.so.0
#16 0x00007ffff798bdf4 in sterm_read_ready () from /usr/local/lib/libgensio.so.0
#17 0x00007ffff7953681 in fd_read_ready () from /usr/local/lib/libgensio.so.0
#18 0x00007ffff799abe8 in iod_read_handler () from /usr/local/lib/libgensio.so.0
#19 0x00007ffff7999141 in handle_selector_call () from /usr/local/lib/libgensio.so.0
#20 0x00007ffff79998bb in process_fds_epoll () from /usr/local/lib/libgensio.so.0
#21 0x00007ffff7999beb in sel_select_intr_sigmask () from /usr/local/lib/libgensio.so.0
#22 0x00007ffff7999d80 in sel_select () from /usr/local/lib/libgensio.so.0
#23 0x0000555555561dbf in op_loop (dummy=<optimised out>) at ser2net.c:460
#24 0x00005555555596d3 in main (argc=<optimised out>, argv=<optimised out>) at ser2net.c:1024

cminyard commented 3 years ago

On Tue, Mar 16, 2021 at 01:03:50PM -0700, brucemiranda wrote:

Unless there is a distro for Ubuntu 18.04.5 LTS somewhere I can try?

Trying it from a checkout is best; you get more accurate debug information if you turn off optimizations.

brucemiranda commented 3 years ago

OK, I just realised that I hadn't recompiled gensio after the git pull. I did that now and things seem better, i.e. read and write are working for the moment.

brucemiranda commented 3 years ago

I am going to close this issue. I think the fix you made to gensio has fixed the issue. Thank you for this fine product and for making it freely available.