"error in read()" error messages

AlexDavies501 commented 3 years ago

fyi

We were seeing a lot of mbusd errors in journalctl of the form

mbusd[24689]: 28 Oct 2021 03:15:29 tty: error in read() (Resource temporarily unavailable)
mbusd[24689]: 28 Oct 2021 03:15:29 tty: trying to open /dev/ttyS1 (speed 9600 mode 8n1)
mbusd[24689]: 28 Oct 2021 03:15:29 tty: error in read() (Resource temporarily unavailable)
mbusd[24689]: 28 Oct 2021 03:15:29 tty: trying to open /dev/ttyS1 (speed 9600 mode 8n1)
mbusd[24689]: 28 Oct 2021 03:15:59 tty: error in read() (Resource temporarily unavailable)
mbusd[24689]: 28 Oct 2021 03:15:59 tty: trying to open /dev/ttyS1 (speed 9600 mode 8n1)
mbusd[24689]: 28 Oct 2021 03:15:59 tty: error in read() (Resource temporarily unavailable)
mbusd[24689]: 28 Oct 2021 03:15:59 tty: trying to open /dev/ttyS1 (speed 9600 mode 8n1)

After reducing the retries from 3 to zero, they've gone from being every minute, to every few hours.

Looking at the code where the error in read is generated:

623         rc = conn_read(tty.fd, tty.rxbuf + tty.ptrbuf,
624                        tty.rxlen - tty.ptrbuf + tty.rxoffset);
625         if (rc <= 0)
626         { /* error - make attempt to reinitialize serial port */
627 #ifdef LOG
628           logw(0, "tty: error in read() (%s)", rc ? strerror(errno) : "port closed");

and

677         if ((rc = conn_read(tty.fd, tty.rxbuf, BUFSIZE)) <= 0)
678         { /* error - make attempt to reinitialize serial port */
679 #ifdef LOG
680           logw(0, "tty: error in read() (%s)", rc ? strerror(errno) : "port closed");

...they both come from conn_read() returning <= 0.

And conn_read is:

222 ssize_t
223 conn_read(int d, void *buf, size_t nbytes)
224 {
225   int rc;
226   do
227   { /* trying read from descriptor while breaked by signals */
228     rc = read(d, buf, nbytes);
229   } while (rc == -1 && errno == EINTR);
230   return (rc < 0) ? RC_ERR : rc;
231 }

The printed error message = Resource temporarily unavailable = strerror(EAGAIN)

So suspect read() returns -1 and sets errno to EAGAIN. This would be due to no data being available and the descriptor being O_NONBLOCK.

Not sure if this is actually an error in this case that requires the subsequent tty_reinit() call.

Maybe conn_read() could retry the read N times after rc == -1 && errno == EAGAIN ?

Reporting in case this is of interest. Thanks!

AlexDavies501 commented 3 years ago

One more datapoint...

We have also seen those errors with errno set to EFAULT eg.

mbusd[25373]: 30 Oct 2021 13:30:40 tty: error in read() (Bad address)
mbusd[25373]: 30 Oct 2021 17:38:18 tty: error in read() (Bad address)

3cky commented 3 years ago

@AlexDavies501 thanks for the report! Could you provide some additional info about the software and hardware environment you're running? Kernel version, platform, etc.

AlexDavies501 commented 3 years ago

Hi.

Edited uname -a output (removed the computer name) below:

Linux datahub 4.1.15-g33f8ed0f5cb3 #4 SMP PREEMPT Tue May 1 13:54:29 MST 2018 armv7l GNU/Linux

The hardware is a Technologics 7970 embedded computer. Has two serial ports. We run mbusd against one of these.

/usr/local/bin/mbusd -d -v2 -L - -c /usr/local/etc/mbusd/mbusd-ttyS1.conf -p /dev/ttyS1

The config file now looks like:

# Serial port device name
device = /dev/ttyS1

# Serial port speed
speed = 9600

# Serial port mode
mode = 8n1

# RS-485 data direction control type (addc, rts, sysfs_0, sysfs_1)
trx_control = addc

# Sysfs file to use to control data direction
# trx_sysfile =

############# TCP port settings #############

# TCP server address to bind
address = 0.0.0.0

# TCP server port number
port = 12502

# Maximum number of simultaneous TCP connections
maxconn = 32

# Connection timeout value in seconds
timeout = 60

######### Request/response settings #########

# Maximum number of request retries
retries = 0

# Pause between requests in milliseconds
pause = 100

# Response wait time in milliseconds
wait = 500

With the above config i see a dozen tty: error in read() (Resource temporarily unavailable) errors for today, and 3 or 4 for each of the previous 4 days.

Originally, when we saw a lot more of these errors (several hundred per day), the config had retries=3 and wait=1000.

There are 3 regular callers that create Modbus master connections to the TCP side of mbusd. These tend to do a poll of a handful of registers every 5 seconds. Also some occasional scripts manually called from the cmdline that make connections too.

There are two slave devices on the 485 network: a gateway device and a pressure transmitter.

3cky commented 3 years ago

Hi, thanks for the info!

I've pushed some fixes regarding handling EAGAIN read() result. Сould you test mbusd compiled from the latest sources?

AlexDavies501 commented 3 years ago

Many thanks for the quick turn around.

Will try to test a fix next week.

The actual system in question is several thousand miles away... thankfully ssh makes that not too much of a problem!

AlexDavies501 commented 2 years ago

Apologies for the late response - the well the device was on was shut for a several weeks. System turned off.

We've been using the fix on some other sites and have not seen the original error messages there.

So i would vote to close the issue. Many thanks for the fast response and fix!

3cky commented 2 years ago

Glad to hear the problem is sorted out! So I'm closing this issue, please feel free to reopen if needed.

3cky / mbusd

"error in read()" error messages #78