Yneeb closed this issue 4 years ago.
I haven't been able to reproduce this exact issue yet (mainly because I lack a similar setup, I believe), but I am seeing one odd behavior I need to track down, which might be related. Can you post the output of conserver -V as well? It would also help to have the debug output, if possible. If you add -DD when running conserver and reproduce the problem, that might let me see what's happening without a setup on my end (feel free to email bryan@conserver.com, as it'll be chatty). We'll see a lot of the lower-level activity, some file descriptor info, etc.
This is the output from conserver -V:
conserver: conserver.com version 8.2.3
conserver: default access type `r'
conserver: default escape sequence `^Ec'
conserver: default configuration in `/etc/conserver/conserver.cf'
conserver: default password in `/etc/conserver/conserver.passwd'
conserver: default logfile is `/var/log/conserver.log'
conserver: default pidfile is `/run/conserver.pid'
conserver: default limit is 16 members per group
conserver: default socket directory `/run/conserver'
conserver: options: pam, uds
conserver: built with `./configure --prefix=/usr --build=i486-pc-linux-gnu --host=i486-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --docdir=/usr/share/doc/conserver-8.2.3 --htmldir=/usr/share/doc/conserver-8.2.3/html --libdir=/usr/lib --without-dmalloc --without-ipv6 --without-freeipmi --without-gssapi --without-openssl --with-pam --without-libwrap --with-cffile=conserver/conserver.cf --with-logfile=/var/log/conserver.log --with-master=localhost --with-pidfile=/run/conserver.pid --with-port=7782 --with-pwdfile=conserver/conserver.passwd --with-uds=/run/conserver'
This should be all of the output from running conserver -DD with my example configuration:
One other piece of information that may be useful: I'm running this on a very slow machine with a single-core 133 MHz processor. Maybe it's some sort of race condition that doesn't show up on faster machines? I'll see if I can replicate this on faster hardware.
It seems my theory was wrong. I tried this on a four-core 3.4 GHz machine and saw exactly the same issue. Let me know if there's anything else you'd like me to try.
Thanks for the info; it helped. I was also able to replicate the issue and found the missing piece: when the process exits, read() returns an error (-1 with errno set to EIO) that wasn't being caught. A quick fix for the situation:
diff --git a/conserver/cutil.c b/conserver/cutil.c
index 24ce826..4a4defe 100644
--- a/conserver/cutil.c
+++ b/conserver/cutil.c
@@ -890,7 +890,7 @@ FileRead(CONSFILE *cfp, void *buf, int len)
case simpleSocket:
while (retval < 0) {
if ((retval = read(cfp->fd, buf, len)) <= 0) {
- if (retval == 0) {
+ if (retval == 0 || errno == EIO) {
retval = -1;
break;
}
I'm not 100% sure that's the right solution, but it will do the job for now. I might come up with a more creative long-term fix (or not).
So, yeah, after more thought, the above will do. It's basically just preventing the error message - there's no functional change.
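For anyone following along outside conserver, the condition the patch handles can be sketched as a standalone read wrapper. This is a minimal sketch, not conserver's FileRead(): the names read_or_closed(), open_pty_master_with_closed_slave(), READ_CLOSED and READ_ERROR are hypothetical, and the EINTR retry is this sketch's own choice, since the diff doesn't show the rest of the loop. Like the patch, it treats both a 0 return and EIO as "the other side went away", which is what a pty master typically reports on Linux once the slave side has been closed by an exiting process.

```c
#define _XOPEN_SOURCE 600 /* for posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not conserver's API). */
enum { READ_CLOSED = -1, READ_ERROR = -2 };

/* Read up to len bytes (len must be > 0) from fd. Returns the byte count
 * on success, READ_CLOSED when the other end has gone away (EOF, or EIO
 * from a pty master whose slave side was closed), or READ_ERROR for any
 * other failure. Reads interrupted by a signal are retried. */
static ssize_t read_or_closed(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n > 0)
            return n;
        if (n == 0 || errno == EIO)
            return READ_CLOSED;   /* quiet close, no error message */
        if (errno != EINTR)
            return READ_ERROR;    /* a real error worth reporting */
        /* EINTR: interrupted by a signal, so try the read again. */
    }
}

/* Hypothetical helper: opens a pty pair, closes the slave side (standing
 * in for the console's process exiting) and returns the master fd, or
 * -1 if any step of the setup fails. */
static int open_pty_master_with_closed_slave(void)
{
    const char *slave_name;
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (grantpt(master) != 0 || unlockpt(master) != 0 ||
        (slave_name = ptsname(master)) == NULL) {
        close(master);
        return -1;
    }
    int slave = open(slave_name, O_RDWR | O_NOCTTY);
    if (slave < 0) {
        close(master);
        return -1;
    }
    close(slave); /* last slave fd closed: the master now sees a hangup */
    return master;
}
```

Calling open_pty_master_with_closed_slave() and then read_or_closed() on the returned descriptor should yield READ_CLOSED rather than an error. One design caveat: on a regular file or disk, EIO signals a genuine I/O failure, so mapping it to "closed" is only appropriate for descriptors where EIO means a hangup, such as a pty master.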
I gave 02d0c15a77210f8636f51168444d4003148bea54 a try on my setup, and it appears to be working correctly. Thanks!
A basic configuration with a simple task will cause the server to emit the following error message about 50-80% of the time the task is run:
I'm using the 8.2.3 version distributed by Gentoo. I don't think there's anything in 8.2.4 that would fix this, but I can give it a try if necessary. I can reliably reproduce this issue by running the task a few times with the following configuration:
The entire output from the server:
I believe that fd 7 refers to /dev/ptmx in this situation, but I'm not sure how to debug this issue past that. Let me know if I can provide any more information.
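On Linux, one way to confirm what a descriptor refers to is to read the /proc/<pid>/fd/<n> symlink, which is what ls -l /proc/<pid>/fd lists. The sketch below is Linux-specific, and fd_target() and fd_is_pty_master() are hypothetical names. The assumption that a pty master resolves to /dev/ptmx or /dev/pts/ptmx depends on how devpts is mounted on the system.

```c
#define _POSIX_C_SOURCE 200809L /* for readlink() and fileno() */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: resolve what descriptor fd of process pid refers to by
 * reading the /proc/<pid>/fd/<fd> symlink. Writes a NUL-terminated path
 * into out and returns 0, or returns -1 on error (no such process or fd,
 * permission denied, or a result that may have been truncated). */
static int fd_target(pid_t pid, int fd, char *out, size_t outlen)
{
    char link[64];
    if (outlen == 0)
        return -1;
    snprintf(link, sizeof link, "/proc/%ld/fd/%d", (long)pid, fd);
    ssize_t n = readlink(link, out, outlen - 1); /* readlink does not add a NUL */
    if (n < 0 || (size_t)n == outlen - 1)
        return -1;
    out[n] = '\0';
    return 0;
}

/* Returns 1 if the descriptor looks like a pty master, 0 if it refers to
 * something else, or -1 if it can't be resolved. */
static int fd_is_pty_master(pid_t pid, int fd)
{
    char path[4096];
    if (fd_target(pid, fd, path, sizeof path) != 0)
        return -1;
    return strcmp(path, "/dev/ptmx") == 0 ||
           strcmp(path, "/dev/pts/ptmx") == 0;
}
```

For the case above, fd_is_pty_master(<conserver pid>, 7) returning 1 would confirm the guess. Reading another process's /proc/<pid>/fd entries generally requires running as the same user or as root.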