Closed GoogleCodeExporter closed 9 years ago
Are you at least able to figure out whether this was an FTPS session?
Original comment by g.rodola
on 30 Nov 2011 at 8:01
The data being written to the fd does not contain any FTP protocol markers.
Therefore my guess is that it is crypted.
Other than that, I don't know how to tell.
Original comment by btimby@gmail.com
on 30 Nov 2011 at 8:03
Please try r923.
This shouldn't fix the EPIPE error but at least the code should no longer end
up being stucked into an infinite loop between calls (_do_ssl_shutdown() ->
handle_error() -> close()).
Obviosuly, I'm just assuming the problem is there (but it's likely).
Original comment by g.rodola
on 30 Nov 2011 at 8:15
Indeed. Broken pipe is not the issue. But trying it over and over is. I will
try this out.
Just your quick opinion, are the changes between r908 and r923 "safe". Would
you suggest simply updating or patching in this change?
Original comment by btimby@gmail.com
on 30 Nov 2011 at 8:33
I would suggest updating.
The changes introduced in that range are mostly bugfixes and a new "feature"
(the faster scheduler), both of which are not supposed to introduce retro
compatibility issues.
Original comment by g.rodola
on 30 Nov 2011 at 8:39
I have updated the servers and restarted them. I will report back in a week if
I don't see this issue again. It happens every couple days and I am not sure
how it is triggered (by a user). So after a week, I would assume this change
resolved it.
Original comment by btimby@gmail.com
on 30 Nov 2011 at 9:02
Even if the infinite loop is fixed please report EPIPE with the complete
traceback.
Original comment by g.rodola
on 30 Nov 2011 at 9:26
With this change in place, I am having errors connecting via FTPES or FTPS
(With FileZilla).
--
Status: Server did not properly shut down TLS connection
Error: Could not read from transfer socket: ECONNABORTED - Connection aborted
--
This is something that happened in older proftpd versions,
here is the fix:
http://proftp.cvs.sourceforge.net/viewvc/proftp/proftpd/contrib/mod_tls.c?r1=1.1
38&r2=1.139
The explanation is:
http://marc.info/?l=proftpd-users&m=121736627908173&w=2
If I roll back to r908, I am able to connect once again.
Original comment by btimby@gmail.com
on 1 Dec 2011 at 3:47
This should be treated as a separate issue.
Can you open one please?
Original comment by g.rodola
on 1 Dec 2011 at 8:35
I committed two revisions to complete the changes made in r923.
r925 - added super() calls to SSLConnection.__init__() and
TLS_FTPHandler.__init__(). This allows the _error attribute to be added when a
TLS_FTPHandler instance is created. Without this, the handle_read() method will
die with an AttributeError.
r926 - added super() call to TLS_DTPHandler as well. For the same reason as
above.
Original comment by btimby@gmail.com
on 2 Dec 2011 at 11:16
I don't know if you want a separate issue for this, but here is another 100%
CPU usage scenario, but with a different strace loop (no broken pipe, but
EAGAIN).
strace:
--
select(9, [3 5 8], [8], [3 5 8], {1, 0}) = 1 (out [8], left {1, 0})
time(NULL) = 1323108657
read(8, 0x4d9b290, 11) = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1323108657, 587397}, NULL) = 0
select(9, [3 5 8], [8], [3 5 8], {1, 0}) = 1 (out [8], left {1, 0})
time(NULL) = 1323108657
read(8, 0x4d9b290, 11) = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1323108657, 587629}, NULL) = 0
select(9, [3 5 8], [8], [3 5 8], {1, 0}) = 1 (out [8], left {1, 0})
time(NULL) = 1323108657
read(8, 0x4d9b290, 11) = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1323108657, 587861}, NULL) = 0
select(9, [3 5 8], [8], [3 5 8], {1, 0}) = 1 (out [8], left {1, 0})
time(NULL) = 1323108657
read(8, 0x4d9b290, 11) = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1323108657, 588092}, NULL) = 0
select(9, [3 5 8], [8], [3 5 8], {1, 0}) = 1 (out [8], left {1, 0})
time(NULL) = 1323108657
read(8, 0x4d9b290, 11) = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1323108657, 588324}, NULL) = 0
--
and lsof shows a connection from an FTP client:
--
python26 1593 smartfile 8u IPv4 59132471 TCP
10.19.78.11:ftp->mustafar:38337 (ESTABLISHED)
--
Mustafar is our proxy/load balancer used for FTP connections.
Original comment by btimby@gmail.com
on 5 Dec 2011 at 6:13
It is possible that r923 fixes this too, but I can't run that due to issue 193.
Original comment by btimby@gmail.com
on 5 Dec 2011 at 6:15
Here is a traceback showing a Broken Pipe error.
Probably one similar to what was causing the death loop (note the past tense).
--
Traceback (most recent call last):
File "/usr/lib64/python2.6/asynchat.py", line 235, in initiate_send
num_sent = self.send(data)
File "/usr/lib/python2.6/site-packages/pyftpdlib/contrib/handlers.py", line 171, in send
return super(SSLConnection, self).send(data)
File "/usr/lib/python2.6/site-packages/pyftpdlib/ftpserver.py", line 1065, in send
result = asyncore.dispatcher.send(self, data)
File "/usr/lib64/python2.6/asyncore.py", line 352, in send
result = self.socket.send(data)
error: [Errno 32] Broken pipe
--
Original comment by btimby@gmail.com
on 7 Dec 2011 at 3:28
Ok, I think I'm going to remove CCC command feature (issue 172).
ssl_shutdown() handling was already complicated and risky.
With CCC it became unmaintainable.
Original comment by g.rodola
on 7 Dec 2011 at 7:23
I removed CCC command implementation as of r934 where I tried to handle all the
SSL-related error conditions.
Please try this new rev one more time and let me know if you experience any
problem.
Original comment by g.rodola
on 7 Dec 2011 at 8:19
I will install this in a couple days. I have restarted the FTP services too
many times this week :-). I will update over the weekend.
I will install on my test environment right away.
Original comment by btimby@gmail.com
on 7 Dec 2011 at 8:22
Original comment by g.rodola
on 7 Dec 2011 at 8:37
I put this into production today. I was seeing hundreds of SSL related errors
(like below).
--
Traceback (most recent call last):
File "/usr/lib64/python2.6/asyncore.py", line 76, in read
obj.handle_read_event()
File "/var/www/smartfile/cli/pyftpd/smartfile.py", line 150, in handle_read_event
super(ProxyProtocolHandler, self).handle_read_event()
File "/usr/lib/python2.6/site-packages/pyftpdlib/contrib/handlers.py", line 146, in handle_read_event
super(SSLConnection, self).handle_read_event()
File "/usr/lib64/python2.6/asyncore.py", line 416, in handle_read_event
self.handle_read()
File "/usr/lib64/python2.6/asynchat.py", line 110, in handle_read
data = self.recv (self.ac_in_buffer_size)
File "/usr/lib/python2.6/site-packages/pyftpdlib/contrib/handlers.py", line 188, in recv
return super(SSLConnection, self).recv(buffer_size)
File "/usr/lib64/python2.6/asyncore.py", line 365, in recv
data = self.socket.recv(buffer_size)
Error: [('SSL routines', 'SSL3_WRITE_PENDING', 'bad write retry')]
--
I will let you know if this latest change (removal of CCC) resolves them.
Original comment by btimby@gmail.com
on 19 Dec 2011 at 10:28
This update eradicated most of the SSL errors I was receiving. There are still
a few, but it is one or two a day rather than hundreds.
I have not seen any server go to 100% CPU usage in about two weeks now. So I
would deem this issue resolved.
Original comment by btimby@gmail.com
on 21 Dec 2011 at 6:17
Good. Please file a separate issue in case SSL3_WRITE_PENDING comes up again.
Original comment by g.rodola
on 22 Dec 2011 at 10:56
Original comment by g.rodola
on 3 Jan 2012 at 11:40
I am still occasionally seeing the same broken pipe error:
gettimeofday({1326146598, 123463}, NULL) = 0
select(20, [3 5 11], [19], [3 5 11 19], {1, 0}) = 1 (in [11], left {1, 0})
write(11, "", 0) = 0
write(11,
"\25\3\1\0\30\334K\217D\n0tYm\333\355\273]e!\230R\207\254\0J\343E\354", 29) =
-1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
gettimeofday({1326146598, 123721}, NULL) = 0
select(20, [3 5 11], [19], [3 5 11 19], {1, 0}) = 1 (in [11], left {1, 0})
write(11, "", 0) = 0
write(11,
"\25\3\1\0\30\334K\217D\n0tYm\333\355\273]e!\230R\207\254\0J\343E\354", 29) =
-1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
And the same lsof output:
python26 1759 user 11u sock 0,5 59896469 can't
identify protocol
Original comment by btimby@gmail.com
on 9 Jan 2012 at 10:06
Does this come from send() method and inherits from socket.error?
Please paste the complete traceback message.
Original comment by g.rodola
on 9 Jan 2012 at 11:05
Also, is it just an exception or you are getting into the endless loop again?
Original comment by g.rodola
on 9 Jan 2012 at 11:16
Original comment by g.rodola
on 9 Jan 2012 at 11:23
I don't have any error reports that occurred before the CPU went to 100%.
Original comment by btimby@gmail.com
on 17 Jan 2012 at 2:30
Ok, it seems we're blind here.
The only thing I can recommend is adding debugging logs to figure out what's
going on.
Something like the diff below.
Can you do that?
Index: pyftpdlib/contrib/handlers.py
===================================================================
--- pyftpdlib/contrib/handlers.py (revisione 967)
+++ pyftpdlib/contrib/handlers.py (copia locale)
@@ -159,17 +159,23 @@
try:
return super(SSLConnection, self).send(data)
except (SSL.WantReadError, SSL.WantWriteError):
+ log("SSL want read/write")
return 0
except SSL.ZeroReturnError:
+ log("SSL zero return")
super(SSLConnection, self).handle_close()
return 0
except SSL.SysCallError, (errnum, errstr):
+ log("SSL sys call error %s" % errstr)
if errstr == 'Unexpected EOF' or errnum == errno.EWOULDBLOCK:
+ log("EOF or EWOULDBLOCK")
return 0
elif errnum in _DISCONNECTED:
+ log("disconnected")
super(SSLConnection, self).handle_close()
return 0
else:
+ log("some other error")
raise
Original comment by g.rodola
on 17 Jan 2012 at 2:47
Oh, hang on. I noticed an error in the code. On send(), we were treating EOF
error as an alias for "retry" rather than "connection lost".
That looks like a good candidate able to create an infinite loop.
Please try r969.
Original comment by g.rodola
on 17 Jan 2012 at 5:43
Ben can I close this one?
Original comment by g.rodola
on 12 May 2012 at 11:44
Yes. I have not had a crash issue in months.
Original comment by btimby@gmail.com
on 14 May 2012 at 1:06
Original comment by g.rodola
on 15 May 2012 at 2:30
Original issue reported on code.google.com by
btimby@gmail.com
on 30 Nov 2011 at 7:58