Closed (fbs closed 4 years ago)
Which version of OpenSSL is present on the proxy host and on the backend? That is, what does openssl version -a show? I ask because I'm wondering whether the TLS session is using TLSv1.3; if so, there are potential issues with uploads and TLSv1.3 (see Issue #959).
I'm not entirely sure about the backend; it might run a custom proftpd build. The proxy is CentOS 7 with OpenSSL 1.0.2k.
Got it. I've just put up a PR which brings the handling of errno in mod_proxy in line with what mod_tls already does; care to try it out?
With 128MB uploads:
Before: total uploads 91, failed: 31
After: total uploads 91, failed: 0
Looks like it's working :).
Thanks for the quick fix!
Fixed in master. Thanks!
Issue
When TLS is enabled between the proxy and the backend, the transfer fails after ~2MB.
When TLS is disabled between the proxy and the backend, we never run into issues.
Proxy error:
config
We're on proftpd 1.3.6 and mod_proxy 0.5.
Troubleshooting
When stracing the child process (the one that handles my client), it never fails: I uploaded the file 20 times in a row without issues. So maybe it is timing related?
The error itself is interesting too: error writing 87380 bytes of data to destination data connection: Success. The "Success" seems to indicate that errno == 0.
As the write fails, I took a look at what happens to the writes. Since strace made the problem impossible to reproduce, I wrote a bpftrace program to trace it instead:
Which shows that the write fails with -11 and after that the writes to fd 20 stop.
Based on the earlier strace, I know that port 20 is always the backend fd; the strace output is always the same:
So EAGAIN(-11) somehow leads to failure.
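For context, EAGAIN on a non-blocking descriptor normally means "the kernel buffer is full, try again later" rather than a fatal error. The following is a minimal sketch of that behaviour, assuming Linux (where EAGAIN is 11) and using a pipe as a stand-in for the backend data connection. The names fill_until_eagain and demo_eagain are hypothetical and are not part of mod_proxy.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write to a non-blocking fd until the kernel
 * refuses more data. Returns the errno of the first failed write,
 * or 0 if no write failed within max_writes attempts. */
static int fill_until_eagain(int fd, int max_writes) {
  char buf[4096];
  memset(buf, 'x', sizeof(buf));

  for (int i = 0; i < max_writes; i++) {
    ssize_t n = write(fd, buf, sizeof(buf));
    if (n < 0) {
      /* Capture errno right away, before any other call can change it. */
      return errno;
    }
  }

  return 0;
}

/* Creates a pipe, makes the write end non-blocking, and never reads
 * from it, so the pipe buffer eventually fills up. Returns the errno
 * seen on the first failed write, or -1 on setup failure. */
static int demo_eagain(void) {
  int fds[2];
  int flags, xerrno;

  if (pipe(fds) < 0) {
    return -1;
  }

  flags = fcntl(fds[1], F_GETFL, 0);
  if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  xerrno = fill_until_eagain(fds[1], 100000);

  close(fds[0]);
  close(fds[1]);
  return xerrno;
}
```

A caller that sees EAGAIN here would usually wait for the descriptor to become writable (for example with poll() or select()) and retry, instead of aborting the transfer.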
The relevant bit of code for this appears to be https://github.com/Castaglia/proftpd-mod_proxy/blob/master/lib/proxy/tls.c#L610
I wrote another program which fetches the return value from SSL_get_error() just to verify that openssl also detected an issue:
The transfer only fails when val == 3; val == 2 happens during a successful transfer too.
Assuming that the errno value somehow gets lost in the process, I attached gdb to do some digging:
I found that errno indeed gets lost:
Manually setting errno=4 when hitting the lib/proxy/tls.c:2025 breakpoint seems to "fix" the issue. If I don't set it, the transfer fails; if I do, it seems to work, although it hits some write errors a few times.
I tried to write a patch for this, but I'm unsure how to solve it correctly. I'm still not entirely sure what causes errno to reset. The pr_signals_handle() above might be me being too slow with gdb, causing the timer to fire, and not the actual cause.
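A common defence against a "lost" errno is to copy it into a local variable immediately after the failing call and restore it before returning, since any intervening call (logging, signal handling, other library code) may overwrite it. Below is a minimal sketch of that pattern. It is not the actual mod_proxy or mod_tls code: noisy_cleanup, write_losing_errno, and write_keeping_errno are hypothetical names, and the cleanup step resets errno on purpose to simulate the clobbering.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for work done between the failing call and the
 * caller's errno check (logging, signal handling, ...). It resets errno
 * on purpose to simulate the clobbering. */
static void noisy_cleanup(void) {
  errno = 0;
}

/* Buggy variant: by the time the caller inspects errno, the cleanup
 * step has already overwritten it. */
static ssize_t write_losing_errno(int fd, const void *buf, size_t len) {
  ssize_t res = write(fd, buf, len);

  if (res < 0) {
    noisy_cleanup();
  }

  return res;
}

/* Safer variant: save errno directly after the failing call and
 * restore it once the intervening work is done. */
static ssize_t write_keeping_errno(int fd, const void *buf, size_t len) {
  ssize_t res = write(fd, buf, len);

  if (res < 0) {
    int xerrno = errno;  /* saved before anything else can run */

    noisy_cleanup();
    errno = xerrno;      /* the caller sees the original error again */
  }

  return res;
}
```

With the buggy variant the caller sees a failed write with errno == 0, which glibc's strerror() renders as "Success", the same shape as the log message above. With the safer variant the caller can tell a retryable condition such as EAGAIN or EINTR apart from a real failure.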