Castaglia / proftpd-mod_proxy

FTP proxy support for ProFTPD
http://www.proftpd.org
38 stars 18 forks source link

Proxy with tls to backend fails to upload #164

Closed fbs closed 4 years ago

fbs commented 4 years ago

Issue

When tls is enabled between the proxy and the backend the transfer fails after ~2MB.

When tls is disabled between the proxy and the backend we never run into issues.

cftp / > put random-8M.bin
random-8M.bin:                    ETA:   0:00    0.03/  8.00 MB  303.40 MB/s   Lost data connection to remote host after 2129920 bytes had been sent: Broken pipe.
random-8M.bin:                                    2.03/  8.00 MB    5.26 MB/s
put random-8M.bin: socket write error.

Proxy error:

2020-07-01 16:44:54,288 mod_proxy/0.5[2316]: error writing 87380 bytes of data to destination data connection: Success
2020-07-01 16:44:54,288 mod_proxy/0.5[2316]: unable to proxy data between frontend/backend, closing data connections

config

We've on proftpd 1.3.6 and modproxy 0.5

MaxInstances 30
UseIPv6 off
UseReverseDNS off
SetEnv TZ :/etc/localtime
SQLBackend sqlite
SQLConnectInfo /var/lib/proftpd/pro.db
SQLLogFile /var/log/proftpd/sql.log
SQLAuthenticate off
LogFormat custom "%{%F %T}t %a %u \"%r\" %s %P"
ExtendedLog /var/log/proftpd/extendedlog AUTH,READ,WRITE custom
TransferLog /var/log/proftpd/xferlog

<IfModule mod_proxy.c>
  ProxyEngine on
  ProxyTLSEngine on
  ProxyLog /var/log/proftpd/proxy.log
  ProxyTables /var/lib/ftp/proxytable
  ProxyRole reverse
  ProxyOptions UseProxyProtocol
  ProxySourceAddress 10.16.11.176
  SQLNamedQuery get-user-servers SELECT "'ftp://' || hostname from backends join users on users.backend_id = backends.id where users.username = '%{0}';"
  ProxyReverseConnectPolicy PerUser
  ProxyReverseServers sql:/get-user-servers
</IfModule>

<IfModule mod_tls.c>
  TLSEngine on
  TLSRequired off
  TLSRenegotiate none
  TLSRSACertificateFile /etc/pki/tls/certs/xxx
  TLSRSACertificateKeyFile /etc/pki/tls/private/xxx
  TLSCipherSuite ALL:!ADH:!DES
  TLSVerifyClient off
  TLSLog /var/log/proftpd/tls.log
</IfModule>
Compile-time Settings:
  Version: 1.3.6e (git)
  Platform: LINUX [Linux 3.10.0-1127.8.2.el7.x86_64 x86_64]
  Built: Wed Jul 1 2020 09:11:30 UTC
  Built With:
    configure  '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/opt/proftpd' '--exec-prefix=/opt/proftpd' '--bindir=/opt/proftpd/bin' '--sbindir=/opt/proftpd/sbin' '--sysconfdir=/etc' '--datadir=/opt/proftpd/share' '--includedir=/opt/proftpd/include' '--libdir=/opt/proftpd/lib64' '--libexecdir=/opt/proftpd/libexec' '--localstatedir=/var' '--sharedstatedir=/var/lib' '--mandir=/opt/proftpd/share/man' '--infodir=/opt/proftpd/share/info' '--libexecdir=/opt/proftpd/libexec/proftpd' '--localstatedir=/run/proftpd' '--disable-strip' '--enable-ctrls' '--enable-dso' '--enable-facl' '--enable-ipv6' '--enable-nls' '--enable-openssl' '--enable-shadow' '--with-lastlog' '--with-libraries=/opt/proftpd/lib64/mysql:/opt/proftpd/lib64' '--with-includes=/opt/proftpd/include/mysql:/opt/proftpd/include' '--with-pkgconfig=lib64/pkgconfig' '--with-shared=mod_auth_pam:mod_ban:mod_ctrls_admin:mod_deflate:mod_dnsbl:mod_dynmasq:mod_exec:mod_facl:mod_load:mod_quotatab:mod_quotatab_file:mod_quotatab_radius:mod_quotatab_sql:mod_radius:mod_ratio:mod_readme:mod_rewrite:mod_shaper:mod_site_misc:mod_snmp:mod_sql:mod_wrap2:mod_wrap2_file:mod_wrap2_redis:mod_wrap2_sql:mod_unique_id:mod_auth_otp:mod_digest:mod_sftp:mod_sftp_pam:mod_sftp_sql:mod_sql_passwd:mod_sql_sqlite:mod_tls:mod_tls_fscache:mod_tls_shmcache:mod_proxy:mod_ifsession' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic' 'LDFLAGS=-Wl,-z,relro ' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic'

  CFLAGS: -g2 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -Wall -fno-omit-frame-pointer -Werror=implicit-function-declaration
  LDFLAGS: -L$(top_srcdir)/lib -Wl,-z,relro  -rdynamic -L/opt/proftpd/lib64 -L/opt/proftpd/lib64/mysql
  LIBS: -lacl  -lssl -lcrypto -lcap -lsupp -lattr -lnsl -lresolv -lresolv -lcrypt -ldl

  Files:
    Configuration File:
      /etc/proftpd.conf
    Pid File:
      /run/proftpd/proftpd.pid
    Scoreboard File:
      /run/proftpd/proftpd.scoreboard
    Header Directory:
      /opt/proftpd/include/proftpd
    Shared Module Directory:
      /opt/proftpd/libexec/proftpd

  Info:
    + Max supported UID: 4294967295
    + Max supported GID: 4294967295

  Features:
    - Autoshadow support
    + Controls support
    + curses support
    - Developer support
    + DSO support
    + IPv6 support
    + Largefile support
    + Lastlog support
    - Memcache support
    + ncursesw support
    + NLS support
    - Redis support
    - Sodium support
    + OpenSSL support (FIPS enabled)
    - PCRE support
    + POSIX ACL support
    + Shadow file support
    + Sendfile support
    + Trace support
    + xattr support

  Tunable Options:
    PR_TUNABLE_BUFFER_SIZE = 1024
    PR_TUNABLE_DEFAULT_RCVBUFSZ = 8192
    PR_TUNABLE_DEFAULT_SNDBUFSZ = 8192
    PR_TUNABLE_ENV_MAX = 2048
    PR_TUNABLE_GLOBBING_MAX_MATCHES = 100000
    PR_TUNABLE_GLOBBING_MAX_RECURSION = 8
    PR_TUNABLE_HASH_TABLE_SIZE = 40
    PR_TUNABLE_LOGIN_MAX = 256
    PR_TUNABLE_NEW_POOL_SIZE = 512
    PR_TUNABLE_PATH_MAX = 4096
    PR_TUNABLE_SCOREBOARD_BUFFER_SIZE = 80
    PR_TUNABLE_SCOREBOARD_SCRUB_TIMER = 30
    PR_TUNABLE_SELECT_TIMEOUT = 30
    PR_TUNABLE_TIMEOUTIDENT = 10
    PR_TUNABLE_TIMEOUTIDLE = 600
    PR_TUNABLE_TIMEOUTLINGER = 10
    PR_TUNABLE_TIMEOUTLOGIN = 300
    PR_TUNABLE_TIMEOUTNOXFER = 300
    PR_TUNABLE_TIMEOUTSTALLED = 3600
    PR_TUNABLE_XFER_SCOREBOARD_UPDATES = 10

Troubleshooting

When stracing the child process (the one that handles my client) it Never fails, uploaded the file 20 times in a row without issues. So maybe timing related?

The error self is interesting too error writing 87380 bytes of data to destination data connection: Success. The success seems to indicate that errno == 0.

As the write fails I took a look at what happens to the writes, as strace made it impossible to reproduce I wrote a bpftrace program to trace it instead:

t:syscalls:sys_enter_write
/pid == $1/ {
  @[tid]=nsecs;
  @v[tid] = (args->fd, args->count);
}

t:syscalls:sys_exit_write
/pid == $1 && @[tid]/ {
  $ret = (int64)args->ret;
  printf("Wrote %llu bytes to fd %d : %lld\n", @v[tid].1, @v[tid].0, $ret);
  delete(@[tid]);
  delete(@v[tid]);
}

Which shows that the write fails with -11 and after that the writes to fd 20 stop.

Wrote 16413 bytes to fd 20 : 16413
Wrote 16413 bytes to fd 20 : 16413
Wrote 16413 bytes to fd 20 : 5046
Wrote 11367 bytes to fd 20 : -11

Based on earlier strace I know that port 20 is always the backend fd, the strace output is always the same:

$ strace -p 2316 -e '%network,dup'
bind(19, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.16.11.176")}, 16) = 0
getsockname(19, {sa_family=AF_INET, sin_port=htons(40105), sin_addr=inet_addr("10.16.11.176")}, [16]) = 0
connect(19, {sa_family=AF_INET, sin_port=htons(61465), sin_addr=inet_addr("10.16.1.185")}, 16) = 0
dup(19)                                 = 20
dup(19)                                 = 21

So EAGAIN(-11) somehow leads to failure.

The relevant bit of code for this appears to be https://github.com/Castaglia/proftpd-mod_proxy/blob/master/lib/proxy/tls.c#L610

I wrote another program which fetches the return value from SSL_get_error() just to verify that openssl also detected an issue:

$ bpftrace -e '
ur:/lib64/libssl.so.1.0.2k:SSL_get_error
/comm == "proftpd"/ {
  printf("Hit by: %d %s, val: %llu\n", pid, comm, (int32)retval);
}'
Hit by: 2316 proftpd, val: 2
Hit by: 2316 proftpd, val: 3

The transfer only fails when val == 3, val ==2 happens during a successful transfer too.

Assuming that the errno value somehow gets lost in the process I've attached gdb to do some digging:

I found that errno indeed gets lost:

Continuing.

Breakpoint 1, 0x00007fd131b745f0 in SSL_get_error () from /lib64/libssl.so.10
(gdb) n
Single stepping until exit from function SSL_get_error,
which has no line number information.
tls_write (notes=<optimized out>, nstrm_type=32, len=<optimized out>, buf=<optimized out>, ssl=0xf20b20) at lib/proxy/tls.c:615
615        switch (err) {
(gdb)
619            errno = EINTR;
(gdb) p err
$6 = 3
(gdb) watch errno
Hardware watchpoint 5: errno
(gdb) c
Continuing.
Hardware watchpoint 5: errno

Old value = 11
New value = 4
tls_write (notes=<optimized out>, nstrm_type=32, len=<optimized out>, buf=<optimized out>, ssl=0xf20b20) at lib/proxy/tls.c:613
613        count = -1;
(gdb) c
Continuing.
Hardware watchpoint 5: errno

Old value = 4
New value = 0
0x0000000000423fa6 in pr_signals_handle () at signals.c:442
442        errno = 0;
(gdb) c
Continuing.

Breakpoint 3, netio_write_cb (nstrm=<optimized out>, buf=<optimized out>, buflen=<optimized out>) at lib/proxy/tls.c:2025
2025        if (bread > 0) {

Manually setting errno=4 when hitting the lib/proxy/tls.c:2025 breakpoint seems to "fix" the issue. If I don't set it the transfer fails, if I do it seems to work, although it hits some write errors a few times.

I tried to write a patch for this but I'm unsure how to correctly solve this. I'm still not entirely sure what causes errno to reset. The pr_signals_handle() above might be me being too slow with gdb causing the timer to fire and not the actual cause.

Castaglia commented 4 years ago

Which version of OpenSSL is present, on the proxy host and on the backend? That is, what does openssl version -a show? I ask because I'm wondering if the TLS session is using TLSv1.3; if so, there are potential issues with uploads and TLSv1.3 (see Issue #959).

fbs commented 4 years ago

I'm not entirely sure about the backend, it might run a custom proftpd build but the proxy is centos 7 with openssl 1.0.2k.

Castaglia commented 4 years ago

Got it. I've just put up a PR, which brings the handling of errno in mod_proxy to be the same as already done in mod_tls; care to try it out?

fbs commented 4 years ago

With 128MB uploads

Before: total uploads 91, failed: 31 After: total uploads 91, failed: 0

Looks like it's working :).

Thanks for the quick fix!

Castaglia commented 4 years ago

Fixed in master. Thanks!