CESNET / gridsite

Grid Security for the Web.

gridsite (2.3.4) does not work with the shibboleth (2.6.1-3.2) #38

Open ajulia opened 5 years ago

ajulia commented 5 years ago

After upgrading the VM running at CERN to CC 7.6, one of the important WLCG services has issues due to failures in the gridsite libraries. The gridsite package (gridsite-2.3.4-1.el7.x86_64) fails to work on CC 7.6: Apache processes simply die with "Segmentation fault" errors. After investigation, our assumption is that the problem is caused by an incompatibility between gridsite (2.3.4) and shibboleth (2.6.1-3.2). The CERN support team advised us to open a ticket against gridsite.

It is an urgent and serious problem, since it prevents automatic service deployment and updates and requires manual intervention. We would very much appreciate help in making gridsite work with shibboleth (2.6.1-3.2).

Thank you
Kind regards
Julia Andreeva on behalf of the WLCG Operations

RQF1188492_attachments.zip

sustr4 commented 5 years ago

Dear Julia! We will be happy to help you if we can, but package shibboleth-2.6.1-3.2.el7.cern.x86_64 is only available from your private cern repository. How can we help? We'd appreciate a more detailed description of what broke. Zdeněk Šustr for CESNET & EOSC Support

anisyonk commented 5 years ago

Hi,

Actually, the shibboleth builds, in particular the shibboleth-2.6.1-3.2.el7.cern.x86_64 RPM, are available here: http://linuxsoft.cern.ch/cern/centos/7/cern/x86_64/repoview/shibboleth.html

Many thanks for the help.

regards, Alexey

valtri commented 5 years ago

As I understand it, CC 7.6 is pure CentOS 7.6, only with a different centos-release package, which enables an additional CERN repository?

We've checked that gridsite alone from EPEL (with only mod_ssl) works fine on CentOS 7.6. So the interesting part will be the combination with shibboleth.

anisyonk commented 5 years ago

Right, CC7.6 is just the CERN CentOS 7.6 image with additional repos enabled by default.

The problem is that gridsite (2.3.4) does not work together with shibboleth (2.6.1-3.2) on CentOS 7.6, while each package works well independently.

The Apache process simply fails with SIGSEGV. Below you can find an example stack trace taken at the moment of failure, followed by detailed steps to reproduce the issue.

regards, Alexey

Backtrace of the failing Apache process:

Program received signal SIGSEGV, Segmentation fault.
0x00007fea4bd0ef29 in CRYPTO_get_ex_data () from /lib64/libcrypto.so.10
(gdb) bt
#0  0x00007fea4bd0ef29 in CRYPTO_get_ex_data () from /lib64/libcrypto.so.10
#1  0x00007fea476f2d43 in GRST_callback_SSLVerify_wrapper () from /etc/httpd/modules/mod_gridsite.so
#2  0x00007fea4be02b85 in X509_verify_cert () from /lib64/libcrypto.so.10
#3  0x00007fea4c147be6 in ssl_verify_cert_chain () from /lib64/libssl.so.10
#4  0x00007fea4c11ecc6 in ssl3_get_client_certificate () from /lib64/libssl.so.10
#5  0x00007fea4c1203f8 in ssl3_accept () from /lib64/libssl.so.10
#6  0x00007fea4c12f4f8 in ssl23_accept () from /lib64/libssl.so.10
#7  0x00007fea4c387008 in ssl_io_filter_handshake () from /etc/httpd/modules/mod_ssl.so
#8  0x00007fea4c387d0d in ssl_io_filter_input () from /etc/httpd/modules/mod_ssl.so
#9  0x000055dc8bdcfe47 in ap_rgetline_core ()
#10 0x000055dc8bdd2a64 in ap_read_request ()
#11 0x000055dc8bdf8f1e in ap_process_http_connection ()
#12 0x000055dc8bdf0fc0 in ap_run_process_connection ()
#13 0x00007fea4e41e7af in child_main () from /etc/httpd/modules/mod_mpm_prefork.so
#14 0x00007fea4e41e9f5 in make_child () from /etc/httpd/modules/mod_mpm_prefork.so
#15 0x00007fea4e41ea56 in startup_children () from /etc/httpd/modules/mod_mpm_prefork.so
#16 0x00007fea4e41f760 in prefork_run () from /etc/httpd/modules/mod_mpm_prefork.so
#17 0x000055dc8bdcbffe in ap_run_mpm ()
#18 0x000055dc8bdc4d76 in main ()

Steps to reproduce the issue:

  1. Install a fresh CC7 node
    1. Ensure that gridsite works without shibboleth
yum install mod_wsgi gridsite httpd
yum install emacs wget

## edit the default Apache settings to allow default index page listing: comment out the line "Options -Indexes"
emacs -nw /etc/httpd/conf.d/welcome.conf
# Options -Indexes

# enable SSL client verification: set "SSLVerifyClient" to "require"
emacs -nw /etc/httpd/conf.d/ssl.conf
SSLVerifyClient require

/sbin/service httpd restart
wget --no-check-certificate https://127.0.0.1/ --certificate /etc/pki/tls/certs/localhost.crt --private-key /etc/pki/tls/private/localhost.key
--2018-12-12 23:43:02-- https://127.0.0.1/
Connecting to 127.0.0.1:443... connected.
OpenSSL: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca
Unable to establish SSL connection.

OK, we get the expected behavior: the request fails with "unknown ca", so gridsite works well.

  1. Install shibboleth and ensure that gridsite + shibboleth do not work together.

(Install shibboleth following the instructions at http://linux.web.cern.ch/linux/centos7/docs/shibboleth.shtml)

yum install shibboleth
emacs -nw /etc/sysconfig/selinux
# set SELINUX=enforcing
/usr/sbin/setenforce Permissive

wget http://linux.web.cern.ch/linux/centos7/docs/shibboleth/shibboleth2.xml -O /etc/shibboleth/shibboleth2.xml
wget http://linux.web.cern.ch/linux/centos7/docs/shibboleth/ADFS-metadata.xml -O /etc/shibboleth/ADFS-metadata.xml
wget http://linux.web.cern.ch/linux/centos7/docs/shibboleth/attribute-map.xml -O /etc/shibboleth/attribute-map.xml
wget http://linux.web.cern.ch/linux/centos7/docs/shibboleth/wsignout.gif -O /etc/shibboleth/wsignout.gif

## skip configuring Shibboleth with the real hostname in /etc/shibboleth/shibboleth2.xml, since it does not matter at this point
yum install opensaml-schemas #### --- (by the way, this step is missing from the http://linux.web.cern.ch/linux/centos7/docs/shibboleth.shtml instructions!)
/bin/systemctl start shibd
/bin/systemctl restart httpd

wget --no-check-certificate https://127.0.0.1/ --certificate /etc/pki/tls/certs/localhost.crt --private-key /etc/pki/tls/private/localhost.key
--2018-12-13 00:03:05-- https://127.0.0.1/
Connecting to 127.0.0.1:443... connected.
Unable to establish SSL connection.

OK, we got an unexpected "Unable to establish SSL connection" error. Checking the Apache logs:

tail /var/log/httpd/error_log
..
[Thu Dec 13 00:03:06.147577 2018] [core:notice] [pid 24561] AH00052: child pid 24566 exit signal Segmentation fault (11)

Typical backtraces are shown above and below.

  1. Ensure that httpd-2.4.6-88.el7.centos.x86_64 + shibboleth-2.6.1-3.2.el7.cern.x86_64 works without gridsite-2.3.4-1.el7.x86_64
yum erase gridsite
# ensure that SSLVerifyClient is set to require in /etc/httpd/conf.d/ssl.conf; fix it if needed
/bin/systemctl restart httpd
wget --no-check-certificate https://127.0.0.1/ --certificate /etc/pki/tls/certs/localhost.crt --private-key /etc/pki/tls/private/localhost.key
--2018-12-13 00:49:39-- https://127.0.0.1/
Connecting to 127.0.0.1:443... connected.
OpenSSL: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca
Unable to establish SSL connection.

It works as expected, i.e. the request fails with the expected "unknown ca" error.

  1. If you install gridsite again, Apache segfaults again.
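
The attempts above differ only in whether wget sees a TLS alert before the connection drops. As a rough aid for repeating them, here is a minimal shell sketch that labels one attempt from the captured wget output and the Apache error log. The function name `classify_attempt` and the labels it prints are made up for illustration; the matched strings are the ones quoted in the steps above.

```sh
# Classify the outcome of one reproduction attempt.
#   $1: file holding the captured wget output (stdout and stderr)
#   $2: file holding the Apache error_log lines written during the attempt
classify_attempt() {
    local wget_out=$1 error_log=$2
    if grep -q 'alert unknown ca' "$wget_out"; then
        # httpd stayed alive long enough to reject the client certificate.
        echo "rejected-by-server"
    elif grep -q 'exit signal Segmentation fault' "$error_log"; then
        # A child process died on signal 11, as reported in this issue.
        echo "child-segfault"
    else
        echo "unknown"
    fi
}
```

For example, redirect the wget run to a file of your choice with `> wget.out 2>&1` and then call `classify_attempt wget.out /var/log/httpd/error_log`. Because error_log accumulates, an old segfault entry can produce a false `child-segfault`; passing only the lines written during the attempt is more reliable.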

I manually rebuilt the gridsite packages on a CC7.6 test machine and also installed debuginfo for a more detailed backtrace:

gridsite-2.3.4-1.el7.x86_64

rpm -qa| grep gridsite
gridsite-debuginfo-2.3.4-1.el7.cern.x86_64
gridsite-2.3.4-1.el7.cern.x86_64
gridsite-libs-2.3.4-1.el7.cern.x86_64

The Apache process failed with the typical error:

Program received signal SIGSEGV, Segmentation fault.
0x00007eff30d4df29 in CRYPTO_get_ex_data () from /usr/lib64/libcrypto.so.10
(gdb) bt
#0  0x00007eff30d4df29 in CRYPTO_get_ex_data () from /usr/lib64/libcrypto.so.10
#1  0x00007eff2e558ce3 in GRST_callback_SSLVerify_wrapper (ok=1, ctx=0x7fffecef8430) at canl_mod_gridsite.c:3489
#2  0x00007eff30e40589 in internal_verify () from /usr/lib64/libcrypto.so.10
#3  0x00007eff30e4248f in X509_verify_cert () from /usr/lib64/libcrypto.so.10
#4  0x00007eff3045fbe6 in ssl_verify_cert_chain () from /lib64/libssl.so.10
#5  0x00007eff30436cc6 in ssl3_get_client_certificate () from /lib64/libssl.so.10
#6  0x00007eff304383f8 in ssl3_accept () from /lib64/libssl.so.10
#7  0x00007eff3069f008 in ssl_io_filter_handshake () from /etc/httpd/modules/mod_ssl.so
#8  0x00007eff3069fd0d in ssl_io_filter_input () from /etc/httpd/modules/mod_ssl.so
#9  0x0000564cf1e75e47 in ap_rgetline_core ()
#10 0x0000564cf1e78a64 in ap_read_request ()
#11 0x0000564cf1e9ef1e in ap_process_http_connection ()
#12 0x0000564cf1e96fc0 in ap_run_process_connection ()
#13 0x00007eff352847af in child_main () from /etc/httpd/modules/mod_mpm_prefork.so
#14 0x00007eff352849f5 in make_child () from /etc/httpd/modules/mod_mpm_prefork.so
#15 0x00007eff35284a56 in startup_children () from /etc/httpd/modules/mod_mpm_prefork.so
#16 0x00007eff35285760 in prefork_run () from /etc/httpd/modules/mod_mpm_prefork.so
#17 0x0000564cf1e71ffe in ap_run_mpm ()
#18 0x0000564cf1e6ad76 in main ()
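
To compare the two backtraces quoted in this comment without the addresses getting in the way, the function names can be pulled out of each frame. A small sketch, assuming the gdb output has been saved to a text file; `bt_functions` is a hypothetical helper name, and it only understands frames written as `#N 0xADDR in FUNC (...)`.

```sh
# Print one function name per frame from gdb "bt" output read on stdin.
# Lines that do not start with a frame number (e.g. the signal banner) are skipped.
bt_functions() {
    awk '/^#[0-9]+/ {
        # The function name is the field that follows the word "in".
        for (i = 1; i < NF; i++)
            if ($i == "in") { print $(i + 1); break }
    }'
}
```

Running it on both saved traces and diffing the results shows that they agree on the two innermost frames, `CRYPTO_get_ex_data` called from `GRST_callback_SSLVerify_wrapper`.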

anisyonk commented 5 years ago

Hi,

are there any updates or follow-up on this issue?

regards, Alexey

anisyonk commented 5 years ago

Hi, any news on the issue? Thanks.

ajulia commented 5 years ago

Dear supporters, we really need your help on this issue. The CRIC service is becoming a core part of the WLCG Information Infrastructure, and CRIC deployment depends on the resolution of this problem. Could you please look into it with high priority? Many thanks in advance. Julia

valtri commented 5 years ago

I couldn't reproduce the problem. I have tried both SL 7.6 and CentOS 7.6 (clean OS with the CERN repositories added). Package versions from CentOS 7.6:

[root@myriad9 ~]# rpm -q canl-c gridsite gsoap httpd mod_ssl mod_wsgi opensaml-schemas openssl-libs shibboleth
canl-c-2.1.8-1.el7.x86_64
gridsite-2.3.4-1.el7.x86_64
gsoap-2.8.16-12.el7.x86_64
httpd-2.4.6-89.el7.centos.x86_64
mod_ssl-2.4.6-89.el7.centos.x86_64
mod_wsgi-3.4-18.el7.x86_64
opensaml-schemas-2.6.1-3.1.el7.cern.x86_64
openssl-libs-1.0.2k-16.el7_6.1.x86_64
shibboleth-2.6.1-3.3.el7.cern.x86_64
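
These versions are close to, but not identical with, the ones named in the reproduction steps earlier in the thread (httpd-2.4.6-88.el7.centos and shibboleth-2.6.1-3.2.el7.cern there, versus -89 and -3.3 here). Whether that matters is unknown. A minimal sketch for comparing two saved `rpm -q` listings, one package per line; `diff_rpm_lists` and the `<` / `>` prefixes are illustrative choices, not part of any tool:

```sh
# Print the packages whose full name-version-release differs between two listings.
#   $1, $2: files with one "rpm -q" result per line
# Lines only in $1 are prefixed with "< ", lines only in $2 with "> ".
diff_rpm_lists() {
    # -v invert match, -x whole lines only, -F fixed strings, -f patterns from file
    grep -vxFf "$2" "$1" | sed 's/^/< /'
    grep -vxFf "$1" "$2" | sed 's/^/> /'
}
```

Running the same `rpm -q canl-c gridsite ...` command on the failing host and saving both outputs would give the two input files.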

One hypothesis so far: it could be related to the load order of the Apache modules (an application context data mismatch in X509_STORE_CTX?).
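
To check the load-order hypothesis, it may help to compare the order of `LoadModule` directives on a working and a failing host. The sketch below assumes the stock CentOS 7 layout, where `httpd.conf` includes `conf.modules.d/*.conf` before `conf.d/*.conf`; that assumption and the helper name `list_loadmodules` are not taken from this thread. It walks files in shell glob order, which is expected to approximate, but is not guaranteed to equal, the order Apache uses for wildcard includes.

```sh
# Print "file: module_name module_path" for every active LoadModule directive
# in the *.conf files of the given directories, in the order they are visited.
list_loadmodules() {
    local dir f
    for dir in "$@"; do
        for f in "$dir"/*.conf; do
            [ -f "$f" ] || continue
            # Commented-out directives start with "#" and therefore do not match.
            awk -v file="$f" 'tolower($1) == "loadmodule" { print file ": " $2 " " $3 }' "$f"
        done
    done
}
```

A call such as `list_loadmodules /etc/httpd/conf.modules.d /etc/httpd/conf.d` on both machines gives two lists that can be compared line by line.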

valtri commented 5 years ago

Could you try to explore the crash in gdb? That could confirm whether there is a problem with the data structures obtained from the contexts.

Launching httpd in gdb:

service httpd stop
/usr/sbin/httpd -X&
gdb /usr/sbin/httpd $!
# in gdb:
  b GRST_callback_SSLVerify_wrapper
  c

Launch the wget.

Explore data structures:

# in gdb:
p ctx->ex_data.sk->stack
p ctx->ex_data.sk->stack->data[0]
p *(SSL*)ctx->ex_data.sk->stack->data[0]

p ((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack
p ((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack->data[0]
p *(conn_rec *)((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack->data[0]

The last command should show reasonable data.
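
The same inspection can be scripted so that both sides run identical commands. A sketch, assuming gdb's `-batch` and `-x` options and the `httpd -X` launch shown above; `write_gdb_script` and the file name in the usage note are hypothetical.

```sh
# Write a gdb command file that repeats the inspection steps listed above.
#   $1: path of the command file to create
write_gdb_script() {
    cat > "$1" <<'EOF'
break GRST_callback_SSLVerify_wrapper
continue
print ctx->ex_data.sk->stack
print ctx->ex_data.sk->stack->data[0]
print *(SSL*)ctx->ex_data.sk->stack->data[0]
print ((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack
print ((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack->data[0]
print *(conn_rec *)((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack->data[0]
EOF
}
```

Usage could look like `write_gdb_script grst.gdb`, then `/usr/sbin/httpd -X &` followed by `gdb -batch -x grst.gdb /usr/sbin/httpd $!`, and finally the wget request from a second terminal. gdb stops executing a command file at the first command that raises an error, so truncated output is itself a hint about which pointer is bad.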

My output (shortened):

(gdb) p ctx->ex_data.sk->stack
$25 = {num = 1, data = 0x558990667270, sorted = 0, num_alloc = 4, comp = 0x0}
(gdb) p ctx->ex_data.sk->stack->data[0]
$26 = 0x55899061f3a0 "\003\003"
(gdb) p ((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack
$42 = {num = 2, data = 0x5589904e38d0, sorted = 0, num_alloc = 4, comp = 0x0}
(gdb) p ((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack->data[0]
$43 = 0x558990268070 "\b~&\220\211U"
(gdb) p *(conn_rec *)((SSL*)ctx->ex_data.sk->stack->data[0])->ex_data.sk->stack->data[0]
$44 = {pool = 0x558990267e08, base_server = 0x558990207830, vhost_lookup_data = 0x0, local_addr = 0x558990267ed0, 
  client_addr = 0x558990267f90, client_ip = 0x5589902685b0 "127.0.0.1", remote_host = 0x0, remote_logname = 0x0, 
  local_ip = 0x558990268580 "127.0.0.1", local_host = 0x0, id = 0, conn_config = 0x558990268130, notes = 0x5589902683e0, 
  input_filters = 0x558990650990, output_filters = 0x5589902686d8, sbh = 0x5589902666f8, bucket_alloc = 0x558990331f88, 
  cs = 0x0, data_in_input_filters = 0, data_in_output_filters = 0, clogging_input_filters = 1, double_reverse = 0, 
  aborted = 0, keepalive = AP_CONN_UNKNOWN, keepalives = 0, log = 0x0, log_id = 0x0, current_thread = 0x55899