flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

support gssapi/krb5 security #758

Open garlick opened 8 years ago

garlick commented 8 years ago

zeromq supports GSSAPI/KRB5 as noted with references in RFC 12.

Curve is the only security mechanism currently enabled in Flux. We should enable Kerberos as an option for securing Flux overlay networks.

Also worth noting: CEA has deployed auks (see also slides by @hautreux), a scheme for caching TGTs on behalf of users submitting batch jobs and using them at execution time to obtain session tickets. The scheme is scalable, avoiding the problem of many endpoints simultaneously banging on one KDC or a small number of KDCs.

Flux currently loads long-term Curve keys out of the user's home directory, which is expected to be shared across the instance. This presumes that the network file systems serving home directories are secure from eavesdropping, which may be challenging in some environments. In the Kerberos case, I think auks may be able to distribute keys to flux brokers scalably and securely, avoiding this problem. I think auks could also make Kerberos-secured NFSv4 easier to deploy, which would then be a safe place to store Curve keys. Go auks.

Anyway, the first step is adding Kerberos options to libflux/security.c and the flux-broker.

garlick commented 8 years ago

Also (gasp) libflux/security.c needs a bit of attention. It still exits on malloc error, doesn't conform to style guidelines, etc.

garlick commented 7 years ago

Some refs:

garlick commented 7 years ago

I have Kerberos set up on my desktop and added GSSAPI support to flux in a test branch. Sockets are set up as follows, with my user principal (c->principal == "garlick") on the client side of the socket and the local "host" principal on the server end:

flux_sec_ssockinit() calls:

zsock_set_gssapi_server (sock, 1);
zsock_set_gssapi_principal (sock, "host");

flux_sec_csockinit() calls:

zsock_set_gssapi_service_principal (sock, "host");
zsock_set_gssapi_principal (sock, c->principal);

This works as long as I make /etc/krb5.keytab readable by garlick. Here's some test output:

I: 17-04-11 14:09:04 zauth: API command=$TERM
I: 17-04-11 14:09:04 zauth: API command=GSSAPI
ok 80 - flux_sec_comms_init GSSAPI works
ok 81 - flux_sec_ssockinit works
ok 82 - server bound to localhost on port 49152
ok 83 - flux_sec_csockinit works
ok 84 - client connected to server
ok 85 - client sent Greetings!
I: 17-04-11 14:09:04 zauth: ZAP request mechanism=GSSAPI ipaddress=127.0.0.1
I: 17-04-11 14:09:04 zauth: - allowed (GSSAPI) principal=garlick@CHAOS identity=
I: 17-04-11 14:09:04 zauth: - ZAP reply status_code=200 status_text=OK
ok 86 - server ready within 1s timeout
ok 87 - server received Greetings!
ok 88 - rogue connected to server with no security
ok 89 - rogue sent Avast
ok 90 - server not ready within 0.2s timeout
I: 17-04-11 14:09:05 zauth: API command=$TERM

This is of course all wrong for us: we have brokers in the same instance authenticating as two different users/principals, "host" and "garlick", and service principals like "host" in /etc/krb5.keytab are normally readable only by root. For the overlay network, we'll want:

system instance: both ends authenticating as "flux" service principal (keys stored in alternate flux-readable keytab).

user instance: both ends authenticating as a user principal (which would require the user's credentials to be available on all ranks starting brokers)

Some challenges:

1) how to tell the GSSAPI client side to use a keytab (maybe per-principal config in /etc/krb5.conf?)
2) how to tell the GSSAPI server side to use a user credential cache (this may actually require code changes in libzmq; not sure)
3) the zauth (ZAP) actor in czmq currently provides no per-user/principal access control for GSSAPI. Anyone with a valid krb5 credential can connect.

It seems like the easiest place to start would be to try to get the system instance working with a flux service principal (solving 1 and 3 above).

grondo commented 7 years ago

nice work!

garlick commented 7 years ago

@dun added a flux principal to our KDC, which I put in /etc/krb5.keytab_flux and made readable:

sudo klist -kt FILE:/etc/krb5.keytab_flux
Keytab name: FILE:/etc/krb5.keytab_flux
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 04/11/2017 16:08:35 flux/jimbo.chaos@CHAOS
   2 04/11/2017 16:08:35 flux/jimbo.chaos@CHAOS
   2 04/11/2017 16:08:35 flux/jimbo.chaos@CHAOS
   2 04/11/2017 16:08:35 flux/jimbo.chaos@CHAOS
   2 04/11/2017 16:08:35 flux/jimbo.chaos@CHAOS
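For reference, Kerberos principal names have the form primary/instance@REALM: the keytab entry above is the service principal flux/jimbo.chaos@CHAOS, while the user principal in the earlier zauth log is garlick@CHAOS, with no instance part. Below is a minimal C sketch of that split, purely to illustrate the naming; parse_principal() and struct principal_parts are hypothetical helpers, not flux-core or krb5 API (real code would use krb5_parse_name() and related calls).

```c
#include <stdbool.h>
#include <string.h>

/* Components of a principal "primary[/instance]@REALM" (hypothetical type).
 * Fixed-size buffers keep the sketch short. */
struct principal_parts {
    char primary[64];
    char instance[128];   /* empty for user principals like "garlick@CHAOS" */
    char realm[64];
};

/* Split NAME into components.  Returns false if the realm is missing or
 * empty, a present component is empty, or a component is too long.
 * Backslash-escaped '/' and '@' (allowed by Kerberos) are NOT handled. */
static bool parse_principal (const char *name, struct principal_parts *out)
{
    const char *at = strrchr (name, '@');
    if (!at || at == name || at[1] == '\0')
        return false;

    /* Look for the primary/instance separator only before the realm. */
    const char *slash = memchr (name, '/', (size_t)(at - name));
    size_t plen = slash ? (size_t)(slash - name) : (size_t)(at - name);
    size_t ilen = slash ? (size_t)(at - slash - 1) : 0;
    size_t rlen = strlen (at + 1);

    if (plen == 0 || plen >= sizeof (out->primary)
        || (slash && ilen == 0) || ilen >= sizeof (out->instance)
        || rlen >= sizeof (out->realm))
        return false;

    memcpy (out->primary, name, plen);
    out->primary[plen] = '\0';
    if (slash)
        memcpy (out->instance, slash + 1, ilen);
    out->instance[ilen] = '\0';
    memcpy (out->realm, at + 1, rlen + 1);   /* includes the terminating NUL */
    return true;
}
```

This makes the distinction in the attempts below concrete: "flux" is only the primary, and "flux/jimbo.chaos" is the full name minus the realm.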

Then I tried running the test with:

export KRB5_KTNAME=FILE:/etc/krb5.keytab_flux
export KRB5_CLIENT_KTNAME=$KRB5_KTNAME

No luck with either "flux" or "flux/jimbo.chaos" as the principal. Just wanted to checkpoint that.

garlick commented 7 years ago

Small progress:

It turns out that the following setting in /etc/krb5.conf allows service principal credentials to be distributed among different keytab files with permissions appropriate to the application.

[libdefaults]
    default_keytab_name=/etc/krb5.keytab_%{username}

So for my testing I have that plus

$ ls -ld /etc/krb5*
-rw-r--r-- 1 root root    281 Apr 13 10:22 /etc/krb5.conf
-rw------- 1 root root    347 Apr 10 15:51 /etc/krb5.keytab
-rw-r----- 1 flux garlick 347 Apr 11 16:12 /etc/krb5.keytab_flux
lrwxrwxrwx 1 root root     16 Apr 13 10:34 /etc/krb5.keytab_garlick -> krb5.keytab_flux
lrwxrwxrwx 1 root root     11 Apr 13 10:22 /etc/krb5.keytab_root -> krb5.keytab

With this I can successfully authenticate garlick (using keys cached in /tmp with TGT as usual) to the flux service principal. (As a hack, I've made the flux keytab readable by garlick and linked it to a garlick keytab. This lets me run the server side as garlick but still use the flux service principal).
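The %{username} token is expanded by the Kerberos library itself; the C sketch below only models that expansion, to show which keytab each user ends up with under this setup (flux resolves to /etc/krb5.keytab_flux, garlick to the krb5.keytab_garlick symlink, root to the krb5.keytab_root symlink). expand_username() is a hypothetical helper, and it handles only this one token.

```c
#include <string.h>

/* Model of %{username} expansion in a krb5.conf path template, e.g.
 * "/etc/krb5.keytab_%{username}" + "flux" -> "/etc/krb5.keytab_flux".
 * Other tokens MIT krb5 understands (e.g. %{uid}, %{TEMP}) are copied
 * through literally.  Returns 0 on success, -1 if OUT is too small. */
static int expand_username (const char *tmpl, const char *username,
                            char *out, size_t len)
{
    static const char token[] = "%{username}";
    const size_t toklen = sizeof (token) - 1;
    size_t used = 0;

    if (len == 0)
        return -1;
    while (*tmpl) {
        const char *piece;
        size_t n;
        if (strncmp (tmpl, token, toklen) == 0) {
            piece = username;          /* substitute the user name */
            n = strlen (username);
            tmpl += toklen;
        } else {
            piece = tmpl;              /* copy one literal character */
            n = 1;
            tmpl++;
        }
        if (used + n >= len)           /* keep room for the NUL */
            return -1;
        memcpy (out + used, piece, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

The symlinks then do the real work: the template picks a per-user file name, and the filesystem decides which keys that name actually points at.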

That squares away the server side of the flux system instance. Now to convince the client side to use the keytab instead of the usual handshake with a TGT (challenge 1 above).

garlick commented 7 years ago

Well, it seems that if I skip the zmq_setsockopt(ZMQ_GSSAPI_PRINCIPAL) call on the client side (which I guess leaves the Kerberos library to determine which principal to use) and set this directive in /etc/krb5.conf:

[libdefaults]
    default_client_keytab_name=/etc/krb5.keytab_%{username}

then the test running as user flux seems to be able to use the flux keytab for both ends of the authentication (no password prompt). After the test runs, a klist as user flux shows:

$ klist
Ticket cache: FILE:/tmp/krb5cc_999
Default principal: flux/jimbo.chaos@CHAOS

Valid starting       Expires              Service principal
04/13/2017 14:12:29  04/14/2017 02:12:29  krbtgt/CHAOS@CHAOS
    renew until 04/14/2017 14:12:29
04/13/2017 14:12:29  04/14/2017 02:12:29  flux/jimbo.chaos@CHAOS
    renew until 04/14/2017 14:12:29

Here is a Wireshark dump of the conversation with the KDC while the test runs:

wireshark_pcapng_enp6s0_20170413130505_3UwkBX.pdf

garlick commented 7 years ago

The work to integrate gssapi into flux is checkpointed here:

https://github.com/garlick/flux-core/tree/gssapi

It includes tests that presume a working Kerberos environment. It may be too difficult to set up a functional Kerberos environment in Travis, and hard to get test principals installed in institutional environments, so we need a way to skip these tests if Kerberos is not available.
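The thread doesn't say how the skip should be decided. One possible sketch, assuming the tests key off the KRB5_CLIENT_KTNAME variable used earlier and treat a readable client keytab as "Kerberos available", is below. It emits a TAP "skip all" plan, matching the TAP-style test output shown above. All three functions are hypothetical, and a more thorough check might also try to acquire credentials.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Keytab path named by KRB5_CLIENT_KTNAME with an optional "FILE:"
 * prefix stripped, or NULL if the variable is unset or empty. */
static const char *client_keytab_path (void)
{
    const char *name = getenv ("KRB5_CLIENT_KTNAME");
    if (!name || *name == '\0')
        return NULL;
    if (strncmp (name, "FILE:", 5) == 0)
        name += 5;
    return *name ? name : NULL;
}

/* Heuristic (an assumption): Kerberos tests can run only if a client
 * keytab is configured and readable by the current user. */
static bool kerberos_test_env_available (void)
{
    const char *path = client_keytab_path ();
    return path != NULL && access (path, R_OK) == 0;
}

/* Print a TAP "skip all" plan when Kerberos is unavailable.
 * Returns true if the caller should exit without running tests. */
static bool maybe_skip_all (void)
{
    if (kerberos_test_env_available ())
        return false;
    printf ("1..0 # SKIP no readable Kerberos client keytab\n");
    return true;
}
```

A test program would call maybe_skip_all() first and return success if it reports true, so CI without a KDC sees a skipped test rather than a failure.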

czmq changes not yet submitted:

https://github.com/garlick/czmq/tree/gssapi_support

In addition to the above, we need to develop a GSSAPI ZAP handler in czmq, as the current one authorizes all connections (they will already have authenticated with Kerberos, but that alone should not be sufficient).
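As a rough idea of the policy such a handler might apply, here is a C sketch that assumes the handler receives the authenticated principal as a string (as in the zauth log line "principal=garlick@CHAOS") and follows the system/user instance plan from earlier in this thread. authorize_principal() and enum instance_type are hypothetical, not czmq API; the return values are ZAP status codes ("200" OK, "400" authentication failure).

```c
#include <string.h>

enum instance_type { INSTANCE_SYSTEM, INSTANCE_USER };

/* Authorization applied AFTER GSSAPI has authenticated the peer:
 *   system instance: accept only "flux/<instance>@<realm>"
 *   user instance:   accept only "<owner>@<realm>" (the instance owner)
 * Returns a ZAP status code string: "200" to allow, "400" to deny. */
static const char *authorize_principal (enum instance_type type,
                                        const char *principal,
                                        const char *owner,
                                        const char *realm)
{
    const char *at = strrchr (principal, '@');
    if (!at || strcmp (at + 1, realm) != 0)
        return "400";                    /* missing or foreign realm */
    size_t namelen = (size_t)(at - principal);

    if (type == INSTANCE_SYSTEM) {
        /* primary must be "flux", followed by a non-empty instance */
        return (namelen > 5 && strncmp (principal, "flux/", 5) == 0)
               ? "200" : "400";
    }
    /* user instance: exact match on the owner's name */
    return (namelen == strlen (owner)
            && strncmp (principal, owner, namelen) == 0) ? "200" : "400";
}
```

Keeping the check a pure function of (instance type, principal) would make it easy to unit test without a KDC, separately from the socket plumbing.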

garlick commented 7 years ago

I've rebased this gssapi branch on current master.

I keep checking to see if there has been a libzmq release that includes my GSSAPI fixes. Not yet. The last release was 4.2.2 on 2017-02-18; the fixes were merged around 2017-04-21.

We should look into incorporating a test Kerberos KDC into testing with buildbot on AWS.

Above I suggested that we would need to replace the czmq zauth ZAP handler since it doesn't do access control. We might be able to work around that by using the User-Id metadata property - see #1281

garlick commented 6 years ago

Looks like my fixes were incorporated in zeromq-4.2.3, although I think it has to be built with --enable-drafts to be usable:

0MQ version 4.2.3 stable, released on 2017/12/13
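If flux-core wanted to gate its GSSAPI support on the libzmq version, a check might look like the sketch below. The 4.2.3 threshold comes from this comment; the function names are hypothetical. At runtime the version triple could be obtained from libzmq's zmq_version(). Note that a version check alone cannot tell whether libzmq was built with --enable-drafts.

```c
#include <stdbool.h>

/* True if MAJOR.MINOR.PATCH >= REQ_MAJOR.REQ_MINOR.REQ_PATCH. */
static bool version_at_least (int major, int minor, int patch,
                              int req_major, int req_minor, int req_patch)
{
    if (major != req_major)
        return major > req_major;
    if (minor != req_minor)
        return minor > req_minor;
    return patch >= req_patch;
}

/* Hypothetical gate: the GSSAPI fixes discussed in this thread first
 * appeared in zeromq-4.2.3.  The triple would typically come from
 * zmq_version (&major, &minor, &patch). */
static bool libzmq_gssapi_fixed (int major, int minor, int patch)
{
    return version_at_least (major, minor, patch, 4, 2, 3);
}
```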

garlick commented 6 years ago

zeromq-4.2.4 was just released, and shortly after tagging, the GSSAPI interfaces were moved out of DRAFT status, so the next release should be good to go.