dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

4.2.26 breaks webdav door because of undefined webdav.macaroons.accept-over-unencrypted-channel #4678

Closed calestyo closed 5 years ago

calestyo commented 5 years ago

Hi.

In 4.2.26 (which I had to install relatively quickly, without my usual deeper testing, because other bugs that it fixes had killed our production), the webdav door is completely broken:

Feb 19 10:23:06 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[116612]: 2019-02-19 10:23:06+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:23:19 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[116681]: 2019-02-19 10:23:19+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:23:33 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[116749]: 2019-02-19 10:23:33+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:23:47 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[116819]: 2019-02-19 10:23:47+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:23:47 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[116819]: 2019-02-19 10:23:47+01:00 (c-core2-AAWCO8xfrWg-AAWCO8xgA1g) [] Failed to add route: No such cell: c-core2-AAWCO8xfrWg-AAWCO8xgA1g@webdav_lcg-lrz-dc14
Feb 19 10:24:00 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[116882]: 2019-02-19 10:24:00+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:24:13 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[116959]: 2019-02-19 10:24:13+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:24:27 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[117027]: 2019-02-19 10:24:27+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:24:40 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[117097]: 2019-02-19 10:24:40+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:24:53 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[117207]: 2019-02-19 10:24:53+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel
Feb 19 10:25:07 lcg-lrz-dc14 dcache@webdav_lcg-lrz-dc14[117291]: 2019-02-19 10:25:07+01:00 (System) [] Failure at startup: (666) URL [file:/usr/share/dcache/services/webdav.batch]: line 79: (1) variable is not defined : webdav.macaroons.accept-over-unencrypted-channel

:-(

TBH, if dCache behaved a bit more like most other software does, I could easily have noticed this issue immediately after starting it, rather than hours later when flooded by tickets.

But more importantly:

I reported at least some of these init-system issues years ago... If dCache either used modern means to signal the process manager whether startup worked or not, or at least didn't do its own asynchronous start/stop magic, such an error would have been noticed right after starting dCache with the new version, as systemd would simply have been able to report the error. :-(

Anyway... just defining webdav.macaroons.accept-over-unencrypted-channel in dcache.conf seems to work around the bug.
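A minimal sketch of that workaround, assuming the property is a boolean and that `true` keeps the pre-4.2.26 behaviour (accepting macaroons over plain HTTP, which the thread says is still the default). The value is an assumption; check it against the packaged webdav.properties before using it.

```ini
# /etc/dcache/dcache.conf
# Assumed value: true = keep accepting macaroons over unencrypted channels,
# i.e. the behaviour before this property was introduced.
webdav.macaroons.accept-over-unencrypted-channel = true
```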

Chris.

calestyo commented 5 years ago

And perhaps one more word:

The whole webdav.macaroons.accept-over-unencrypted-channel change does not seem like a bugfix, or like an urgent security fix that cannot be worked around by other means (it can, simply by disabling macaroons)... so shouldn't it have gone into the next major version rather than into a minor bugfix release?

Best wishes, Chris.

calestyo commented 5 years ago

Digging a bit deeper into this reveals that it's also (partially) my fault.

webdav.macaroons.accept-over-unencrypted-channel is in fact defined, but it was overridden by my own version of webdav.properties. You may now wonder why I overwrite this file... it's because of #3309.

Back when the new config system was added, site-specific options were advertised... Later, most of the protocol-flavour-specific options (e.g. automatically using a different mover queue for the plain and GSI flavours of FTP) were dropped or made immutable (I had a long discussion with Gerd about this back then), with the reasoning that one can rebuild the same behaviour with site-specific options if desired.

Well, it was desired, but because of the way the config check works, this has been broken ever since: either one simply can't do it, or one has to hack the site-specific options into the properties files (as described in #3309).

Since I didn't expect new features and their configuration to be added, I took the old version of my adapted properties file (of course I ran check-config, but that passed). And in the next step, since systemd cannot see that the service actually fails, I didn't notice it then either.

So it's not really a bug... but in a way it still is, given how all the problems described above work together :-(

Cheers, Chris

paulmillar commented 5 years ago

Hi Chris,

Yes, we strongly recommend sites do NOT modify the defaults file for exactly these kinds of problems. I even say that modifying the default files is an unsupported deployment strategy.

When considering back-porting a patch, one requirement is that it must require no configuration changes. Updating the defaults file is one mechanism to achieve that requirement, while allowing sites to benefit from the patch.

You also mentioned "webdav.macaroons.accept-over-unencrypted-channel does not seem like a bugfix or an urgent security fix".

This is not true. This change was based on the advice of several security experts that, by supporting macaroons on unencrypted channels, dCache is potentially vulnerable (or rather, it allows clients to persist in vulnerable behaviour).

Sites are already deploying macaroon support, so back-porting this patch to our supported branches allows them to review their deployment and see whether they can disable support for macaroons over unencrypted channels. This is an evaluation they can conduct on their own timescale, including providing test endpoints for users to verify their work-flows.

Sites cannot always conduct a major upgrade and should not be forced to do so in the name of security, if they are running a supported version of dCache.
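For a site that has finished that evaluation, the change would presumably be a single dcache.conf line. This is a sketch assuming the property is a boolean where `false` rejects macaroons arriving over plain HTTP; the exact semantics should be confirmed in the packaged webdav.properties.

```ini
# /etc/dcache/dcache.conf
# Assumed semantics: false = only accept macaroons over TLS-protected connections.
webdav.macaroons.accept-over-unencrypted-channel = false
```

Setting it in dcache.conf rather than editing the packaged defaults keeps the setting intact across package upgrades.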

calestyo commented 5 years ago

Yes, we strongly recommend sites do NOT modify the defaults file for exactly these kinds of problems. I even say that modifying the default files is an unsupported deployment strategy.

Well, I know, and I don't do this because it's so much fun ;-) ... but simply because of #3309. As long as that isn't fixed, I can either continue to "hack" the default files... or just give up on any automatic per-protocol-flavour settings, which, however, blows up the config files and makes them far less readable. Files in /usr should never be changed anyway (even though dCache does so itself, and thereby breaks all kinds of subtle things in Debian ;-) ).

Right now, for a specific door I just set e.g. webdav.authn.protocol=http, and all other webdav-related options, like mover queues and so on, are automatically set based on that protocol flavour and the global defaults I've configured for it. I wouldn't like to give up on this, as I'd have to duplicate all these settings for every door and flavour, which makes it far more error-prone: when a setting is changed, one of the copies is easily forgotten.
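To illustrate the per-door setting, a minimal layout-file sketch; the domain name `webdavDomain` is hypothetical. In stock dCache this only sets the authentication flavour. The automatic derivation of mover queues and similar options from it relies on Chris's own modified defaults (#3309), not on anything the packaged properties provide.

```ini
# Hypothetical layout fragment (/etc/dcache/layouts/<host>.conf)
[webdavDomain]

[webdavDomain/webdav]
# The only per-door setting; everything else is meant to follow from it.
webdav.authn.protocol = http
```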

I guess one can debate whether this is a security fix or not... especially when the default is still insecure. And every security expert should probably know that bearer tokens are, by design, vulnerable to theft: whoever possesses one can use it. Putting a TLS sticker on them doesn't give any real security (just as the web doesn't become secure merely because of (effectively anonymous) encryption everywhere).

I doubt that most sites made the effort (as I did) to obtain e.g. the IGTF bundle (or rather its signing key) via some secure path, so even with macaroons over encrypted channels, people will just send their credentials over a path they cannot really trust.

Also... isn't soooo much of everything else still unencrypted (e.g. data transfers)?

Some security experts should perhaps invest more of their time in real security threats (e.g. containers)... instead of things where even another layer of security wouldn't change that much.

Anyway... I'm drifting off into security philosophy...

As already said earlier... the whole issue was an unfortunate chain of circumstances... me hacking the defaults because of the long-standing #3309... and the systemd mess.