dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

An empty or malformed LSC file breaks all authentication #7428

Open onnozweers opened 12 months ago

onnozweers commented 12 months ago

Dear dCache devs,

An empty LSC file breaks gPlazma VOMS authentication, and not only for the VO whose LSC file is empty: it breaks it for all VOs. I ran into this by accident while cleaning up LSC files in our production instance running 8.2.32, when I emptied a file instead of deleting it. The problem is logged, but since our logs are quite large, it is easy to overlook. It took me a while to find out what the problem was.

I reproduced it on our test server running today's master snapshot.

I created a VOMS proxy:

[onno@ui ~]# voms-proxy-info -all
subject   : /DC=org/DC=terena/DC=tcs/C=NL/O=SURF B.V./CN=Onno Zweers zweer001@surf.nl/CN=474493364
issuer    : /DC=org/DC=terena/DC=tcs/C=NL/O=SURF B.V./CN=Onno Zweers zweer001@surf.nl
identity  : /DC=org/DC=terena/DC=tcs/C=NL/O=SURF B.V./CN=Onno Zweers zweer001@surf.nl
type      : RFC3820 compliant impersonation proxy
strength  : 2048
path      : /tmp/x509up_u31029
timeleft  : 09:43:25
key usage : Digital Signature, Key Encipherment
=== VO dteam extension information ===
VO        : dteam
subject   : /DC=org/DC=terena/DC=tcs/C=NL/O=SURF B.V./CN=Onno Zweers zweer001@surf.nl
issuer    : /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr
attribute : /dteam/Role=NULL/Capability=NULL
attribute : /dteam/NGI_NL/Role=NULL/Capability=NULL
timeleft  : 09:43:25
uri       : voms2.hellasgrid.gr:15004

A dir listing shows that authentication works:

[onno@ui ~]# gfal-ls https://dcachetest.grid.surfsara.nl:2884/groups/
dteam
escape
ska

Moving an LSC file and creating an empty one in its place:

[root@hedgehog14 /etc/grid-security/vomsdir]# mv escape/voms-escape.cloud.cnaf.infn.it.lsc /tmp/

[root@hedgehog14 /etc/grid-security/vomsdir]# touch escape/voms-escape.cloud.cnaf.infn.it.lsc

Please note: the LSC file belongs to the escape VO, but my proxy is a dteam proxy!

We need to refresh some services for the change to take effect:

[root@hedgehog14 /etc/grid-security/vomsdir]# touch /etc/dcache/gplazma.conf

[root@hedgehog14 /etc/grid-security/vomsdir]# systemctl restart dcache@webdav2884-hedgehog14Domain.service

Then we try the same directory listing:

[onno@ui ~]# gfal-ls https://dcachetest.grid.surfsara.nl:2884/groups/
gfal-ls error: 13 (Permission denied) - Result HTTP 401 : Authentication Error  after 1 attempts

Restoring the normal situation:

[root@hedgehog14 /etc/grid-security/vomsdir]# mv /tmp/voms-escape.cloud.cnaf.infn.it.lsc escape/
mv: overwrite 'escape/voms-escape.cloud.cnaf.infn.it.lsc'? y

[root@hedgehog14 /etc/grid-security/vomsdir]# touch /etc/dcache/gplazma.conf

[root@hedgehog14 /etc/grid-security/vomsdir]# systemctl restart dcache@webdav2884-hedgehog14Domain.service

And now authentication succeeds again:

[onno@ui ~]# gfal-ls https://dcachetest.grid.surfsara.nl:2884/groups/
dteam
escape
ska

From the gPlazma log:

10 Nov 2023 14:18:05 (gPlazma) [webdav2884-hedgehog14 Login] LSC file parsing error: Malformed LSC file (vo=escape, host=voms-escape.cloud.cnaf.infn.it): No distinguished name entries found.
10 Nov 2023 14:18:05 (gPlazma) [webdav2884-hedgehog14 Login] Login operation failed
java.lang.NullPointerException: null
        at org.dcache.gplazma.RecordFailedLogins$KnownFailedLogins.storageSubjectFor(RecordFailedLogins.java:88)
        at org.dcache.gplazma.RecordFailedLogins$KnownFailedLogins.has(RecordFailedLogins.java:98)
        at org.dcache.gplazma.RecordFailedLogins.accept(RecordFailedLogins.java:124)
        at org.dcache.gplazma.RecordFailedLogins.accept(RecordFailedLogins.java:36)
        at org.dcache.gplazma.GPlazma.lambda$login$1(GPlazma.java:142)
        at java.base/java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:807)
        at org.dcache.gplazma.GPlazma.login(GPlazma.java:142)
        at org.dcache.auth.Gplazma2LoginStrategy.login(Gplazma2LoginStrategy.java:211)
        at org.dcache.services.login.MessageHandler.messageArrived(MessageHandler.java:58)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.dcache.cells.CellMessageDispatcher$LongReceiver.deliver(CellMessageDispatcher.java:286)
        at org.dcache.cells.CellMessageDispatcher.call(CellMessageDispatcher.java:188)
        at org.dcache.cells.AbstractCell.messageArrived(AbstractCell.java:302)
        at dmg.cells.nucleus.CellAdapter.messageArrived(CellAdapter.java:856)
        at dmg.cells.nucleus.CellNucleus$DeliverMessageTask.run(CellNucleus.java:1273)
        at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at dmg.cells.nucleus.CellNucleus.lambda$wrapLoggingContext$2(CellNucleus.java:725)
        at java.base/java.lang.Thread.run(Thread.java:829)

Is it intentional that a malformed LSC file breaks all VOMS authentication?

Cheers, Onno

onnozweers commented 12 months ago

Additional information: this issue also seems to affect other forms of authentication. Here's an attempt with WebDAV basic authentication:

I can read this test file with basic authentication (LDAP account):

[onno@ui ~]# curl --user onno --fail "https://dcachetest.grid.surfsara.nl:2880/users/onno/disk/testfile1"
Enter host password for user 'onno':
Hello world!

I replace the LSC file with an empty one and refresh the services:

[root@hedgehog14 /etc/grid-security/vomsdir]# mv escape/voms-escape.cloud.cnaf.infn.it.lsc /tmp/

[root@hedgehog14 /etc/grid-security/vomsdir]# touch escape/voms-escape.cloud.cnaf.infn.it.lsc

[root@hedgehog14 /etc/grid-security/vomsdir]# touch /etc/dcache/gplazma.conf

[root@hedgehog14 /etc/grid-security/vomsdir]# systemctl restart dcache@webdav2880-hedgehog14Domain.service

This gives an error:

[onno@ui ~]# curl --user onno --fail "https://dcachetest.grid.surfsara.nl:2880/users/onno/disk/testfile1"
Enter host password for user 'onno':
curl: (22) NSS: client certificate not found (nickname not specified)

Fixing the LSC file:

[root@hedgehog14 /etc/grid-security/vomsdir]# mv /tmp/voms-escape.cloud.cnaf.infn.it.lsc escape/
mv: overwrite 'escape/voms-escape.cloud.cnaf.infn.it.lsc'? y

[root@hedgehog14 /etc/grid-security/vomsdir]# touch /etc/dcache/gplazma.conf

[root@hedgehog14 /etc/grid-security/vomsdir]# systemctl restart dcache@webdav2880-hedgehog14Domain.service

Now it's OK again:

[onno@ui ~]# curl --user onno --fail "https://dcachetest.grid.surfsara.nl:2880/users/onno/disk/testfile1"
Enter host password for user 'onno':
Hello world!

onnozweers commented 12 months ago

If the LSC file is not empty but malformed, the effect is the same.

echo "malformed" > escape/voms-escape.cloud.cnaf.infn.it.lsc
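
Because the parser error is only logged, a pre-flight check run before touching gplazma.conf could catch both the empty and the malformed case. The following is a minimal Java sketch; the class LscPreflight is hypothetical (not part of dCache or CaNL). It assumes the usual LSC layout: one slash-separated DN per line, the VOMS server certificate subject first and its issuer(s) after, with alternative chains separated by a dashed line such as `------ NEXT CHAIN ------`. Requiring at least two DNs per chain and treating `#` lines as comments are also assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight check for VOMS .lsc files (not part of dCache or CaNL).
 * Assumed format: one slash-separated DN per line, server subject first and its
 * issuer(s) after; alternative chains separated by a dashed line; '#' = comment.
 */
final class LscPreflight {
    private LscPreflight() {}

    /** Returns the problems found; an empty list means the content looks sane. */
    static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<>();
        List<Integer> chainLengths = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and (assumed) comments carry no DN
            }
            if (line.startsWith("---")) {
                chainLengths.add(current); // separator closes the current chain
                current = 0;
            } else if (line.startsWith("/")) {
                current++;
            } else {
                problems.add("line " + (i + 1) + " is not a slash-separated DN: " + line);
            }
        }
        chainLengths.add(current); // close the last (or only) chain

        if (chainLengths.stream().allMatch(n -> n == 0)) {
            // The empty-file case from this issue ends up here.
            problems.add("no distinguished name entries found");
        } else {
            for (int c = 0; c < chainLengths.size(); c++) {
                if (chainLengths.get(c) < 2) {
                    problems.add("chain " + (c + 1) + " has " + chainLengths.get(c)
                            + " DN(s); expected a subject followed by its issuer");
                }
            }
        }
        return problems;
    }

    /** Reads one .lsc file and checks it. */
    static List<String> check(Path lscFile) throws IOException {
        return check(Files.readAllLines(lscFile, StandardCharsets.UTF_8));
    }

    /** Usage: java LscPreflight.java FILE.lsc...; exits 1 if any file looks wrong. */
    public static void main(String[] args) throws IOException {
        int status = 0;
        for (String arg : args) {
            List<String> problems = check(Path.of(arg));
            for (String problem : problems) {
                System.err.println(arg + ": " + problem);
            }
            if (!problems.isEmpty()) {
                status = 1;
            }
        }
        System.exit(status);
    }
}
```

With Java 11 or later, the file could be run directly from the vomsdir, e.g. `java LscPreflight.java escape/voms-escape.cloud.cnaf.infn.it.lsc`. Both the empty file and the file containing only `malformed` would be reported.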

paulmillar commented 11 months ago

Just to clarify (a little) what is happening here.

The LSC file parsing error comes from the Java CaNL library, not from dCache. The fact that a single bad LSC file seems to prevent any VOMS authentication is a feature/bug of that library. We should open an issue with CaNL to resolve this.
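
The behaviour Paul describes would go away if each LSC file were validated in isolation. Below is a minimal Java sketch of that idea. `IsolatedLscStore`, its records and its parser are hypothetical and are NOT the CaNL API. The parser assumes one slash-separated DN per non-blank line, with dashed lines as chain separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, NOT the CaNL API: each LSC file is parsed on its own,
 * so a broken file disables only its own "vo/host" entry, not the whole store.
 */
final class IsolatedLscStore {
    /** One accepted LSC file: its "vo/host" key and the DNs it lists. */
    record LscEntry(String key, List<String> dns) {}

    /** Accepted entries plus one human-readable reason per rejected file. */
    record LoadResult(Map<String, LscEntry> accepted, List<String> rejected) {}

    private IsolatedLscStore() {}

    /** Maps "vo/host" (from vomsdir/<vo>/<host>.lsc) to that file's lines. */
    static LoadResult load(Map<String, List<String>> contentsByKey) {
        Map<String, LscEntry> accepted = new TreeMap<>();
        List<String> rejected = new ArrayList<>();
        // Sorted copy so that log output is deterministic.
        for (Map.Entry<String, List<String>> e : new TreeMap<>(contentsByKey).entrySet()) {
            try {
                accepted.put(e.getKey(), new LscEntry(e.getKey(), parse(e.getValue())));
            } catch (IllegalArgumentException ex) {
                // Record and skip: the other VOs keep working.
                rejected.add(e.getKey() + ": " + ex.getMessage());
            }
        }
        return new LoadResult(accepted, rejected);
    }

    /** Minimal parser: every non-blank, non-separator line must be a DN. */
    static List<String> parse(List<String> lines) {
        List<String> dns = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("---")) {
                continue; // blank lines and (assumed) chain separators carry no DN
            }
            if (!line.startsWith("/")) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            dns.add(line);
        }
        if (dns.isEmpty()) {
            throw new IllegalArgumentException("No distinguished name entries found");
        }
        return List.copyOf(dns);
    }
}
```

A caller would fill the map from `vomsdir/<vo>/<host>.lsc` (for example with `Files.readAllLines`, catching `IOException` per file), log every entry in `rejected()` and keep serving the accepted VOs. In this sketch, the empty escape file from the report would reject only `escape/voms-escape.cloud.cnaf.infn.it`, and the dteam entry would still load.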

What is also interesting is that the above failure includes a stack trace, which indicates a bug in dCache. This bug is independent of CaNL and (I imagine) was triggered by your specific gPlazma configuration, in particular the AuthN-phase plugins.

onnozweers commented 11 months ago

Hi Paul,

Thanks for your explanation. Here's our gplazma.conf; I hope it helps a bit in understanding the stack trace. Let me know if you need more information.

auth    optional  x509
auth    optional  voms
auth    optional  kpwd
auth    optional  jaas gplazma.jaas.name=LdapGplazma

# Mapping based on VOMS proxies and roles from /etc/grid-security/grid-vorolemap
map     optional vorolemap
# RCauth DN to username mapping, /etc/grid-security/grid-mapfile
map     optional gridmap
#
map     optional mutator gplazma.mutator.accept=com.sun.security.auth.UserPrincipal gplazma.mutator.produce=username
# Read user and group IDs from /etc/grid-security/storage-authzdb
map     sufficient authzdb
# Read from /etc/dcache/dcache.kpwd
map     sufficient kpwd
map     sufficient ldap

# Block users in ban file
account requisite banfile
account sufficient kpwd

session required roles
session sufficient authzdb
session sufficient kpwd
session sufficient ldap
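
The configuration above uses four PAM-style control flags: `optional`, `sufficient`, `required` and `requisite`. The Java sketch below is a simplified, hypothetical model of how a single phase stack is evaluated. It is based on the flags' documented meaning, not on gPlazma's source code. In particular, treating a phase as failed when no plugin succeeded at all is an assumption.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Simplified, hypothetical model of PAM-style control flags for one gPlazma
 * phase stack; written from the flags' documented meaning, not gPlazma's code.
 */
final class PamStyleStack {
    enum Flag { OPTIONAL, SUFFICIENT, REQUIRED, REQUISITE }

    /** One line of a stack, e.g. "auth optional voms"; run() reports success. */
    record Line(String plugin, Flag flag, BooleanSupplier run) {}

    private PamStyleStack() {}

    static boolean evaluate(List<Line> stack) {
        boolean requiredFailed = false;
        boolean anySuccess = false;
        for (Line line : stack) {
            boolean ok = line.run().getAsBoolean();
            if (!ok && line.flag() == Flag.REQUISITE) {
                return false;              // requisite: fail immediately
            }
            if (!ok && line.flag() == Flag.REQUIRED) {
                requiredFailed = true;     // required: fail, but run the rest
            }
            if (ok && line.flag() == Flag.SUFFICIENT && !requiredFailed) {
                return true;               // sufficient: success ends the phase
            }
            anySuccess |= ok;              // optional (and failed sufficient) results only count here
        }
        // Assumption: the phase also fails if no plugin succeeded at all.
        return !requiredFailed && anySuccess;
    }
}
```

For example, if the `auth` stack above is modelled as four `optional` lines where only `jaas` succeeds, the result is `true`. The same stack with every plugin failing gives `false`.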

kofemann commented 11 months ago

1) Yet another reason to use Java 17 as the runtime JVM: its helpful NullPointerException messages name the expression that was null:

Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: java.lang.NullPointerException: Cannot invoke "org.dcache.gplazma.monitor.LoginResult$SetDiff.getBefore()" because the return value of "org.dcache.gplazma.monitor.LoginResult$AuthPhaseResult.getPrincipals()" is null
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.RecordFailedLogins$KnownFailedLogins.storageSubjectFor(RecordFailedLogins.java:88)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.RecordFailedLogins$KnownFailedLogins.has(RecordFailedLogins.java:98)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.RecordFailedLogins.accept(RecordFailedLogins.java:124)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.RecordFailedLogins.accept(RecordFailedLogins.java:36)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.GPlazma.lambda$login$1(GPlazma.java:142)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at java.base/java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:807)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.GPlazma.login(GPlazma.java:142)

2) I can reproduce the error when the roles plugin is enabled.
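
The Java 17 message pinpoints the dereference: `getPrincipals()` on `LoginResult$AuthPhaseResult` returned null, and `getBefore()` was then called on that null value. Below is a minimal sketch of a null-safe lookup. The record types only mimic the names in the message. The signatures of `storageSubjectFor` and `has` are hypothetical, because the thread does not show the real ones. This is not the actual dCache fix.

```java
import java.security.Principal;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical stand-ins mirroring the names in the Java 17 NPE message; the
 * real org.dcache.gplazma.monitor.LoginResult types have more members.
 */
final class NullSafeFailedLogins {
    /** Stand-in for LoginResult$SetDiff: a set before and after a phase. */
    record SetDiff<T>(Set<T> before, Set<T> after) {
        Set<T> getBefore() {
            return before;
        }
    }

    /** Stand-in for LoginResult$AuthPhaseResult; principals may be null. */
    record AuthPhaseResult(SetDiff<Principal> principals) {
        SetDiff<Principal> getPrincipals() {
            return principals;
        }
    }

    private NullSafeFailedLogins() {}

    /**
     * Null-safe version of the chain that failed at RecordFailedLogins.java:88:
     * returns empty instead of calling getBefore() on a missing SetDiff.
     */
    static Optional<Set<Principal>> storageSubjectFor(AuthPhaseResult auth) {
        return Optional.ofNullable(auth)
                .map(AuthPhaseResult::getPrincipals)
                .map(SetDiff::getBefore);
    }

    /** A login without recorded principals is never reported as a known failure. */
    static boolean has(Set<Set<Principal>> knownFailures, AuthPhaseResult auth) {
        return storageSubjectFor(auth).map(knownFailures::contains).orElse(false);
    }
}
```

In this sketch, a login whose auth phase recorded no principals is simply treated as "not seen before". As a result, the failed-login recorder cannot throw from inside the observer loop in `GPlazma.login`.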