onnozweers opened this issue 12 months ago
Additional information: this issue seems to also affect other forms of authentication. Here's an attempt with WebDAV basic authentication:
I can read this test file with basic authentication (LDAP account):
[onno@ui ~]# curl --user onno --fail "https://dcachetest.grid.surfsara.nl:2880/users/onno/disk/testfile1"
Enter host password for user 'onno':
Hello world!
I empty the LSC file:
[root@hedgehog14 /etc/grid-security/vomsdir]# mv escape/voms-escape.cloud.cnaf.infn.it.lsc /tmp/
[root@hedgehog14 /etc/grid-security/vomsdir]# touch escape/voms-escape.cloud.cnaf.infn.it.lsc
[root@hedgehog14 /etc/grid-security/vomsdir]# touch /etc/dcache/gplazma.conf
[root@hedgehog14 /etc/grid-security/vomsdir]# systemctl restart dcache@webdav2880-hedgehog14Domain.service
This gives an error:
[onno@ui ~]# curl --user onno --fail "https://dcachetest.grid.surfsara.nl:2880/users/onno/disk/testfile1"
Enter host password for user 'onno':
curl: (22) NSS: client certificate not found (nickname not specified)
Fixing the LSC file:
[root@hedgehog14 /etc/grid-security/vomsdir]# mv /tmp/voms-escape.cloud.cnaf.infn.it.lsc escape/
mv: overwrite 'escape/voms-escape.cloud.cnaf.infn.it.lsc'? y
[root@hedgehog14 /etc/grid-security/vomsdir]# touch /etc/dcache/gplazma.conf
[root@hedgehog14 /etc/grid-security/vomsdir]# systemctl restart dcache@webdav2880-hedgehog14Domain.service
Now it's OK again:
[onno@ui ~]# curl --user onno --fail "https://dcachetest.grid.surfsara.nl:2880/users/onno/disk/testfile1"
Enter host password for user 'onno':
Hello world!
If the LSC file is not empty but malformed, the effect is the same, e.g. after:
echo "malformed" > escape/voms-escape.cloud.cnaf.infn.it.lsc
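Because the problem only shows up in the logs, a small pre-flight check over vomsdir before restarting could catch an emptied or malformed LSC file early. This is a minimal sketch, not part of dCache or CaNL. It assumes the usual LSC layout: one slash-format DN per line, the VOMS server's DN first and its issuing CA after it (so at least two lines), with multiple chains separated by a `------NEXT CHAIN------` line. The class name `LscCheck` is made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight check for VOMS .lsc files (not part of dCache or CaNL).
 * Assumed layout: one slash-format DN per line, VOMS server DN first, then its
 * issuer(s); several chains may be separated by "------NEXT CHAIN------".
 */
public class LscCheck {
    static final String CHAIN_SEPARATOR = "------NEXT CHAIN------";

    /** Returns the problems found; an empty list means the content looks well-formed. */
    static List<String> validate(List<String> lines) {
        List<String> problems = new ArrayList<>();
        List<String> chain = new ArrayList<>();
        int chainNo = 1;
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            if (line.equals(CHAIN_SEPARATOR)) {
                checkChain(chainNo++, chain, problems);
                chain.clear();
                continue;
            }
            chain.add(line);
        }
        checkChain(chainNo, chain, problems); // last (or only) chain; catches empty files
        return problems;
    }

    private static void checkChain(int chainNo, List<String> chain, List<String> problems) {
        if (chain.size() < 2) {
            problems.add("chain " + chainNo + ": expected at least 2 DNs, found " + chain.size());
        }
        for (String dn : chain) {
            if (!dn.startsWith("/")) {
                problems.add("chain " + chainNo + ": not a slash-format DN: " + dn);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int bad = 0;
        for (String arg : args) {
            List<String> problems = validate(Files.readAllLines(Path.of(arg)));
            for (String p : problems) {
                System.out.println(arg + ": " + p);
            }
            if (!problems.isEmpty()) {
                bad++;
            }
        }
        System.exit(bad == 0 ? 0 : 1); // non-zero exit if any file looks wrong
    }
}
```

It can be run with the JDK's single-file launcher, e.g. `java LscCheck.java /etc/grid-security/vomsdir/*/*.lsc`; it prints one line per problem and exits non-zero if any file looks wrong.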
Just to clarify (a little) what is happening here.
The LSC file parsing error comes from the Java CaNL library, not from dCache. The fact that a single bad LSC file seems to prevent any VOMS authentication is a feature (or bug) of that library. We should open an issue with the CaNL project to resolve this.
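One plausible way a single bad file can disable VOMS authentication for every VO is when all trust-anchor files are loaded as one unit and one parse exception aborts the whole load. The sketch below is a hypothetical illustration of the alternative, not CaNL's API or its actual behaviour: each file is parsed on its own, failures are recorded per file, and the remaining VOs keep working. The toy parser, the class name and the dteam file name are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical illustration (not CaNL's API): parse each trust-anchor file on its own,
 * so that a failure only disables the VO that file belongs to.
 */
public class IsolatedLoader {
    /** Parses every entry independently; failures are recorded and skipped. */
    static <T> Map<String, T> loadEach(Map<String, List<String>> files,
                                       Function<List<String>, T> parser,
                                       Map<String, String> failures) {
        Map<String, T> loaded = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : files.entrySet()) {
            try {
                loaded.put(e.getKey(), parser.apply(e.getValue()));
            } catch (RuntimeException ex) {
                // Record the problem for this file and carry on with the others.
                failures.put(e.getKey(), ex.getMessage());
            }
        }
        return loaded;
    }

    /** Toy parser: requires at least two slash-format DNs (assumed LSC layout). */
    static List<String> parseLsc(List<String> lines) {
        if (lines.size() < 2 || !lines.stream().allMatch(l -> l.startsWith("/"))) {
            throw new IllegalArgumentException("malformed LSC content");
        }
        return List.copyOf(lines);
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        // Hypothetical dteam file with a well-formed two-line chain.
        files.put("dteam/voms.example.org.lsc",
                List.of("/DC=org/DC=example/CN=voms.example.org", "/DC=org/DC=example/CN=Example CA"));
        // The emptied file from this issue.
        files.put("escape/voms-escape.cloud.cnaf.infn.it.lsc", List.of());

        Map<String, String> failures = new LinkedHashMap<>();
        Map<String, List<String>> ok = loadEach(files, IsolatedLoader::parseLsc, failures);
        System.out.println("loaded: " + ok.keySet());
        System.out.println("failed: " + failures);
    }
}
```

With this design the dteam chain stays usable while the empty ESCAPE file is reported, which is the behaviour one would probably want from the library.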
What is also interesting is that the above failure includes a stack trace, which indicates a bug in dCache. This bug is independent of CaNL and (I imagine) was triggered by your specific gPlazma configuration, in particular the AuthN-phase plugins.
Hi Paul,
Thanks for your explanation. Here's our gplazma.conf; I hope it helps a bit in understanding the stack trace. Let me know if you need more information.
auth optional x509
auth optional voms
auth optional kpwd
auth optional jaas gplazma.jaas.name=LdapGplazma
# Mapping based on VOMS proxies and roles from /etc/grid-security/grid-vorolemap
map optional vorolemap
# RCauth DN to username mapping, /etc/grid-security/grid-mapfile
map optional gridmap
#
map optional mutator gplazma.mutator.accept=com.sun.security.auth.UserPrincipal gplazma.mutator.produce=username
# Read user and group IDs from /etc/grid-security/storage-authzdb
map sufficient authzdb
# Read from /etc/dcache/dcache.kpwd
map sufficient kpwd
map sufficient ldap
# Block users in ban file
account requisite banfile
account sufficient kpwd
session required roles
session sufficient authzdb
session sufficient kpwd
session sufficient ldap
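For reading the gplazma.conf above: the dCache documentation describes the modifiers much like PAM. A failing `requisite` plugin fails the phase at once, a failing `required` plugin fails it after the remaining plugins have run, and a succeeding `sufficient` plugin ends the phase unless a `required` plugin already failed. The sketch below is a simplified, hypothetical model of one phase under those rules, not dCache's implementation; as a simplification it also treats a phase in which no plugin succeeded as failed. The plugin outcomes in `main` are assumed for a basic-auth login.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Simplified, hypothetical model of how one gPlazma phase (e.g. "auth") walks its
 * stack of plugins, following the modifier descriptions in the dCache docs.
 * This is not dCache's code.
 */
public class PhaseModel {
    enum Control { OPTIONAL, SUFFICIENT, REQUIRED, REQUISITE }

    record Entry(String plugin, Control control, BooleanSupplier run) {}

    /** Returns true if the phase succeeds under the simplified rules. */
    static boolean runPhase(List<Entry> stack) {
        boolean requiredFailed = false;
        boolean anySuccess = false;
        for (Entry e : stack) {
            boolean ok = e.run().getAsBoolean();
            anySuccess |= ok;
            switch (e.control()) {
                case REQUISITE -> {
                    if (!ok) return false;              // fail the phase immediately
                }
                case REQUIRED -> requiredFailed |= !ok; // fail later, but keep going
                case SUFFICIENT -> {
                    if (ok && !requiredFailed) return true; // end the phase early
                }
                case OPTIONAL -> { }                    // only counts towards anySuccess
            }
        }
        return !requiredFailed && anySuccess;
    }

    public static void main(String[] args) {
        // Auth phase from the gplazma.conf above, for an assumed basic-auth login:
        // no client certificate, so x509 and voms fail; jaas (LDAP) succeeds.
        List<Entry> auth = List.of(
                new Entry("x509", Control.OPTIONAL, () -> false),
                new Entry("voms", Control.OPTIONAL, () -> false),
                new Entry("kpwd", Control.OPTIONAL, () -> false),
                new Entry("jaas", Control.OPTIONAL, () -> true));
        System.out.println("auth phase succeeds: " + runPhase(auth));
    }
}
```

In this simplified model a failing `voms` plugin marked `optional` does not by itself fail the auth phase.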
1) Yet another reason to use Java 17 as the runtime JVM: its helpful NullPointerException messages say exactly which value was null:
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: java.lang.NullPointerException: Cannot invoke "org.dcache.gplazma.monitor.LoginResult$SetDiff.getBefore()" because the return value of "org.dcache.gplazma.monitor.LoginResult$AuthPhaseResult.getPrincipals()" is null
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.RecordFailedLogins$KnownFailedLogins.storageSubjectFor(RecordFailedLogins.java:88)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.RecordFailedLogins$KnownFailedLogins.has(RecordFailedLogins.java:98)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.RecordFailedLogins.accept(RecordFailedLogins.java:124)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.RecordFailedLogins.accept(RecordFailedLogins.java:36)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.GPlazma.lambda$login$1(GPlazma.java:142)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at java.base/java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:807)
Nov 22 16:14:52 dcache-lab007 dcache@core-dcache-lab007[28521]: at org.dcache.gplazma.GPlazma.login(GPlazma.java:142)
2) I can reproduce the error when the role plugin is enabled.
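The NullPointerException in point 1) comes from a call chain of the form `getPrincipals().getBefore()` in which `getPrincipals()` returned null. The sketch below reproduces that shape with hypothetical stand-in classes named after the stack trace (they are not dCache's real definitions) and shows one possible guard that treats a missing result as an empty principal set. Whether "no principals" is the right meaning for `RecordFailedLogins` is an assumption the dCache developers would need to confirm.

```java
import java.security.Principal;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical stand-ins named after the classes in the stack trace; these are NOT
 * dCache's real definitions, just enough structure to show the failure and a guard.
 */
public class NullSafeSubject {
    static final class SetDiff {
        private final Set<Principal> before;
        SetDiff(Set<Principal> before) { this.before = before; }
        Set<Principal> getBefore() { return before; }
    }

    static final class AuthPhaseResult {
        private final SetDiff principals; // may be null if the phase recorded no result
        AuthPhaseResult(SetDiff principals) { this.principals = principals; }
        SetDiff getPrincipals() { return principals; }
    }

    /** Mirrors the failing call chain: throws NullPointerException if getPrincipals() is null. */
    static Set<Principal> unguarded(AuthPhaseResult phase) {
        return phase.getPrincipals().getBefore();
    }

    /** Guarded variant: a missing diff (or missing "before" set) becomes an empty set. */
    static Set<Principal> guarded(AuthPhaseResult phase) {
        return Optional.ofNullable(phase.getPrincipals())
                .map(SetDiff::getBefore)
                .orElse(Set.of());
    }

    public static void main(String[] args) {
        AuthPhaseResult missing = new AuthPhaseResult(null);
        System.out.println("guarded: " + guarded(missing));
        try {
            unguarded(missing);
        } catch (NullPointerException e) {
            // On Java 15+ (so also 17) the message names the call that returned null,
            // much like the log line in point 1).
            System.out.println("unguarded: " + e.getMessage());
        }
    }
}
```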
Dear dCache devs,
An empty LSC file breaks gPlazma VOMS authentication, not only for the VO whose LSC file is empty, but for all VOs. I ran into this situation accidentally while cleaning up LSC files in our production instance running 8.2.32: I emptied a file instead of deleting it. The problem is logged, but since our logs are quite large, it's easy to overlook. It took me a while to find out what the problem was.
I reproduced it on our test server running today's master snapshot.
I created a VOMS proxy:
A dir listing shows that authentication works:
Moving an LSC file and creating an empty one in its place:
Please note: the LSC file is for the Escape VO, but I have a Dteam proxy!
We need to refresh some services for the change to take effect:
Then we try the same directory listing:
Restoring the normal situation:
And now authentication succeeds again:
From the gPlazma log:
Is it intentional that a malformed LSC file breaks all VOMS authentication?
Cheers, Onno