a-langer / nexus-sso

Single Sign-On patch for Nexus OSS
Eclipse Public License 1.0

SSO patch without Docker - 500 Internal Server Error #8

Closed: bogdankatishev closed this issue 1 year ago

bogdankatishev commented 1 year ago

Hello,

We have a nexus on an EC2 instance on AWS where we applied the SSO patch by:

We see the "Sign in with SSO" button, but when we click on it we end up on the /index.html endpoint with an Error 500 Internal Server Error page from the Nexus Jetty server.


We use the default files (metadata-keycloak.xml, metadata.xml, samlKeystore.jks, shiro.ini, sp-metadata.xml, urlrewrite.xml) that are already provided in this repo, so we should get a redirect to Okta, as with the demo Docker image.

We also already added these lines to our logback.xml: https://github.com/a-langer/nexus-sso/blob/main/docs/SAML.md#debug but with no luck. We do not see more related logging about the error 500.

We also do not use an .env file, but I could not see what env vars are critical to make this work.

At this moment we are looking for a needle in the haystack. Without better logging, we can not find the problem.

a-langer commented 1 year ago

I have no experience applying the SSO patch to a non-Docker Nexus instance, so given how hard this is to diagnose, I can't make specific recommendations for setting up this configuration.

To apply the SSO patch to your Nexus instance, you need to follow all the instructions from https://github.com/a-langer/nexus-sso/blob/main/Dockerfile, including copying the scripts and configs to ${NEXUS_HOME}/etc/sso/script/com/github/alanger/nexus/bootstrap/.

My guess is that Nexus outside the container determines the current directory differently and therefore cannot read some configuration files (all configs use a path relative to the ETC directory). The error returned will always be a 500, because that status covers any internal server error. To see more information, you can enable the TRACE level for the root logger in logback.xml:

<root level="${root.level:-TRACE}">
...

Please give feedback if you manage to find the right settings to apply the SSO-patch outside of Docker.

bogdankatishev commented 1 year ago

@a-langer Hello,

I enabled full TRACE logging in nexus logback.xml and the 2 most interesting log traces that I could find were these:

org.apache.shiro.authc.IncorrectCredentialsException: Submitted credentials for token [org.apache.shiro.authc.UsernamePasswordToken - admin, rememberMe=false] did not match the expected credentials.
    at org.apache.shiro.realm.AuthenticatingRealm.assertCredentialsMatch(AuthenticatingRealm.java:603)
    at org.apache.shiro.realm.AuthenticatingRealm.getAuthenticationInfo(AuthenticatingRealm.java:581)
    at org.sonatype.nexus.repository.security.internal.DefaultUserHealthCheck.lambda$1(DefaultUserHealthCheck.java:67)
    at java.util.Optional.map(Optional.java:215)
    at org.sonatype.nexus.repository.security.internal.DefaultUserHealthCheck.check(DefaultUserHealthCheck.java:67)
    at com.codahale.metrics.health.HealthCheck.execute(HealthCheck.java:374)
    at com.codahale.metrics.health.HealthCheckRegistry.runHealthCheck(HealthCheckRegistry.java:160)
    at org.sonatype.nexus.rapture.internal.HealthCheckCacheManager$1.load(HealthCheckCacheManager.java:98)
    at org.sonatype.nexus.rapture.internal.HealthCheckCacheManager$1.load(HealthCheckCacheManager.java:1)
    at com.google.common.cache.CacheLoader.reload(CacheLoader.java:101)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3532)
    at com.google.common.cache.LocalCache$Segment.loadAsync(LocalCache.java:2287)
    at com.google.common.cache.LocalCache$Segment.refresh(LocalCache.java:2360)
    at com.google.common.cache.LocalCache.refresh(LocalCache.java:4134)
    at com.google.common.cache.LocalCache$LocalLoadingCache.refresh(LocalCache.java:4965)
    at org.sonatype.nexus.rapture.internal.HealthCheckCacheManager.lambda$1(HealthCheckCacheManager.java:75)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1082)
    at org.sonatype.nexus.rapture.internal.HealthCheckCacheManager.lambda$0(HealthCheckCacheManager.java:73)
    at org.sonatype.nexus.scheduling.internal.PeriodicJobServiceImpl.lambda$2(PeriodicJobServiceImpl.java:109)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

and

2023-07-31 09:17:26,817+0000 DEBUG [Thread-173]  *SYSTEM org.opensaml.core.xml.config.XMLObjectProviderRegistry - Registering new builder, marshaller, and unmarshaller for {http://www.w3.org/2001/04/xmlenc#}EncryptionMethod
2023-07-31 09:17:26,817+0000 DEBUG [Thread-173]  *SYSTEM org.opensaml.core.xml.XMLObjectBuilderFactory - Registering builder org.opensaml.xmlsec.encryption.impl.EncryptionMethodBuilder under key {http://www.w3.org/2001/04/xmlenc#}EncryptionMethod
2023-07-31 09:17:26,817+0000 DEBUG [Thread-173]  *SYSTEM org.opensaml.core.xml.io.MarshallerFactory - Registering marshaller, org.opensaml.xmlsec.encryption.impl.EncryptionMethodMarshaller, for object type {http://www.w3.org/2001/04/xmlenc#}EncryptionMethod
2023-07-31 09:17:26,818+0000 DEBUG [Thread-173]  *SYSTEM org.opensaml.core.xml.io.UnmarshallerFactory - Registering unmarshaller, org.opensaml.xmlsec.encryption.impl.EncryptionMethodUnmarshaller, for object type, {http://www.w3.org/2001/04/xmlenc#}EncryptionMethod
2023-07-31 09:17:26,818+0000 DEBUG [Thread-173]  *SYSTEM org.opensaml.core.xml.config.XMLConfigurator - {http://www.w3.org/2001/04/xmlenc#}EncryptionMethod initialized and configuration cached
2023-07-31 09:17:26,818+0000 DEBUG [Thread-173]  *SYSTEM org.opensaml.core.xml.config.XMLConfigurator - Initializing object provider {http://www.w3.org/2001/04/xmlenc#}EncryptionProperties
2023-07-31 09:17:26,818+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.xml.config.XMLConfigurator - Creating instance of org.opensaml.xmlsec.encryption.impl.EncryptionPropertiesBuilder
2023-07-31 09:17:26,819+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.xml.config.XMLConfigurator - Creating instance of org.opensaml.xmlsec.encryption.impl.EncryptionPropertiesMarshaller
2023-07-31 09:17:26,820+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Resolving configuration propreties source
2023-07-31 09:17:26,820+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Unable to resolve non-null configuration properties from any ConfigurationPropertiesSource
2023-07-31 09:17:26,821+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Resolved effective configuration partition name 'default'
2023-07-31 09:17:26,821+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.xml.config.XMLConfigurator - Creating instance of org.opensaml.xmlsec.encryption.impl.EncryptionPropertiesUnmarshaller
2023-07-31 09:17:26,822+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Resolving configuration propreties source
2023-07-31 09:17:26,822+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Unable to resolve non-null configuration properties from any ConfigurationPropertiesSource
2023-07-31 09:17:26,822+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Resolved effective configuration partition name 'default'
2023-07-31 09:17:26,823+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Resolving configuration propreties source
2023-07-31 09:17:26,823+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Unable to resolve non-null configuration properties from any ConfigurationPropertiesSource
2023-07-31 09:17:26,823+0000 TRACE [Thread-173]  *SYSTEM org.opensaml.core.config.ConfigurationService - Resolved effective configuration partition name 'default'

I only found a reference here: https://github.com/a-langer/nexus-sso/blob/main/etc/sso/script/com/github/alanger/nexus/bootstrap/EchoRealm.java#L8C31-L8C52 but I can't seem to understand where it is being set.

a-langer commented 1 year ago

IncorrectCredentialsException means an incorrect login or password. The stack trace shows that this error has nothing to do with the SSO patch. Try giving absolute paths to all files, or try specifying the working directory via Java system properties, for example: -Dkaraf.data=/app/nexus3/data or -Duser.dir=/app/nexus3/data (not tested).
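Before switching every path to an absolute one, it can help to confirm which of the expected files are actually reachable from a given base directory. The following is a minimal sketch, not part of the patch: the file names come from the first comment in this thread, while the etc/sso/config subdirectory and the helper name missing_config_files are assumptions for illustration.

```python
from pathlib import Path

# File names as listed earlier in this thread. That they all live in one
# directory is an assumption; adjust relative_dir to match your layout.
SSO_CONFIG_FILES = [
    "metadata-keycloak.xml",
    "metadata.xml",
    "samlKeystore.jks",
    "shiro.ini",
    "sp-metadata.xml",
    "urlrewrite.xml",
]


def missing_config_files(base_dir, relative_dir="etc/sso/config", names=SSO_CONFIG_FILES):
    """Return absolute paths of expected files that are not regular files under base_dir."""
    config_dir = (Path(base_dir) / relative_dir).resolve()
    return [str(config_dir / name) for name in names if not (config_dir / name).is_file()]
```

Calling missing_config_files("/opt/sonatype/nexus") and then again with the directory Nexus is started from would show whether the two locations see the same set of files. Note that this only checks that each file exists, not that its content is valid.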

As I wrote earlier, I have no experience with the SSO patch on Nexus without Docker, so I can't give any advice on setting up this configuration.

bogdankatishev commented 1 year ago

Okay @a-langer, I think I found the issue. I enabled full tracing on our Nexus by adding/editing this in our logback.xml:

<root level="${root.level:-TRACE}">
...

Because Nexus produces a lot of debug output, it was hard to spot the error at first. But after carefully analyzing the logs I noticed this error:

Caused by: org.pac4j.saml.exceptions.SAMLException: Error loading keystore
    at org.pac4j.saml.crypto.KeyStoreCredentialProvider.loadKeyStore(KeyStoreCredentialProvider.java:132)
    at org.pac4j.saml.crypto.KeyStoreCredentialProvider.<init>(KeyStoreCredentialProvider.java:57)
    at org.pac4j.saml.crypto.KeyStoreCredentialProvider.<init>(KeyStoreCredentialProvider.java:76)
    at org.pac4j.saml.client.SAML2Client.initCredentialProvider(SAML2Client.java:220)
    at org.pac4j.saml.client.SAML2Client.clientInit(SAML2Client.java:122)
    at org.pac4j.core.client.IndirectClient.internalInit(IndirectClient.java:58)
    at org.pac4j.core.util.InitializableObject.init(InitializableObject.java:20)
    at org.pac4j.core.client.IndirectClient.getRedirectAction(IndirectClient.java:93)
    at org.pac4j.core.client.IndirectClient.redirect(IndirectClient.java:79)
    at org.pac4j.core.engine.DefaultSecurityLogic.redirectToIdentityProvider(DefaultSecurityLogic.java:217)
    at org.pac4j.core.engine.DefaultSecurityLogic.perform(DefaultSecurityLogic.java:149)
    at io.buji.pac4j.filter.SecurityFilter.doFilter(SecurityFilter.java:84)
    at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
    at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:458)
    at org.sonatype.nexus.security.SecurityFilter.executeChain(SecurityFilter.java:96)
    at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:373)
    at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
    at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
    at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
    at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:370)
    ... 56 common frames omitted
Caused by: java.io.IOException: Invalid keystore format
    at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:666)
    at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:57)
    at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224)
    at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:71)
    at java.security.KeyStore.load(KeyStore.java:1445)
    at org.pac4j.saml.crypto.KeyStoreCredentialProvider.loadKeyStore(KeyStoreCredentialProvider.java:129)
    ... 75 common frames omitted
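With the root logger at TRACE, the relevant lines are easy to lose in the noise. One way to narrow the search is to pull out only the "Caused by:" lines of the stack traces. This is a minimal sketch under that assumption; the function name and the log file name in the usage comment are hypothetical.

```python
def caused_by_lines(log_lines):
    """Yield (line_number, text) for each line that starts with 'Caused by:'."""
    for number, line in enumerate(log_lines, start=1):
        text = line.strip()
        if text.startswith("Caused by:"):
            yield number, text


# Usage sketch (the log path is hypothetical):
#   with open("nexus.log", errors="replace") as fh:
#       for number, text in caused_by_lines(fh):
#           print(number, text)
```

The line numbers make it straightforward to jump back into the full log and read the surrounding frames.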

After reading this, I checked the samlKeystore.jks file and it was indeed in the wrong format:

~/ keytool -list -keystore /opt/sonatype/nexus/etc/sso/config/samlKeystore.jks
keytool error: java.io.IOException: Invalid keystore format
~/ cat /opt/sonatype/nexus/etc/sso/config/samlKeystore.jks
version https://git-lfs.github.com/spec/v1
oid sha256:89fb975aaded48d385427674ce5e9db9f6b42323a7618b21064800d4500be23b
size 2219

After seeing this, I immediately saw what the problem was. We build the JAR files ourselves on our CI/CD server by cloning this nexus-sso repo and then executing the following Maven commands:

mvn versions:set -DnewVersion=...
mvn clean package

After that we follow the override/copy process as described in the Dockerfile: https://github.com/a-langer/nexus-sso/blob/main/Dockerfile#L16-L30

BUT the issue lies here: https://github.com/a-langer/nexus-sso/blob/main/Dockerfile#L29C16-L29C16

If you check out this repo (nexus-sso) and you DON'T have the Git LFS extension installed and set up with git lfs install on your CI/CD server (which was our case), you will get a samlKeystore.jks file that contains LFS metadata like this:

version https://git-lfs.github.com/spec/v1
oid sha256:89fb975aaded48d385427674ce5e9db9f6b42323a7618b21064800d4500be23b
size 2219

This is of course not a valid keystore, and it breaks the SSO patch.

So maybe we need to update the README to also mention that git lfs needs to be enabled to clone this file correctly :)
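A CI/CD job could catch this before the file is copied into Nexus by checking whether it is still an LFS pointer. This is a minimal sketch, assuming that a pointer file begins with the "version https://git-lfs.github.com/spec/v1" line shown above; the function names are hypothetical.

```python
# First line of a Git LFS pointer file, as seen in the cat output above.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(data: bytes) -> bool:
    """Return True if the given bytes look like the start of a Git LFS pointer file."""
    return data.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer_file(path) -> bool:
    """Read only the first bytes of the file and test them against the pointer prefix."""
    with open(path, "rb") as fh:
        return is_lfs_pointer(fh.read(len(LFS_POINTER_PREFIX)))
```

If is_lfs_pointer_file returns True for samlKeystore.jks, the checkout did not fetch the real binary and the build should stop there.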

a-langer commented 1 year ago

Great job! Thank you for your hard work. I removed the JKS keystore from LFS; it is now stored as a regular binary file: https://github.com/a-langer/nexus-sso/commit/c71047ab4a1a04c3cc20b2cbccde26a2f7113d27

bogdankatishev commented 1 year ago

@a-langer Hmm, after pulling your updated changes, samlKeystore.jks is not valid anymore.

Before:

~/ file etc/sso/config/samlKeystore.jks 
etc/sso/config/samlKeystore.jks: Java KeyStore

After:

~/ file etc/sso/config/samlKeystore.jks 
etc/sso/config/samlKeystore.jks: data

a-langer commented 1 year ago

I changed the keystore format to a newer one; try checking it with keytool:

keytool -list -keystore etc/sso/config/samlKeystore.jks -storepass pac4j-demo-passwd

Output in my case:

Keystore type: PKCS12
Keystore provider: SUN

Your keystore contains 1 entry

pac4j-demo, Aug 2, 2023, PrivateKeyEntry, 
Certificate fingerprint (SHA-256): 8B:AB:01:36:EF:B8:FE:0C:4F:F9:24:74:C4:49:3E:62:DD:B8:BF:BB:15:11:6D:85:63:01:F0:D8:43:00:DE:0B

Warning:
<pac4j-demo> uses the SHA1withRSA signature algorithm which is considered a security risk. This algorithm will be disabled in a future update.

bogdankatishev commented 1 year ago

@a-langer I checked it with the keytool. Try removing your entire nexus-sso repo and clone it back in:

rm -rf nexus-sso
git clone https://github.com/a-langer/nexus-sso.git
cd nexus-sso
keytool -list -keystore etc/sso/config/samlKeystore.jks -storepass pac4j-demo-passwd

You will see an error message saying that the keystore samlKeystore.jks is invalid.

Or just remove samlKeystore.jks and do a git pull.

a-langer commented 1 year ago

It turns out that keytool from JDK 8 does not support the new JKS format (I had previously tested on JDK 11):

# openjdk version "1.8.0_332"
keytool -list -keystore etc/sso/config/samlKeystore.jks -storepass pac4j-demo-passwd
keytool error: java.io.IOException: Invalid keystore format

However, in the application itself, JKS storage works.
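Independently of which keytool version is at hand, the first bytes of the file give a rough hint about the container format. This is a minimal sketch, not a full parser: it relies on the JKS magic number 0xFEEDFEED and on PKCS12 files being ASN.1 structures that begin with the SEQUENCE tag 0x30. The function name is hypothetical, and a 0x30 first byte only suggests PKCS12; it does not prove it.

```python
def sniff_keystore_format(header: bytes) -> str:
    """Guess a keystore container format from the first bytes of the file."""
    if header.startswith(b"\xfe\xed\xfe\xed"):
        # Magic number written at the start of classic JKS files.
        return "JKS"
    if header.startswith(b"\x30"):
        # ASN.1 SEQUENCE tag; PKCS12 files start this way, but so do other DER files.
        return "possibly PKCS12 (ASN.1 sequence)"
    return "unknown"


# Usage sketch (path as used in this thread):
#   with open("etc/sso/config/samlKeystore.jks", "rb") as fh:
#       print(sniff_keystore_format(fh.read(4)))
```

This only identifies the outer format. Whether a particular JDK can load the keystore also depends on the algorithms used inside it, which this check does not inspect.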

bogdankatishev commented 1 year ago

My Nexus application is still not working: I get a 500 Internal Server Error on the /index.html page, with the same error message as above:

Caused by: org.pac4j.saml.exceptions.SAMLException: Error loading keystore
Caused by: java.io.IOException: Invalid keystore format

The Nexus server is running on openjdk 1.8.xxx, just like the container image. On the EC2 instance itself I also get the error:

keytool -list -keystore etc/sso/config/samlKeystore.jks -storepass pac4j-demo-passwd
keytool error: java.io.IOException: Invalid keystore format

bogdankatishev commented 1 year ago

> It turns out that keytool from JDK 8 does not support the new JKS format (previously tested on JDK 11):
>
> # openjdk version "1.8.0_332"
> keytool -list -keystore etc/sso/config/samlKeystore.jks -storepass pac4j-demo-passwd
> keytool error: java.io.IOException: Invalid keystore format
>
> However, in the application itself, JKS storage works.

I also do not understand this logic. According to the pom.xml file https://github.com/a-langer/nexus-sso/blob/main/pom.xml, the patch is meant to be built with JDK 8 only, yet your samlKeystore.jks was built using JDK 11?

a-langer commented 1 year ago

Uploaded a JDK 8-compatible keystore in PKCS12 format: https://github.com/a-langer/nexus-sso/commit/75d4033c420c0ea8f7451e9dfd40957883ead33e

bogdankatishev commented 1 year ago

Yes, now it works perfectly on my EC2 instance! Closing the issue.