I can revisit the README regarding the secrets directory. The keystore.p12 needs to be generated with bin/keystore-init and loaded in your etc/production.yaml.gotmpl file. There are two examples for that in the file.
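For reference, a minimal sketch of those two steps, using the paths that come up later in this thread (illustrative only, not the exact README wording):

# 1. generate the keystore holding the JWT signing keys
bin/keystore-init

# 2. reference it in etc/production.yaml.gotmpl
management_portal:
  keystore: {{ readFile "../etc/management-portal/keystore.p12" | b64enc | quote }}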
Aaah, got it. Yes, it would be great if that could be clarified in the README. Thanks. I'm now getting:
in helmfile.d/10-managementportal.yaml: error during 10-managementportal.yaml.part.1 parsing: template:
stringTemplate:27:23: executing "stringTemplate" at <.Values.management_portal._chart_version>:
can't evaluate field _chart_version in type interface {}
Any ideas about what this could be caused by?
This can happen if you specify
management_portal:
# nothing or just comments
This makes management_portal a nil value, which will override any values given to it in other values files.
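For illustration (a hypothetical rendering, not output from this setup): once the comments are stripped during templating, what is left is effectively

management_portal:

and a mapping key with no value is parsed as null, so it overrides the management_portal map supplied by the other values files.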
Thanks for the quick replies. Is that in the etc/production.yaml.gotmpl? Because I just followed the comments and have:
management_portal:
{{/* keystore: {{ readFile "../etc/management-portal/keystore.p12" | b64enc | quote }} */}}
I double-checked the path is correct and the key is there and they seem to be ok.
Oh, I thought I'd try a fresh re-install, and interestingly helmfile destroy doesn't work now either. It gives me a similar error but for oauth_clients:
$ helmfile destroy
Adding repo radar https://radar-base.github.io/radar-helm-charts
"radar" has been added to your repositories
Listing releases matching ^velero$
in helmfile.d/30-push-endpoint.yaml: error during 30-push-endpoint.yaml.part.1 parsing: template: stringTemplate:36:26: executing "stringTemplate" at <.Values.management_portal.oauth_clients>: nil pointer evaluating interface {}.oauth_clients
@2bPro The {{/* and */}} delimiters are also considered part of the comment, so you should remove them; your file should look like this:
management_portal:
keystore: {{ readFile "../etc/management-portal/keystore.p12" | b64enc | quote }}
Bah, I had a feeling I was doing something stupid. Many thanks to both of you!
Sorry to open this again, but while the service started, it's not stable. It has restarted close to 200 times since yesterday and is currently stuck in a CrashLoopBackOff state. Here are the pod logs:
INFO 1 --- [main] com.hazelcast.core.LifecycleService: [10.42.0.159]:5701 [dev] [3.12.10] [10.42.0.159]:5701 is STARTED
INFO 1 --- [main] c.h.h.HazelcastCacheRegionFactory: Starting up HazelcastCacheRegionFactory
INFO 1 --- [main] c.h.h.instance.HazelcastInstanceFactory: Using existing HazelcastInstance [ManagementPortal].
INFO 1 --- [main] s.j.ManagementPortalOauthKeyStoreHandler: Using Management Portal base-url http://localhost:8080/managementportal
WARN 1 --- [main] s.j.ManagementPortalOauthKeyStoreHandler: JWT key store class path resource [config/keystore.p12]
does not contain private key pair for alias radarbase-managementportal-ec
WARN 1 --- [main] s.j.ManagementPortalOauthKeyStoreHandler : JWT key store class path resource [config/keystore.p12]
does not contain private key pair for alias radarbase-managementportal-ec
INFO 1 --- [main] o.r.management.config.WebConfigurer: Web application configuration, using profiles: prod
INFO 1 --- [main] o.r.management.config.WebConfigurer: Web application fully configured
WARN 1 --- [main] s.j.ManagementPortalOauthKeyStoreHandler: JWT key store class path resource [config/keystore.p12]
does not contain private key pair for alias radarbase-managementportal-ec
WARN 1 --- [main] ConfigServletWebServerApplicationContext: Exception encountered during context initialization -
cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with
name 'OAuth2ServerConfiguration.ResourceServerConfiguration': Unsatisfied dependency expressed through field
'tokenStore'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name
'tokenStore' defined in class path resource
[org/radarbase/management/config/OAuth2ServerConfiguration$AuthorizationServerConfiguration.class]: Bean instantiation
via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate
[org.springframework.security.oauth2.provider.token.TokenStore]: Factory method 'tokenStore' threw exception; nested
exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name
'accessTokenConverter' defined in class path resource
[org/radarbase/management/config/OAuth2ServerConfiguration$AuthorizationServerConfiguration.class]: Bean instantiation
via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate
[org.radarbase.management.security.jwt.ManagementPortalJwtAccessTokenConverter]: Factory method
'accessTokenConverter' threw exception; nested exception is java.lang.IllegalArgumentException: Cannot load JWT signing
key radarbase-managementportal-ec from JWT key store.
INFO 1 --- [main] c.h.h.HazelcastCacheRegionFactory: Shutting down HazelcastCacheRegionFactory
WARN 1 --- [main] c.h.h.instance.HazelcastInstanceFactory: hibernate.cache.hazelcast.shutdown_on_session_factory_close
property is set to 'false'. Leaving current HazelcastInstance active! (Warning: Do not disable Hazelcast hazelcast.shutdownhook.enabled property!)
INFO 1 --- [main] com.hazelcast.core.LifecycleService: [10.42.0.159]:5701 [dev] [3.12.10] [10.42.0.159]:5701 is SHUTTING_DOWN
INFO 1 --- [main] com.hazelcast.instance.Node: [10.42.0.159]:5701 [dev] [3.12.10] Shutting down multicast service...
INFO 1 --- [main] com.hazelcast.instance.Node: [10.42.0.159]:5701 [dev] [3.12.10] Shutting down connection manager...
INFO 1 --- [main] com.hazelcast.instance.Node: [10.42.0.159]:5701 [dev] [3.12.10] Shutting down node engine...
INFO 1 --- [main] com.hazelcast.instance.NodeExtension: [10.42.0.159]:5701 [dev] [3.12.10] Destroying node NodeExtension.
INFO 1 --- [main] com.hazelcast.instance.Node: [10.42.0.159]:5701 [dev] [3.12.10] Hazelcast Shutdown is completed in 17 ms.
INFO 1 --- [main] com.hazelcast.core.LifecycleService: [10.42.0.159]:5701 [dev] [3.12.10] [10.42.0.159]:5701 is SHUTDOWN
INFO 1 --- [main] o.r.m.config.CacheConfiguration: Closing Cache Manager
ERROR 1 --- [main] o.s.boot.SpringApplication: Application run failed
I restarted the pod, but with no success.
[org/radarbase/management/config/OAuth2ServerConfiguration$AuthorizationServerConfiguration.class]: Bean instantiation
via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate
[org.radarbase.management.security.jwt.ManagementPortalJwtAccessTokenConverter]: Factory method
'accessTokenConverter' threw exception; nested exception is java.lang.IllegalArgumentException: **Cannot load JWT signing
key radarbase-managementportal-ec from JWT key store.**
Looks like the keystore.p12 file isn't correctly loaded.
Thanks for your reply. I tried lots of things: making sure the reference to the key in etc/production.yaml.gotmpl is as you suggested, restarting the pod, completely destroying and cleaning everything and re-installing, re-generating the key... no luck. I don't know what's going on, but it looks like the above error alternates with the following on each pod crash and restart:
INFO 1 --- [main] o.r.management.ManagementPortalApp: The following profiles are active: prod,swagger
WARN 1 --- [main] o.s.boot.actuate.endpoint.EndpointId: Endpoint ID 'hystrix.stream' contains invalid characters, please migrate to a valid format.
WARN 1 --- [main] c.n.c.sources.URLConfigurationSource: No URLs will be polled as dynamic configuration sources.
INFO 1 --- [main] c.n.c.sources.URLConfigurationSource: To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
INFO 1 --- [main] c.netflix.config.DynamicPropertyFactory: DynamicPropertyFactory is initialized with configuration sources: com.netflix.config.ConcurrentCompositeConfiguration@33ecbd6c
DEBUG 1 --- [main] i.g.j.c.liquibase.AsyncSpringLiquibase: Starting Liquibase synchronously
WARN 1 --- [l-1 housekeeper] com.zaxxer.hikari.pool.ProxyLeakTask: Connection leak detection triggered for org.postgresql.jdbc.PgConnection@288f173f on thread main, stack trace follows
java.lang.Exception: Apparent connection leak detected
...
Any ideas?
Got the same issue. Something wrong with the keystore.p12 file.
@ThomasKassiotis I don't think what you have is the same as what @2bPro is facing, although both may be on ManagementPortal or related to it. Your logs from ManagementPortal say that it can't find the private key pair for the alias radarbase-managementportal-ec2022-07-27, so there is something wrong with the keystore that was generated.
The keystore should contain keys with the aliases radarbase-managementportal-ec and selfsigned. See keystore-init. Not sure why the date is appended to the alias; this could be the bug. @ThomasKassiotis can you try removing the keystore, creating the keystore file again, and then restarting ManagementPortal?
@2bPro I can't really locate the issue with the last logs you have shared. Can you share a longer stack trace?
Can you both share the Java version and version of keytool if you can find it?
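If it helps, one generic way to find out which JDK a keytool binary belongs to is to resolve its path (standard shell commands, not specific to this setup):

which keytool
readlink -f "$(which keytool)"
java -version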
@nivemaham, I did a fresh install from the internal-chart-version branch but I'm seeing the same issue. When I say "fresh install" I mean I destroyed and cleaned everything on k8s, re-cloned the repo, re-set up the configuration, and re-generated the keystore file. You can find the longer stack trace attached, including what pods are currently up and running and a look at the management portal pod logs:
The Java version:
$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
I can't seem to find the version of keytool itself, though.
I have fixed the issue on the app-config-frontend shown in your stack trace in commit 3ae49cba4dd6af9b92cce66178b53c1c2c4f1559.
You are still somehow missing the keystore. When you run
helmfile -f helmfile.d/10-managementportal.yaml --selector name=management-portal template
you should have an entry
# Source: management-portal/templates/secrets-keystore.yaml
apiVersion: v1
kind: Secret
metadata:
name: management-portal-keystore
labels:
app: management-portal
chart: management-portal-0.2.5
release: "management-portal"
heritage: "Helm"
type: Opaque
data:
keystore.p12: <approximately 7000 characters>
If you take the value of keystore.p12 from the above YAML and substitute it into the command below where it says CHARACTERS:
base64 -d <<< CHARACTERS > keystore.p12
diff keystore.p12 etc/management-portal/keystore.p12
then you should get no output. If it says the binary files differ, then the keystore file is not properly loaded.
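If the release is already installed, an equivalent check against the live secret could look like this (assuming the secret name management-portal-keystore from the template output above and the default namespace):

kubectl get secret management-portal-keystore -o jsonpath='{.data.keystore\.p12}' | base64 -d > /tmp/keystore-from-cluster.p12
diff /tmp/keystore-from-cluster.p12 etc/management-portal/keystore.p12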
When running
keytool -list -keystore etc/management-portal/keystore.p12 -storepass radarbase
you should get the following output:
Keystore type: PKCS12
Keystore provider: SUN
Your keystore contains 2 entries
radarbase-managementportal-ec, 13 May 2019, PrivateKeyEntry,
Certificate fingerprint (SHA-256):
<fingerprint>
selfsigned, 13 May 2019, PrivateKeyEntry,
Certificate fingerprint (SHA-256):
<fingerprint>
Thanks for the quick reply. The diff comes back empty but the output of the last command is different:
Keystore type: PKCS12
Keystore provider: SUN
Your keystore contains 1 entry
selfsigned, Aug 16, 2022, PrivateKeyEntry,
Certificate fingerprint (SHA-256):
<fingerprint>
I'm wondering whether it could be because of some whitespace or overriding issue. I've updated the keystore-init script in ef8c7e95a93e41c695d489fac35be3b62d82c97a and 76eccfc. Could you please remove etc/management-portal/keystore.p12 and try bin/keystore-init again with these updates? If you provide a DNAME as follows
DNAME="CN=<your name>,O=<your organization>,L=<your city>,C=<2 letter country code>" bin/keystore-init
you don't have to go through keytool's interactive prompts for these values. For the full DNAME syntax, see https://docs.oracle.com/javase/8/docs/technotes/tools/windows/keytool.html#CHDHBFGJ.
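For context, the EC key generation in keystore-init boils down to a keytool -genkeypair call roughly like the one below (a sketch only; the actual script's options and validity may differ, and the -groupname option needs a reasonably recent keytool, which turns out to matter below):

keytool -genkeypair \
  -alias radarbase-managementportal-ec \
  -keyalg EC -groupname secp256r1 \
  -dname "$DNAME" \
  -keystore etc/management-portal/keystore.p12 \
  -storetype PKCS12 \
  -storepass radarbase
# note: without an explicit -validity, keytool defaults to 90 days, which also comes up later in this thread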
From 73e7f91 I get this:
$ DNAME="CN=Test,O=Test,L=Test,C=TS" bin/keystore-init
--> Generating keystore to hold EC keypair for JWT signing
Illegal option: -groupname
keytool -genkeypair [OPTION]...
Generates a key pair
Options:
-alias <alias> alias name of the entry to process
-keyalg <keyalg> key algorithm name
-keysize <keysize> key bit size
-sigalg <sigalg> signature algorithm name
-destalias <destalias> destination alias
-dname <dname> distinguished name
-startdate <startdate> certificate validity start date/time
-ext <value> X.509 extension
-validity <valDays> validity number of days
-keypass <arg> key password
-keystore <keystore> keystore name
-storepass <arg> keystore password
-storetype <storetype> keystore type
-providername <providername> provider name
-providerclass <providerclass> provider class name
-providerarg <arg> provider argument
-providerpath <pathlist> provider classpath
-v verbose output
-protected password through protected mechanism
Use "keytool -help" for all available commands
--> Generating keystore to hold RSA keypair for JWT signing
FAILED TO CREATE ECDSA KEY radarbase-managementportal-ec in etc/management-portal/keystore.p12. Please try again.
It looks like although your Java version is 8, your keytool is still from Java 7. I've changed the script in d2d7ac8 to allow for that as well. Could you please try again?
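Purely as an illustration of that kind of compatibility fallback (not the actual change in d2d7ac8): a script can try the curve-specific option first and fall back to a plain key size when -groupname is not supported, for example:

keytool -genkeypair -alias radarbase-managementportal-ec -keyalg EC -groupname secp256r1 \
  -dname "$DNAME" -keystore etc/management-portal/keystore.p12 -storetype PKCS12 -storepass radarbase \
|| keytool -genkeypair -alias radarbase-managementportal-ec -keyalg EC -keysize 256 \
  -dname "$DNAME" -keystore etc/management-portal/keystore.p12 -storetype PKCS12 -storepass radarbase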
I tried this, but now I can't even get as far as the management portal pod being installed. I now get this:
Error: Ingress.extensions "kube-prometheus-stack-grafana" is invalid: annotations.kubernetes.io/ingress.class: Invalid value: "nginx": can not be set when the class field is also set
I updated the Helm charts but still got the error, so I thought maybe the latest commits caused it. I made a couple of reverts to older commits, keeping the same key file generated with the updated script, but that doesn't work either. Should I open a new issue for this?
I can fix that error. For now, can you install management-portal with
helmfile -f helmfile.d/10-managementportal.yaml apply
I've updated keystore-init again in c5bdba5, since the current version seems to have decreased the validity to only 90 days.
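To check the validity window of the keys in the generated file (store password as used earlier in this thread):

keytool -list -v -keystore etc/management-portal/keystore.p12 -storepass radarbase | grep "Valid from"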
The management portal is now running. Do you know by any chance how I can test it? I tried curling its external address, did pod port-forwarding, entered the container and did a curl on localhost, but all I get is 404.
You should be able to access it via https://myhost/managementportal/
Hmm, getting 502 now. I don't have a certificate, so I just did http://myhost/managementportal/. I thought it might be that the port is inaccessible from the outside, but I also tried http://localhost/managementportal from inside the EC2 instance and got the same 502 response. I added 8080 to the EC2 inbound rules but, as suspected, it didn't make a difference. Any idea why this might happen?
cert-manager should be creating an HTTPS certificate for you if the host is reachable from the internet. I'm not sure how to proceed either; apparently nginx cannot successfully connect to managementportal. kubectl get pods should show ManagementPortal as 1/1 ready; is that the case?
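A few generic checks that might help narrow down where the 502 comes from (the service name and port below are assumptions based on the chart name and the base-url in the logs, not verified against this deployment):

kubectl get pods
kubectl get ingress
kubectl port-forward svc/management-portal 8080:8080
curl -I http://localhost:8080/managementportal/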
That's correct: the pod is up and running (1/1 ready), hasn't restarted or got stuck in a crash loop, and the logs say it's up.
WARN 1 --- [l-1 housekeeper] com.zaxxer.hikari.pool.ProxyLeakTask: Connection leak detection triggered for org.postgresql.jdbc.PgConnection@288f173f on thread main, stack trace follows java.lang.Exception: Apparent connection leak detected
The management portal ran stably for quite a long time (more than 210 days) and now suddenly gets this connection leak detected error, exactly as shown here. I tried to reinstall with
helmfile sync --concurrency 1
and it went well, but the error remains the same in the new management_portal pod. I also see that the discussion following this error report (for example, the error trace posted later) is no longer about this error. Hope to get some help.
My Java version, if that helps:
openjdk 11.0.16 2022-07-19
OpenJDK Runtime Environment (build 11.0.16+8-post-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 11.0.16+8-post-Debian-1deb11u1, mixed mode, sharing)
I also posted for help in Slack: https://radardevelopment.slack.com/archives/C021AGGESC9/p1685913110854589
Thank you in advance.
Describe the bug
Management portal service crashing because of a missing secret.
The installation documentation gives the impression that secrets and passwords are configured inside the etc/production.yaml file, but there is a mention of a missing /secrets directory and of "To create an encrypted password string and put it inside kube_prometheus_stack.nginx_auth variable.", which is confusing.

To Reproduce
Steps to reproduce the behavior:
- environments.yaml: remove the reference to etc/base.yaml
- etc/production.yaml: change the server name and example addresses to the EC2 public DNS
- internal-chart-version branch (see PR #199)
- 0.3.1
- helmfile sync --concurrency 1

Expected behavior
The installation documentation specifies that secrets other than those in the etc/production.yaml file are required for the management portal and gives an example of how to set one up. The management portal service starts.

Version of Helm and Kubernetes:
- helm version:
- kubectl version:

Additional context
One-node dev install.