michaelsteigman closed this issue 2 years ago.
Thank you for the comprehensive write-up, @michaelsteigman! @virtualroot, it seems like some of these pain points could be alleviated with adjustments to the charts and/or better documentation.
Thanks a lot for all these suggestions, this is incredibly helpful!
@virtualroot how much effort is it to make the necessary documentation and chart changes?
Just circling back to check on this. Thanks for the responses, @sara-tagger, @melindaloubser1 and @tmbo.
I also wanted to link to #154, which I stumbled on shortly after creating this issue. I probably didn't find it in my initial searches because OpenShift is written with a hyphen there. The install instructions tell OpenShift users to set fsGroup to null, but this setting, whether passed on the command line or set at the top level of the values file, has no effect on RabbitMQ or Redis, both of which still fail to start. I also tried what the OP in that issue tried, and ended up throwing up my hands and using the newer Bitnami charts directly.
I haven't taken down my running stack to test this yet, but I assume it still works.
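For reference, the top-level override discussed above looks like this in a values file. The command-line equivalent would be Helm's --set securityContext.fsGroup=null. This is a minimal sketch of the documented workaround only. In Helm, a subchart reads its values from its own key (for example redis: or rabbitmq:), not from the parent's top level, which is consistent with the report above that the setting never reaches those subcharts.

```yaml
# Top-level override in the RasaX chart's values file (sketch).
# Setting a key to null tells Helm to drop the chart's default for it.
# Subcharts (Redis, RabbitMQ) keep their own securityContext defaults.
securityContext:
  fsGroup: null
```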
The chart dependencies and docs were updated in https://github.com/RasaHQ/rasa-x-helm/pull/259.
Can you share your YAML files for the setup with bitnami/nginx?
RasaX claims to run on OpenShift; I have my doubts about that.
I did get everything running on OpenShift, though it's been a while and I haven't tried to bootstrap the project since my suggestions were incorporated.
That said, as I wrote in my original post, I used an unprivileged Nginx image at the time. From values.yaml:
```yaml
nginx:
  name: nginxinc/nginx-unprivileged
  tag: stable
```
I have an OpenShift (Azure) 4.6 cluster on which I have been trying to install RasaX. I have a fair amount of experience with OpenShift, both building and deploying my own projects and bootstrapping open source projects.
I followed these instructions. After much reading, digging around and tinkering, I have arrived at what I believe is a working installation. It wasn't easy, though.
I wanted to share some of the roadblocks and the solutions I came up with. This was a frustrating experience, especially given that the docs suggest real support for OpenShift. (I don't just want to whine, mind you! I hope this information will be helpful to others, and I am also willing to do some additional testing for the community if it will help with supporting OpenShift.)
I am also open to feedback about mistakes I have made while going about this.
First steps
I did initially receive the error the instructions warn about, relating to user 1001, and got past it by setting securityContext.fsGroup to null, as mentioned. However, the PostgreSQL instance would still not start:
I read through the Bitnami chart values and issue tracker and tried to set the following, which is recommended for OpenShift:
No luck.
Similar permissions issue with Nginx:
I posted on the forums and got a suggestion to use an unprivileged Nginx image. I added a value to the nginx part of the chart to override the image. That seemed to work.
Deeper into the weeds with Bitnami charts
I still had no PG, Redis or RabbitMQ instance. All the Rasa pods were failing, and my only guess was that this had something to do with there being no database available. I noticed that the subchart versions were rather far behind the upstream charts, so I tried installing the current Bitnami PostgreSQL chart, and it worked out of the box. My values for the chart are:
I moved on to Redis and was able to get that working with the following values:
Same for RabbitMQ:
Once I had the PG, Redis and RabbitMQ backends running, I turned off the installs in the RasaX chart and added the existingHost and related settings to my values files.
Home Stretch
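Switching the RasaX chart from its bundled backends to separately installed ones might look roughly like the fragment below. This is a sketch under stated assumptions. It assumes the chart exposes an install flag and an existingHost value for each backend. The host names are hypothetical Service names for separately installed Bitnami releases. The "related settings" (ports, credentials, secret names) are left out because their exact keys vary between chart versions, so check the chart's own values.yaml.

```yaml
# Sketch only: key names assumed, host names hypothetical.
postgresql:
  install: false                 # skip the bundled PostgreSQL subchart
  existingHost: my-postgresql    # Service of the separate Bitnami release
redis:
  install: false                 # skip the bundled Redis subchart
  existingHost: my-redis-master
rabbitmq:
  install: false                 # skip the bundled RabbitMQ subchart
  existingHost: my-rabbitmq
```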
After redeploying, the db migration service logs indicated that the migrations were run and I could see new relations in the database.
However, some of the Rasa pods were displaying an authorization error. I found this issue and noticed that my password salt (randomly generated) had a + in it. I removed the + and the authorization error went away.
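The salt fix could be expressed in values as shown below. This sketch assumes the salt is supplied through a rasax.passwordSalt value, which is an assumption here, so confirm the key in the chart's values.yaml for your version. It uses a salt made only of letters and digits, which avoids the + character that caused the error above. A hex string, such as the output of openssl rand -hex 32, is one easy way to get such a value.

```yaml
# Sketch only: the rasax.passwordSalt key is assumed and the value is a
# placeholder. Letters and digits only, so no "+" can sneak in.
rasax:
  passwordSalt: "3f9c2a7d1e8b4c6a0d5f7e2b9a1c4d8e"
```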
Finally, just about everything appeared to be working. The event service, however, would not come up: the readiness probe was failing, leading to constant restarts and eventually a CrashLoopBackOff. I disabled the probes to see what would happen and, to my surprise, the service started up just fine. It appears the initialProbeDelay is just too short. I set it to 30 seconds for now and it seems to be working.
Conclusions
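The probe workaround from the previous section could look roughly like this in the values file. This sketch assumes the chart nests its probe settings under eventService.readinessProbe and eventService.livenessProbe, and that initialProbeDelay is passed through to the pod's initialDelaySeconds. Raising the delay on the liveness probe as well is also an assumption. In Kubernetes, liveness failures trigger container restarts, while readiness failures only remove the pod from Service endpoints.

```yaml
# Sketch only: nesting under eventService.* is assumed; 30 s is the value
# reported to work above.
eventService:
  readinessProbe:
    initialProbeDelay: 30   # seconds before the first readiness check
  livenessProbe:
    initialProbeDelay: 30   # seconds before the first liveness check
```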
I wonder why the RasaX chart doesn't hew to the upstream Bitnami charts? It appears there is work going on there to ensure compatibility with k8s distros. Couldn't RasaX pin their chart to the image version for compatibility while taking advantage of improvements in the charts? The flexibility to use existing hosts for these backends is nice but it shouldn't be required, should it?
Same thing goes for Nginx. I feel like I've got a bit of a Frankenstein on my hands here and it seems unnecessary.
The authorization issue (e.g., what are the restrictions on passwords and salts?) probably ought to be mentioned somewhere in the docs.
It might be good to bump the initialProbeDelay on the event service probes, at least.
That's it for now. As I said, I am happy to help with additional testing. Thanks for reading.