Closed lordpengwin closed 3 years ago
Thanks for the report! I could reproduce it. Confirming, it's a bug.
Based on /tooz/drivers/etcd.py
occurence in the error log, wondering if switching to other etcd
version via Helm charts in recent upgrade (https://github.com/StackStorm/stackstorm-ha/pull/163) is an issue here or not.
Indeed new coordination backend is an issue.
Installing chart with coordination disabled:
helm upgrade --set etcd.enabled=false test stackstorm/stackstorm-ha
st2api:
2021-01-04 23:53:41,218 INFO [-] Creating st2api: StackStorm v3.4dev as OpenAPI app.
2021-01-04 23:53:47,298 WARNING [-] Coordination backend is not configured. Code paths which use coordination service will use best effort approach and race conditions are possible.
st2client:
root@test-st2client-69bcb768b-5wts4:/opt/stackstorm# st2 key set foo bar
+------------------+--------------+
| Property | Value |
+------------------+--------------+
| name | foo |
| value | bar |
| scope | st2kv.system |
| expire_timestamp | |
+------------------+--------------+
We'll need to find another etcd version or use another coordination backend like Redis per https://github.com/StackStorm/stackstorm-ha/issues/94#issuecomment-608394681
For everyone affected, temporary workaround is disabling the etcd coordination backend in helm values and provide your own external coordination backend URL via st2.conf section of stackstorm-ha Helm chart values. See https://github.com/StackStorm/stackstorm-ha/blob/e373c7280ffbd720502a45eb97bdf9ccd5391ba5/values.yaml#L517-L520
I'm having time constraints on fixing it really quick, but if anyone wants to help with this ASAP, we'll need to switch to using
https://github.com/bitnami/charts/tree/master/bitnami/redis instead of etcd
for external dependency Chart.
2 places will need an update:
1) Helm Chart Dependencies + Values to use bitnami/redis
3rd party Chart
3) st2.conf connection string for [coordination]
to use Redis instead of etcd
https://github.com/StackStorm/stackstorm-ha/blob/e373c7280ffbd720502a45eb97bdf9ccd5391ba5/templates/configmaps_st2-conf.yaml#L23-L26
In the recent past I had limited success in using Redis
helm chart based deployment from bitnami
when i did it outside of the ST2
chart. Limited because I was not able to make it work with Sentinel
which is recommended for HA. This could very well be an issue on my side, though. will be curious to play with it and see if this time it behaves properly now.
This example https://github.com/openstack/tooz/blob/master/tooz/drivers/redis.py#L147-L155 might give some hints about proper Sentinel/Redis connection string.
@arms11 I'm using this and it works well:
redis://:<pass_word>@<sentinel1_fqdn>:26379?sentinel=<master_name>&sentinel_fallback=<sentinel2_fqdn>:26379&sentinel_fallback=<sentinel3_fqdn>:26379
got it. Thanks. will follow that.
I have a bit of bad news on this front. Over the weekend I tried to connect ST2
with in-place redis
cluster with sentinel
enabled. Following this link https://docs.bitnami.com/tutorials/deploy-redis-sentinel-production-cluster/ (somewhat obsolete after a year!!) I continuously keep getting No master found for <master-name>
no matter what I give in the format described above for connection string. I tried redis
, mymaster
, master0
, even <master pod name>
and <master pod IP>
. But no avail.
I have a dilemma:
HA
.Found this on-line. https://github.com/andymccurdy/redis-py/issues/1388 but even this does not seem to help.
@arms11 If you could compose a draft PR to add the new Redis chart dependency, - we'll help further. I guess @wantdrink had a working setup with Redis + Sentinel.
@armab Yes I'm using stable/redis-ha 4.9.3 with sentinel and the coordinator URL is posted above. @arms11 The bitnami chart you are using are set as non-cluster with sentinel right? Could you provide your redis URL configured in stackstorm?
@wantdrink @armab I have the draft PR: https://github.com/StackStorm/stackstorm-ha/pull/169
Connection String:
url = redis://st2ha-redis-node-0.st2ha-redis-headless:26379?sentinel=mymaster&sentinel_fallback=st2ha-redis-node-1.st2ha-redis-headless:26379&sentinel_fallback=st2ha-redis-node-2.st2ha-redis-headless:26379
Yes, i am using https://github.com/bitnami/charts/tree/master/bitnami/redis overriding their values-production.yaml
I do see lot of issues (on-hold) in the chart repo, but I am not an expert of redis
nor i have used stable
repo chart to say otherwise.
Please provide guidance on how to proceed.
@wantdrink - I did not understand non-cluster
with sentinel part. If you check the helm chart url, the Values.cluster.enabled
and Values.sentinel .enabled
both are true in my deployment.
Understood. I guess you can try cluster=false but sentinel = true. That's a redis HA with sentinel but it's not a cluster.
Just FYI, I was able to fix the issue without changing the coordination engine from etcd to redis by adding ETCD_ENABLE_V2="true"
env variable on etcd helm.
@tarcusx - that's nice to know. Good to have options! Any documentation around what the envt variable was introduced for? was there an actual bug on ETCD side?
@arms11 Note that I'm using my own etcd helm, not the one specified by this stackstorm-ha chart.yaml. But I think the issue I hit is same as this one. It's because of the etcd version updated to v3.4. etcd v3.4.0 doc says
–enable-v2
Accept etcd V2 client requests
default: false
env variable: ETCD_ENABLE_V2
https://etcd.io/docs/v3.4.0/op-guide/configuration/
The default value for v2 protocol support is changed to false in version 3.4.
etcd v3.3.13 doc: https://etcd.io/docs/v3.3.13/op-guide/configuration/ says default is true
.
To use etcd protocol v2, we have to set ETCD_ENABLE_V2=true
env variable on etcd pods.
Another option would be to use v3 protocol. tooz seems to have etcd3 v3 driver. https://github.com/openstack/tooz/blob/master/tooz/drivers/etcd3.py#L118 A simple etcd URI config change would solve the issue. Actually I think this is better solution, but I haven't tested this.
I ran into a problem trying to set a key/value pair in the latest version. I get the following error iwht 0.50.0:
The st2api logs show this exception:
Reverting the chart to 0.41.0 makes the problem go away.