StackStorm / stackstorm-k8s

K8s Helm Chart that codifies StackStorm (aka "IFTTT for Ops" https://stackstorm.com/) Highly Availability fleet as a simple to use reproducible infrastructure-as-code app
https://helm.stackstorm.com/
Apache License 2.0
105 stars 107 forks source link

Can't set datastore key/value pairs in 0.50.0 #167

Closed lordpengwin closed 3 years ago

lordpengwin commented 3 years ago

I ran into a problem trying to set a key/value pair in the latest version. I get the following error iwht 0.50.0:

% st2 key set foo boo
ERROR: 400 Client Error: Bad Request
MESSAGE: Extra data: line 1 column 5 - line 2 column 1 (char 4 - 19) for url: http://stackstorm-st2api:9101/keys/foo

The st2api logs show this exception:

2021-01-04 19:22:15,825 INFO [-] abc5b97e-6ae8-4bf1-8303-50da40b79cfd - PUT /keys/foo with query={} (method=‘PUT’,path=’/keys/foo’,remote_addr=‘192.168.45.255’,query={},request_id=‘abc5b97e-6ae8-4bf1-8303-50da40b79cfd’)
2021-01-04 19:22:15,832 ERROR [-] Failed to call controller function “put” for operation “st2api.controllers.v1.keyvalue:key_value_pair_controller.put”: Extra data: line 1 column 5 - line 2 column 1 (char 4 - 19)
Traceback (most recent call last):
File “/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/router.py”, line 516, in call
resp = func(**kw)
File “/opt/stackstorm/st2/lib/python3.6/site-packages/st2api/controllers/v1/keyvalue.py”, line 292, in put
with self._coordinator.get_lock(lock_name):
File “/opt/stackstorm/st2/lib/python3.6/site-packages/tooz/locking.py”, line 52, in enter
acquired = self.acquire(*args, **kwargs)
File “/opt/stackstorm/st2/lib/python3.6/site-packages/tooz/drivers/etcd.py”, line 132, in acquire
“prevExist”: “false”})
File “/opt/stackstorm/st2/lib/python3.6/site-packages/tooz/drivers/etcd.py”, line 75, in put
return self.session.put(url, **kwargs).json()
File “/opt/stackstorm/st2/lib/python3.6/site-packages/requests/models.py”, line 898, in json
return complexjson.loads(self.text, **kwargs)
File “/opt/stackstorm/st2/lib/python3.6/site-packages/simplejson/init.py”, line 525, in loads
return _default_decoder.decode(s)
File “/opt/stackstorm/st2/lib/python3.6/site-packages/simplejson/decoder.py”, line 373, in decode
raise JSONDecodeError(“Extra data”, s, end, len(s))
simplejson.errors.JSONDecodeError: Extra data: line 1 column 5 - line 2 column 1 (char 4 - 19)
2021-01-04 19:22:15,833 INFO [-] abc5b97e-6ae8-4bf1-8303-50da40b79cfd - 400 84 8.372ms (method=‘PUT’,path=’/keys/foo’,remote_addr=‘192.168.45.255’,status=400,runtime=8.372,content_length=84,request_id=‘abc5b97e-6ae8-4bf1-8303-50da40b79cfd’)

Reverting the chart to 0.41.0 makes the problem go away.

arm4b commented 3 years ago

Thanks for the report! I could reproduce it. Confirming, it's a bug.

Based on /tooz/drivers/etcd.py occurence in the error log, wondering if switching to other etcd version via Helm charts in recent upgrade (https://github.com/StackStorm/stackstorm-ha/pull/163) is an issue here or not.

arm4b commented 3 years ago

Indeed new coordination backend is an issue.

Installing chart with coordination disabled:

helm upgrade --set etcd.enabled=false test stackstorm/stackstorm-ha

st2api:

2021-01-04 23:53:41,218 INFO [-] Creating st2api: StackStorm v3.4dev as OpenAPI app.
2021-01-04 23:53:47,298 WARNING [-] Coordination backend is not configured. Code paths which use coordination service will use best effort approach and race conditions are possible.

st2client:

root@test-st2client-69bcb768b-5wts4:/opt/stackstorm# st2 key set foo bar
+------------------+--------------+
| Property         | Value        |
+------------------+--------------+
| name             | foo          |
| value            | bar          |
| scope            | st2kv.system |
| expire_timestamp |              |
+------------------+--------------+

We'll need to find another etcd version or use another coordination backend like Redis per https://github.com/StackStorm/stackstorm-ha/issues/94#issuecomment-608394681


For everyone affected, temporary workaround is disabling the etcd coordination backend in helm values and provide your own external coordination backend URL via st2.conf section of stackstorm-ha Helm chart values. See https://github.com/StackStorm/stackstorm-ha/blob/e373c7280ffbd720502a45eb97bdf9ccd5391ba5/values.yaml#L517-L520

arm4b commented 3 years ago

I'm having time constraints on fixing it really quick, but if anyone wants to help with this ASAP, we'll need to switch to using https://github.com/bitnami/charts/tree/master/bitnami/redis instead of etcd for external dependency Chart.

2 places will need an update: 1) Helm Chart Dependencies + Values to use bitnami/redis 3rd party Chart 3) st2.conf connection string for [coordination] to use Redis instead of etcd https://github.com/StackStorm/stackstorm-ha/blob/e373c7280ffbd720502a45eb97bdf9ccd5391ba5/templates/configmaps_st2-conf.yaml#L23-L26

arms11 commented 3 years ago

In the recent past I had limited success in using Redis helm chart based deployment from bitnami when i did it outside of the ST2 chart. Limited because I was not able to make it work with Sentinel which is recommended for HA. This could very well be an issue on my side, though. will be curious to play with it and see if this time it behaves properly now.

arm4b commented 3 years ago

This example https://github.com/openstack/tooz/blob/master/tooz/drivers/redis.py#L147-L155 might give some hints about proper Sentinel/Redis connection string.

wantdrink commented 3 years ago

@arms11 I'm using this and it works well:

redis://:<pass_word>@<sentinel1_fqdn>:26379?sentinel=<master_name>&sentinel_fallback=<sentinel2_fqdn>:26379&sentinel_fallback=<sentinel3_fqdn>:26379

arms11 commented 3 years ago

got it. Thanks. will follow that.

arms11 commented 3 years ago

I have a bit of bad news on this front. Over the weekend I tried to connect ST2 with in-place redis cluster with sentinel enabled. Following this link https://docs.bitnami.com/tutorials/deploy-redis-sentinel-production-cluster/ (somewhat obsolete after a year!!) I continuously keep getting No master found for <master-name> no matter what I give in the format described above for connection string. I tried redis, mymaster, master0, even <master pod name> and <master pod IP>. But no avail. I have a dilemma:

  1. Sentinel is recommended but for some reason setup is not helping for tooz.
  2. Without Sentinel, explicit master/slave arch does not have necessary documentation on how to form connection string. In this arch, we have to rely on Kubernetes Controller Manager to spin off the master pod. Single master can be used but then it does not give HA.

Found this on-line. https://github.com/andymccurdy/redis-py/issues/1388 but even this does not seem to help.

arm4b commented 3 years ago

@arms11 If you could compose a draft PR to add the new Redis chart dependency, - we'll help further. I guess @wantdrink had a working setup with Redis + Sentinel.

wantdrink commented 3 years ago

@armab Yes I'm using stable/redis-ha 4.9.3 with sentinel and the coordinator URL is posted above. @arms11 The bitnami chart you are using are set as non-cluster with sentinel right? Could you provide your redis URL configured in stackstorm?

arms11 commented 3 years ago

@wantdrink @armab I have the draft PR: https://github.com/StackStorm/stackstorm-ha/pull/169

Connection String: url = redis://st2ha-redis-node-0.st2ha-redis-headless:26379?sentinel=mymaster&sentinel_fallback=st2ha-redis-node-1.st2ha-redis-headless:26379&sentinel_fallback=st2ha-redis-node-2.st2ha-redis-headless:26379

Yes, i am using https://github.com/bitnami/charts/tree/master/bitnami/redis overriding their values-production.yaml

I do see lot of issues (on-hold) in the chart repo, but I am not an expert of redis nor i have used stable repo chart to say otherwise.

Please provide guidance on how to proceed.

arms11 commented 3 years ago

@wantdrink - I did not understand non-cluster with sentinel part. If you check the helm chart url, the Values.cluster.enabled and Values.sentinel .enabled both are true in my deployment.

wantdrink commented 3 years ago

Understood. I guess you can try cluster=false but sentinel = true. That's a redis HA with sentinel but it's not a cluster.

tarcusx commented 3 years ago

Just FYI, I was able to fix the issue without changing the coordination engine from etcd to redis by adding ETCD_ENABLE_V2="true" env variable on etcd helm.

arms11 commented 3 years ago

@tarcusx - that's nice to know. Good to have options! Any documentation around what the envt variable was introduced for? was there an actual bug on ETCD side?

tarcusx commented 3 years ago

@arms11 Note that I'm using my own etcd helm, not the one specified by this stackstorm-ha chart.yaml. But I think the issue I hit is same as this one. It's because of the etcd version updated to v3.4. etcd v3.4.0 doc says

–enable-v2
Accept etcd V2 client requests
default: false
env variable: ETCD_ENABLE_V2

https://etcd.io/docs/v3.4.0/op-guide/configuration/ The default value for v2 protocol support is changed to false in version 3.4. etcd v3.3.13 doc: https://etcd.io/docs/v3.3.13/op-guide/configuration/ says default is true.

To use etcd protocol v2, we have to set ETCD_ENABLE_V2=true env variable on etcd pods.

Another option would be to use v3 protocol. tooz seems to have etcd3 v3 driver. https://github.com/openstack/tooz/blob/master/tooz/drivers/etcd3.py#L118 A simple etcd URI config change would solve the issue. Actually I think this is better solution, but I haven't tested this.