Closed k-wall closed 5 years ago
I now understand this error. The upgrader needs to add new environment variables and remove others from the broker statefulset in order to perform the update. The upgrade is done by sending a PATCH. The patch sent is an RFC 6902.
~~I believe client libraries used by EnMasse are incorrectly calculate the RFC 6902 (see attachment). ~~ The problematic part of the document is the following.
{
"op":"remove",
"path":"/spec/template/spec/initContainers/0/env/19",
"value":{
"name":"AUTHENTICATION_SERVICE_OAUTH_URL",
"value":"${AUTHENTICATION_SERVICE_OAUTH_URL}"
}
},
{
"op":"add",
"path":"/spec/template/spec/initContainers/0/env/20",
"value":{
"name":"HOME",
"value":"/var/run/artemis/split-1/"
}
},
The intent is to remove AUTHENTICATION_SERVICE_OAUTH_URL and add HOME, but it is expressed in an illegal way - causing an array out of bounds condition. Reversing the order of the two operations allows the patch to work.
rfc6902_patch_statefulset.json.txt
curl -k -v -XPATCH -H "User-Agent: oc/v1.11.0+d4cacc0 (linux/amd64) kubernetes/d4cacc0" -H "Accept: application/json" -H "Content-Type: application/json-patch+json" -H "Authorization: Bearer As8Qt7H08rOtT1WDvv-VlZ6agJQwxSgdDVnBfR3S1BQ" 'https://127.0.0.1:8443/apis/apps/v1/namespaces/amq-online-infra/statefulsets/broker-wf8m730ie4-sapa' --data-binary "@/tmp/data.json" {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unable to access invalid index: 20","code":500}
https://github.com/flipkart-incubator/zjsonpatch is an implementation of https://tools.ietf.org/html/rfc6902#section-4.1 In the section regarding Add it says:
An element to add to an existing array - whereupon the supplied
value is added to the array at the indicated location. Any
elements at or above the specified index are shifted one position
to the right. The specified index MUST NOT be greater than the
number of elements in the array. If the "-" character is used to
index the end of the array (see [RFC6901]), this has the effect of
appending the value to the array.
HOME is getting added to the 21st spot. Which makes the index 20 correct in the client side patch.
Still investigating the server side, to see why it isn't applying the patch correctly.
{"op":"move","from":"/spec/initContainers/0/env/14","path":"/spec/initContainers/0/env/12"} could be the issue.
@vbusch Your suspicion about move being a factor is correct. I believe the issue in a flawed implementation of move on the OpenShift server side. I think anytime a PATCH include a 'move' the results could be incorrect. We may or may not see an exception. I've raised https://github.com/openshift/origin/issues/23206
The following exception was seen once during the upgrade of an 0.26 install to 0.28.1 with an addressspace with a numnber of address types. I haven't been successful in reproducing the issue a second time. A loop formed with the same upgrade being attempted every few seconds by the controller.
The issue occurs with a
standard-unlimited
address space with a singlestandard-small-queue
address.