d41k4n closed this issue 2 years ago
Other passwords with special characters may also fail in the operator itself:
dynatrace-operator-594cd7d575-hxlzj dynatrace-operator {"level":"info","ts":"2022-03-30T12:56:14.804Z","logger":"dtclient","msg":"could not parse proxy URL!"}
dynatrace-operator-594cd7d575-hxlzj dynatrace-operator {"level":"info","ts":"2022-03-30T12:56:14.813Z","logger":"dynakube-controller","msg":"problem with token detected","dynakube":"axa-ch-dev","token":"APIToken","msg":"error when querying token on secret dynatrace-prod-ats-openpaas-cc:dynakube-tokens-axa-ch-dev-h87gmtm76t: error making post request to dynatrace api: Post \"https://${instance}.live.dynatrace.com/api/v1/tokens/lookup\": proxyconnect tcp: dial tcp: lookup http on 10.96.0.10:53: no such host"}
dynatrace-operator-594cd7d575-hxlzj dynatrace-operator {"level":"info","ts":"2022-03-30T12:56:14.816Z","logger":"dynakube-controller","msg":"problem with token detected","dynakube":"axa-ch-dev","token":"PaaSToken","msg":"error when querying token on secret dynatrace-prod-ats-openpaas-cc:dynakube-tokens-axa-ch-dev-h87gmtm76t: error making post request to dynatrace api: Post \"https://${instance}.live.dynatrace.com/api/v1/tokens/lookup\": proxyconnect tcp: dial tcp: lookup http on 10.96.0.10:53: no such host"}
Some characters present in password: !*.-_+?
We can reliably reproduce erratic behavior using a proxy password such as "07I!?.+nV-G_Xv".
If the proxy URL secret is created with the above password in URL-encoded form (see below), the AG pods cannot connect to the API and their log shows HTTP status 407 errors (tested with capability "kubernetes-monitoring"):
2022-04-04 09:21:38 UTC SEVERE [<environmentID>] [<communication>, MessageBroker] Failed to send INITIAL_COLLECTOR_SETUP message (target-type=SERVER, target-id=-1004), uri=https://sg-eu-west-1-12-34-567-890-prod9-ireland.live.dynatrace.com/communication - CommunicationException: HTTP communication failed, status code=407, reason=Proxy Authentication Required
Secret created with:
oc -n dynatrace create secret generic myproxysecret --from-literal="proxy=http://someuser:07I%21%3F.%2BnV-G_Xv%2A@myproxy:8080"
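The URL-encoded value in the command above can be produced with any percent-encoding routine. Below is a minimal Python sketch, for illustration only: the helper name `build_proxy_url` is hypothetical and not part of the operator or `oc`. Note that the encoded string in the command decodes to `07I!?.+nV-G_Xv*`, i.e. with a trailing `*`.

```python
from urllib.parse import quote, unquote


def build_proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Return an http proxy URL with percent-encoded credentials.

    Hypothetical helper for illustration. safe="" makes quote() escape
    every character except letters, digits and "_.-~".
    """
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


if __name__ == "__main__":
    # Password as it appears (encoded) in the secret above, including the "*".
    print(build_proxy_url("someuser", "07I!?.+nV-G_Xv*", "myproxy", 8080))
    # Decoding the encoded password recovers the clear-text string.
    print(unquote("07I%21%3F.%2BnV-G_Xv%2A"))
```

The printed URL can then be passed to `--from-literal="proxy=..."` as in the command above.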
Dynakube snippet:
apiVersion: dynatrace.com/v1beta1
kind: DynaKube
metadata:
  name: dynatrace
  namespace: dynatrace
spec:
  ...
  activeGate:
    capabilities:
      - kubernetes-monitoring
    image: ""
    replicas: 1
  apiUrl: https://environmentid.live.dynatrace.com/api
  proxy:
    valueFrom: myproxysecret
  ...
If the secret is created with the password in clear text, even the operator cannot connect via the proxy, and its log contains entries such as:
{"level":"info","ts":"2022-04-04T06:31:45.783Z","logger":"dtclient","msg":"could not parse proxy URL!"}
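One likely reason the clear-text form cannot be parsed: an unescaped `?` in the password ends the authority part of the URL and starts the query, so the real proxy host is never seen. The operator is written in Go and its parser may differ in detail; the Python sketch below (with a hypothetical helper `split_proxy_url`) only illustrates the general ambiguity using the password from this thread.

```python
from urllib.parse import urlsplit


def split_proxy_url(url: str) -> dict:
    """Report what a generic URL parser sees in a proxy URL (illustrative helper)."""
    parts = urlsplit(url)
    try:
        port = parts.port  # raises ValueError if the port is not numeric
    except ValueError:
        port = "invalid"
    return {
        "host": parts.hostname,
        "port": port,
        "user": parts.username,
        "query": parts.query,
    }


if __name__ == "__main__":
    # Clear-text password: "?" cuts the URL short, host and port are wrong.
    print(split_proxy_url("http://someuser:07I!?.+nV-G_Xv@myproxy:8080"))
    # URL-encoded password: host, port and user are recovered as intended.
    print(split_proxy_url("http://someuser:07I%21%3F.%2BnV-G_Xv@myproxy:8080"))
```

With the clear-text URL the parser treats `someuser` as the host and everything after `?` as the query string; with the encoded URL it finds `myproxy`, port 8080 and user `someuser`.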
Passing the password to the operator URL-encoded is correct. I am currently investigating the issue with the AG pods.
Perhaps it would make sense to URL-encode username and password here:
https://github.com/Dynatrace/dynatrace-operator/blob/v0.4.2/src/agproxysecret/secret.go#L97-L98
https://github.com/Dynatrace/dynatrace-operator/blob/v0.5.1/src/agproxysecret/secret.go#L97-L98
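The linked Go code is not reproduced here. As a rough illustration of the distinction at stake, the Python sketch below (hypothetical helper `proxy_credentials`) splits a proxy URL into its parts and returns the password as written, decoded, and re-encoded. Which of these forms the ActiveGate expects is not established in this thread.

```python
from urllib.parse import quote, unquote, urlsplit


def proxy_credentials(proxy_url: str) -> dict:
    """Split a proxy URL and return its credentials in several forms.

    Illustrative helper only; it does not mirror the operator's implementation.
    """
    parts = urlsplit(proxy_url)
    as_written = parts.password or ""  # urlsplit does not percent-decode
    decoded = unquote(as_written)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "username": unquote(parts.username or ""),
        "password_as_written": as_written,
        "password_decoded": decoded,
        "password_reencoded": quote(decoded, safe=""),
    }


if __name__ == "__main__":
    creds = proxy_credentials("http://someuser:07I%21%3F.%2BnV-G_Xv%2A@myproxy:8080")
    for key, value in creds.items():
        print(f"{key}: {value}")
```

If a consumer of these values authenticates with the encoded string where the proxy expects the decoded one (or the reverse), the proxy would reject the credentials, which would be consistent with the 407 responses reported above.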
It seems there is an issue with how the ActiveGate handles the password passed to it. We are going to investigate this issue further.
A workaround was merged in https://github.com/Dynatrace/dynatrace-operator/pull/706 and will be released in a future version.
We found that with 0.4.2 the operator was unable to connect to the API via proxy due to an error with parsing the proxy URL.
After some troubleshooting, we believe the cause is special characters in the proxy password, which is part of the secret provided via spec.proxy.valueFrom in the DynaKube.
We could work around it in some cases by URL-encoding the password when creating the secret, but not in all the cases we tested.
We also observed cases where the operator connected to the proxy successfully but the containerized ActiveGates did not (we saw 407s from the proxy in their log). Granted, this could also have been caused by a user account temporarily locked after too many failed auth attempts, but in some of our tests the operator had no errors in its log while the AGs reported proxy issues.
We haven't had a chance to test with 0.5.0 yet, but since I don't see any related bugfixes mentioned, I expect the same problem to occur there.
Out of curiosity, is anybody else able to confirm this behavior on their end?