AthenZ / k8s-athenz-sia

Apache License 2.0
1 stars 7 forks source link

Error handling improvement #93

Closed WindzCUHK closed 5 months ago

WindzCUHK commented 6 months ago

Description

improve error handling

Type of change

Flags


Checklist

Checklist for maintainer

WindzCUHK commented 6 months ago

changes

  1. TARGET_DOMAIN_ROLES pre-parsing: ab7d938
    1. "" => continue as empty array
      INFO[2024-03-05T17:32:53+09:00] Booting up with args: ... TargetDomainRoles:[] ...
      ...
    2. non-empty string && ALL elements are invalid => fatal
      WARNING[2024-03-05T17:25:55+09:00] Invalid TARGET_DOMAIN_ROLES/target-domain-roles ["user.windz.provider:invalid1,user.windz.provider:invalid2"], warnings:
      failed to read target domain and role from element: ["user.windz.provider:invalid1"], err: invalid role name: 'user.windz.provider:invalid1', expected format {domain}:role.{role}
      failed to read target domain and role from element: ["user.windz.provider:invalid2"], err: invalid role name: 'user.windz.provider:invalid2', expected format {domain}:role.{role}
      FATAL[2024-03-05T17:25:55+09:00] Invalid TARGET_DOMAIN_ROLES ["user.windz.provider:invalid1,user.windz.provider:invalid2"], NO valid domain-role pairs
    3. non-empty string && SOME elements are invalid => continue with valid values ONLY
      WARNING[2024-03-05T17:32:53+09:00] Invalid TARGET_DOMAIN_ROLES/target-domain-roles ["user.windz.provider:role.dummy-role,user.windz.provider:invalidinvalidinvalidinvalid"], warnings:
      failed to read target domain and role from element: ["user.windz.provider:invalidinvalidinvalidinvalid"], err: invalid role name: 'user.windz.provider:invalidinvalidinvalidinvalid', expected format {domain}:role.{role}
      ...
      INFO[2024-03-05T17:32:53+09:00] Booting up with args: ... TargetDomainRoles:[user.windz.provider:role.dummy-role] ...
      ...
  2. mode=refresh, print error log when ANY preset certificates failed: bb1f698
    1. mode=refresh, fetch certificates fails => continue
      INFO[2024-03-06T02:09:31+09:00] Attempting to get x509 role certs from identity provider: targets[[user.windz.provider:role.dummy-role user.windz.provider:role.self]]... 
      WARNING[2024-03-06T02:09:31+09:00] Error while requesting x509 role certificate to identity provider: PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider 
      ERROR[2024-03-06T02:09:31+09:00] Failed to refresh certificates: PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider. Retrying in 7.084644008s
      ...
      INFO[2024-03-06T02:09:38+09:00] Attempting to get x509 role certs from identity provider: targets[[user.windz.provider:role.dummy-role user.windz.provider:role.self]]... 
      WARNING[2024-03-06T02:09:39+09:00] Error while requesting x509 role certificate to identity provider: PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider 
      ERROR[2024-03-06T02:09:39+09:00] Failed to get initial certificate after multiple retries for refresh mode: PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider, will try to continue startup process...
      ...
      INFO[2024-03-06T02:15:03+09:00] Starting metrics exporter[:9999]
    2. mode=init, fetch certificates fails => fatal
      INFO[2024-03-06T01:56:47+09:00] Attempting to get x509 role certs from identity provider: targets[[user.windz.provider:role.dummy-role user.windz.provider:role.self]]... 
      WARNING[2024-03-06T01:56:47+09:00] Error while requesting x509 role certificate to identity provider: PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider 
      ERROR[2024-03-06T01:56:47+09:00] Failed to refresh certificates: PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider. Retrying in 7.007149192s 
      ...
      INFO[2024-03-06T01:56:54+09:00] Attempting to get x509 role certs from identity provider: targets[[user.windz.provider:role.dummy-role user.windz.provider:role.self]]... 
      WARNING[2024-03-06T01:56:54+09:00] Error while requesting x509 role certificate to identity provider: PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider 
      ERROR[2024-03-06T01:56:54+09:00] Failed to get initial certificate after multiple retries for init mode: PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider 
      FATAL[2024-03-06T01:56:54+09:00] PostRoleCertificateRequest failed for principal[user.windz.instance.mac] to get Role Subject CommonName[user.windz.provider:role.self], err: 403 postrolecertificaterequestext: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider 
  3. fatal when ListenAndServe() fails: c2230f4
    1. token server
      FATAL[2024-03-06T02:45:33+09:00] Failed to start token provider: listen tcp :8880: bind: address already in use
    2. metrics server
      FATAL[2024-03-06T02:47:38+09:00] Failed to start metrics exporter: listen tcp :9999: bind: address already in use
    3. health server
      FATAL[2024-03-06T02:48:21+09:00] Failed to start health check server: listen tcp :9999: bind: address already in use
  4. error when Shutdown() fails, but fatal when graceful shutdown fails: 454812f
    1. token server shutdown error => fatal
      FATAL[2024-03-06T03:01:32+09:00] Failed to shutdown token provider: test error
    2. metrics server shutdown error => error log
      ERROR[2024-03-06T03:02:57+09:00] Failed to shutdown metrics exporter: test error 
      INFO[2024-03-06T03:02:57+09:00] Shutdown completed!
    3. health check server shutdown error => error log
      ERROR[2024-03-06T03:02:17+09:00] Failed to shutdown health check server: test error 
      INFO[2024-03-06T03:02:17+09:00] Shutdown completed!
  5. force shutdown health check server: 200adae
  6. health check server panic handler and error log: 436a09a
    1. on error => error log
      ERROR[2024-03-06T03:16:19+09:00] Failed to write health check server response: test Error
      1. on panic => error log
        ERROR[2024-03-06T03:17:48+09:00] http: panic serving 127.0.0.1:54993: test panic
        goroutine 16 [running]:
        ...
WindzCUHK commented 6 months ago

extra changes

  1. use standard error wrap: ec2fcf3
    1. golang suggested way: https://pkg.go.dev/errors
  2. remove rand.Seed(): bb1f698
  3. parse IP in config.go: 97b2ca4
    1. invalid IP => fatal
      FATAL[2024-03-06T02:37:44+09:00] Invalid POD_IP ["127.x.x.x"]
    2. empty IP => use default 127.0.0.1
      INFO[2024-03-06T02:38:01+09:00] Booting up with args: ... PodIP:127.0.0.1 ... rawPodIP:127.0.0.1 ...
WindzCUHK commented 6 months ago

token server log level 8e6ed90

  1. sidecar API, invalid JSON body => warning
    
    curl -sSi http://127.0.0.1:8880/accesstoken --data '{"domain":"user.windz.provider","role":"dummy-role}'
    # WARNING[2024-03-06T03:44:46+09:00] Error: Client error: unexpected EOF requestID[2129939f-3983-47bb-a073-7211bc443d3f]  Internal Server Error 

curl -sSi http://127.0.0.1:8880/roletoken --data '{"domain":"user.windz.provider","role":"dummy-role}'

WARNING[2024-03-06T03:44:46+09:00] Error: Client error: unexpected EOF requestID[c50860e4-4744-4e01-94b2-d68f088d5f23] Internal Server Error

1. sidecar API, empty domain => warning
```shell
curl -sSi http://127.0.0.1:8880/accesstoken --data '{"domain":"","role":"dummy-role"}'
# WARNING[2024-03-06T03:45:41+09:00] Error: Client error: Invalid value: domain[], role[dummy-role] requestID[ac66ab08-12b9-4361-8a7b-0919b9c2af65]       Internal Server Error 

curl -sSi http://127.0.0.1:8880/roletoken --data '{"domain":"","role":"dummy-role"}'
# WARNING[2024-03-06T03:45:41+09:00] Error: Client error: Invalid value: domain[], role[dummy-role] requestID[3e37c56e-6cd6-44ad-b254-ce616da47dbd]       Internal Server Error 
  1. sidecar API, server error => error
    curl -sSi http://127.0.0.1:8880/roletoken --data '{"domain":"user.windz.provider","role":"self"}'
    # ERROR[2024-03-06T03:46:32+09:00] Error: PostAccessTokenRequest failed for target [{user.windz.provider:role.self,,0,0}], err: 403 postaccesstokenrequest: principal user.windz.instance.mac is not included in the requested role(s) in domain user.windz.provider requestID[d9bfb5bd-dd35-4304-bdb5-0d1ebbf8f1ea]        Internal Server Error
  2. header API, invalid header => warning
    curl --header "X-Athenz-Role: self" -sSi http://127.0.0.1:8880/token
    # WARNING[2024-03-06T03:54:44+09:00] Client error: while handling request with: X-Athenz-Domain[] X-Athenz-Role[self], error[http headers not set: X-Athenz-Domain[] X-Athenz-Role[self].] 
  3. header API, cache miss => warning
    curl --header "X-Athenz-Domain: user.windz.provider" --header "X-Athenz-Role: self" -sSi http://127.0.0.1:8880/token
    # WARNING[2024-03-06T03:54:07+09:00] Client error: while handling request with: X-Athenz-Domain[user.windz.provider] X-Athenz-Role[self], error[domain[user.windz.provider] role[self] was not found in cache.]