Closed shulin-sq closed 8 months ago
@solmonk interesting. have not run into the same issue with vpc association but seems like the same issue. I will update the PR.
@mikhail-aws @zijun726911 it seems the presubmit hook is stuck, any advice on how to fix it?
presubmit passed, will merge it
@shulin-sq Do you need our side to do a new controller version release( new helm chart) to include this fix? Or you build your image by yourself in your private image repo?
@zijun726911 We already created an internal build that includes this change. No rush on a release version.
I had to redact some fields from a comment I left earlier. Here is the comment:
some notes:
I was unable to reproduce the VPC problem. I do have my VPC association set up in another account, so I'm unsure why I'm not running into the same issue. Specifically my scenario is: account A owns the sn, account A shares sn to account B, account B does the vpc association. account A does not have access to the vpc association. The controller has iam permissions for account A. I tried checking for the error code but unfortunately it looks like getTags returns 404 NotFoundException which is different from ResourceNotFoundException which is already a static string in the vpclattice go sdk. For now I kept the err check and added a todo message. Alternatives would be to check against the status code 404 (but that's not exposed as part of awserr.Error) or to check for NotFoundException but temporarily define the string somewhere else. Please let me know what your preference is. I was able to test the service association in my staging environment and saw the following error messages:
before while debugging (when inspecting the awserr.Error)
{"level":"error","ts":"2024-02-26T01:31:06.420Z","logger":"controller.route","caller":"lattice/service_manager.go:227","msg":"shulin _was_here NotFoundException, %!w(awserr.requestError=&{0xc01324f640 404 5cd0afde-496b-4343-abaa-e239df73eea7 []}), %!w(awserr.requ estError=&{0xc01324f640 404 5cd0afde-496b-4343-abaa-e239df73eea7 []})"} after
{"level":"warn","ts":"2024-02-26T01:47:25.568Z","logger":"controller.route","caller":"lattice/service_manager.go:232","msg":"skippin g update associations service: REDACTED, association: REDACTED, error: NotFoundException: Resource was not found\n\tstatus code: 404, request id: 086c ba48-4421-4eda-bcae-c543acd81a85"}
What type of PR is this? bug
Which issue does this PR fix: In the scenario where
Account A can be in a situation where does not have permissions to read the tags of that association
Therefore this results in an error during route reconciliation.
What does this PR do / Why do we need it:
if the get tags of a service network association request fails, instead of erroring, skip deleting the association and log.
If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:
see above scenario
Testing done on this change:
tested the fix in a staging environment (on top of most recent master) but open to suggestions on how test it via unit tests (looks like current tests don't touch on this scenario).
sample error message:
Automation added to e2e:
Will this PR introduce any new dependencies?: no
Will this break upgrades or downgrades. Has updating a running cluster been tested?: no, yes
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.