gardener-attic / gardenctl

Command-line client for the Gardener.

AWS SSH - fails due to existing resource #492

Closed danielfoehrKn closed 3 years ago

danielfoehrKn commented 3 years ago

Describe the bug Gardenctl fails to SSH into an AWS node when one of the required AWS resources already exists.

This happens e.g. when another operator is already SSH-ed into the same node, or when you want to open SSH sessions from two terminals.

Downloaded id_rsa key
Check Public IP

(1/4) Fetching data from target shoot cluster
Data fetched from target shoot cluster.

(2/4) Setting up bastion host security group
Security Group exists sg-0f689115eb7e11cc2 skipping creation.

(3/4) Creating bastion host and node host security group
Bastion Host exists, skipping creation.
2020/12/09 14:24:34 AWS CLI failed with
An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 10.250.96.39/32, TCP, from port: 22, to port: 22, ALLOW" already exists
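
The AWS API signals this case with the dedicated error code InvalidPermission.Duplicate, so the ingress call could be made idempotent by treating that code as success. A minimal Go sketch of that behaviour, using aws-sdk-go v1 (gardenctl drives the AWS CLI here, so this is illustrative only; the security group ID and CIDR are the ones from the log above):

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// authorizeSSH opens TCP 22 for cidr on the given security group and
// treats an already existing rule as success instead of aborting.
func authorizeSSH(svc *ec2.EC2, groupID, cidr string) error {
	_, err := svc.AuthorizeSecurityGroupIngress(&ec2.AuthorizeSecurityGroupIngressInput{
		GroupId:    aws.String(groupID),
		IpProtocol: aws.String("tcp"),
		FromPort:   aws.Int64(22),
		ToPort:     aws.Int64(22),
		CidrIp:     aws.String(cidr),
	})
	if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "InvalidPermission.Duplicate" {
		fmt.Println("SSH ingress rule already exists, skipping creation.")
		return nil
	}
	return err
}

func main() {
	svc := ec2.New(session.Must(session.NewSession()))
	if err := authorizeSSH(svc, "sg-0f689115eb7e11cc2", "10.250.96.39/32"); err != nil {
		log.Fatal(err)
	}
}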

To Reproduce Steps to reproduce the behavior:

  1. Which target was set 'gardenctl get target'
  2. Which command was entered [e.g. 'gardenctl show vpn-seed']
  3. What was the output of the command

neo-liang-sap commented 3 years ago

/assign Thanks to @danielfoehrKn for reporting this, I will fix it.

neo-liang-sap commented 3 years ago

Hi @danielfoehrKn, I proposed PR https://github.com/gardener/gardenctl/pull/494 to fix this problem. Basically, the problem is that gardenctl ssh sometimes exits unexpectedly, so the ingress rule is left behind in the security group, and the next gardenctl ssh run cannot add it again. In my fix I fetch all ingress rules from the node host security group and put them into a flat array like ["22","1.2.3.4/32","443","2.3.4.5/32","3.4.5.6"] (note: in some cases an ingress rule has no port specified, only IP ranges). I then check whether an item in this array is "22" and the next item matches the local_ip/32 CIDR; if so, the ingress rule already exists and creation is skipped.
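
A structured version of that check, purely as an illustrative sketch using aws-sdk-go v1 (PR #494 itself parses the flat AWS CLI output as described above, so the names and types here are not the PR code):

package sshcheck

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// sshRuleExists reports whether the node security group already allows
// TCP 22 from localCIDR (e.g. "1.2.3.4/32"). Rules without a port
// (only IP ranges) are skipped, matching the note above.
func sshRuleExists(svc *ec2.EC2, groupID, localCIDR string) (bool, error) {
	out, err := svc.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
		GroupIds: []*string{aws.String(groupID)},
	})
	if err != nil {
		return false, err
	}
	for _, sg := range out.SecurityGroups {
		for _, perm := range sg.IpPermissions {
			if perm.FromPort == nil || *perm.FromPort != 22 {
				continue
			}
			for _, ipRange := range perm.IpRanges {
				if ipRange.CidrIp != nil && *ipRange.CidrIp == localCIDR {
					return true, nil // rule already there, skip creation
				}
			}
		}
	}
	return false, nil
}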

I tested this in my AWS cluster with the following steps: 1) run gardenctl ssh node_name and kill the process after the SSH connection succeeds, so the cleanup function does not run and the ingress rule is left behind; 2) rerun gardenctl ssh node_name, which now succeeds. I hope this works on your side as well, and it would be much appreciated if you could provide your cluster in the issue description so I can verify again.

Thanks! -Neo

tedteng commented 3 years ago

#494, which I tried today, does not seem to be working. Another issue: when the port check returns exit status 1, it also seems to break the cleanup function, which causes resource leakage:

(3/4) Creating bastion host and node host security group
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Bastion Host exists, skipping creation.
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains auth provider configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
2020/12/17 11:44:42 AWS CLI failed with
An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 10.250.96.39/32, TCP, from port: 22, to port: 22, ALLOW" already exists

exit status 255
exit status 1

then I switched to #494

ttt@W-R90PNMJE:~/work/gardenctl$ git checkout 494
Switched to branch '494'
ttt@W-R90PNMJE:~/work/gardenctl$ gg ssh ip-10-250-0-61.eu-central-1.compute.internal
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains auth provider configurations that could contain malicious code. Please only continue if you have verified it to be uncritical

Warning:
Be aware that you are entering an untrusted environment!
Do not enter credentials or sensitive data within the ssh session that cluster owners should not have access to.

Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Downloaded id_rsa key
Check Public IP

(1/4) Fetching data from target shoot cluster
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Data fetched from target shoot cluster.

(2/4) Setting up bastion host security group
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Security Group exists sg-0ca746fd63e9560b3 skipping creation.

(3/4) Creating bastion host and node host security group
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Bastion Host exists, skipping creation.
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
Kubeconfig under path /home/ttt/.garden/cache/canary/projects/hc-dev/fire-1-1-haas/kubeconfig.yaml contains exec configurations that could contain malicious code. Please only continue if you have verified it to be uncritical
SSH Port already opened on Node
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
waiting for 10 seconds to retry
2020/12/17 11:57:39 IP 3.122.94.82 port 22 is not reachable
exit status 1
ttt@W-R90PNMJE:~/work/gardenctl$

neo-liang-sap commented 3 years ago

  1. Regarding https://github.com/gardener/gardenctl/pull/494 not working: @tedteng, could you please provide your gardenctl get target output so I can use the cluster for debugging? This also applies to such scenarios in the future.

  2. Regarding the IP port check breaking the cleanup function: we currently have logic so that on the next gardenctl ssh, if the bastion machine already exists we do not create it, and if the security group is already there we do not create it (see the sketch below). In your case you can see logs like Security Group exists sg-0ca746fd63e9560b3 skipping creation. and Bastion Host exists, skipping creation. Of course I agree with you that this is resource leakage, but it is out of this issue's scope; please open a new issue to track it.
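
For illustration only, a check-before-create sketch for the bastion host in Go, assuming a hypothetical Name tag "gardenctl-bastion" (the real gardenctl check keys on its own naming and goes through the AWS CLI):

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// bastionExists reports whether a bastion instance is already starting or
// running; "gardenctl-bastion" is an assumed tag value for this sketch.
func bastionExists(svc *ec2.EC2) (bool, error) {
	out, err := svc.DescribeInstances(&ec2.DescribeInstancesInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("tag:Name"), Values: []*string{aws.String("gardenctl-bastion")}},
			{Name: aws.String("instance-state-name"), Values: []*string{aws.String("pending"), aws.String("running")}},
		},
	})
	if err != nil {
		return false, err
	}
	for _, reservation := range out.Reservations {
		if len(reservation.Instances) > 0 {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	svc := ec2.New(session.Must(session.NewSession()))
	exists, err := bastionExists(svc)
	if err != nil {
		log.Fatal(err)
	}
	if exists {
		fmt.Println("Bastion Host exists, skipping creation.")
	}
}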

tedteng commented 3 years ago
  1. I already removed the leaked resources for canary/projects/hc-dev/fire-1-1-haas because they may have blocked the Gardener workflow, so you may not be able to reproduce it on that user cluster now. Maybe try next time.

  2. Sure, I will open a new ticket. As I recall, it used to invoke the cleanup function when the port was not reachable, instead of exiting directly, which may cause resource leakage (see the sketch below).
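
A minimal sketch of that behaviour in Go (hypothetical helper names, not the gardenctl code): deferring the cleanup makes it run even when the port check exhausts its retries and fails.

package main

import (
	"fmt"
	"log"
	"net"
	"time"
)

// cleanUp stands in for the gardenctl cleanup that removes the bastion host
// and the temporary ingress rules (hypothetical helper, not the real code).
func cleanUp() {
	fmt.Println("cleaning up bastion host and ingress rules")
}

// waitForSSH polls the bastion's public IP until port 22 answers or the
// retries are exhausted, mirroring the "waiting for 10 seconds to retry" loop.
func waitForSSH(ip string, attempts int) error {
	for i := 0; i < attempts; i++ {
		conn, err := net.DialTimeout("tcp", net.JoinHostPort(ip, "22"), 5*time.Second)
		if err == nil {
			conn.Close()
			return nil
		}
		fmt.Println("waiting for 10 seconds to retry")
		time.Sleep(10 * time.Second)
	}
	return fmt.Errorf("IP %s port 22 is not reachable", ip)
}

func main() {
	// Deferring the cleanup guarantees it also runs when the port check
	// fails, which is the leak described above. log.Fatal / os.Exit would
	// skip deferred calls, so the error is only logged here.
	defer cleanUp()
	if err := waitForSSH("3.122.94.82", 12); err != nil {
		log.Println(err)
	}
}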