cloudfoundry / cli

The official command line client for Cloud Foundry
https://docs.cloudfoundry.org/cf-cli
Apache License 2.0

CF CLI should handle 502's on GETs, especially when interacting with GCP #1335

Open mjj209 opened 6 years ago

mjj209 commented 6 years ago

Command

Instance broker-registrar/8204884a-dd89-407a-9e1c-4e797ad998e7
Exit Code 1
Stdout CF_API_URL=https://api.sys.lighttaupe.cf-app.com
CF_SKIP_SSL_VALIDATION=true
CF_ADMIN_USERNAME=admin
BROKER_NAME=p-mysql
BROKER_URL=https://p-mysql.sys.lighttaupe.cf-app.com:443
BROKER_USERNAME=blahblahblah
Setting api endpoint to https://api.sys.lighttaupe.cf-app.com...
OK

       api endpoint:   https://api.sys.lighttaupe.cf-app.com  
       api version:    2.98.0  
       Not logged in. Use 'cf login' to log in.  
       API endpoint: https://api.sys.lighttaupe.cf-app.com  
       Authenticating...  
       OK  
       Use 'cf target' to view or set your target org and space.  
       Service broker does not exist - creating broker  
       Creating service broker p-mysql as admin...  
       OK  
       Enabling access of plan 100mb for service p-mysql as admin...  
       FAILED  
       Server error, status code: 502, error code: 0, message:  

What occurred

502, failed_to_connect_to_backend from the GCP load balancer

What you expected to occur

I would expect the CF CLI to retry, since https://github.com/cloudfoundry/cli/issues/1230 indicates this was fixed. We talked to Anand, though, and he said that not all API calls had been refactored.

CLI Version

unsure

CC API Endpoint Version

unsure

Any other relevant information

Many engineering teams use GCP as the platform for testing. GCP offers 99.95% reliability for its load balancer requests, which means a certain number of requests will always fail. Having retry logic at the lowest levels seems optimal: most teams write tests that assume every API call will succeed, so if a single API call fails, the whole test fails. Many tests take up to 1 hour to complete, and you have to start over if one fails at any point.
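To put a rough number on that, here is a small sketch of the arithmetic. It assumes the 99.95% figure can be read as a per-request success probability and that failures are independent. Neither assumption is confirmed by GCP or by our data here, and the call count of 1000 is purely illustrative.

```python
def run_failure_probability(per_call_success: float, calls: int) -> float:
    """Chance that at least one of `calls` independent requests fails.

    Assumes each request succeeds independently with probability
    `per_call_success` (an assumption, not a measured property).
    """
    return 1.0 - per_call_success ** calls


if __name__ == "__main__":
    # Hypothetical test run making 1000 API calls with no retries.
    print(round(run_failure_probability(0.9995, 1000), 2))  # roughly 0.39
```

Under those assumptions, a long test run that makes on the order of a thousand calls has a sizeable chance of hitting at least one 502.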

I think we've found one call that hasn't been refactored, but I also believe the correct number of retries is 3 for 502s where the status is not response_sent_by_backend. Assuming failures are independent, each retry multiplies the chance of an overall failure by the single-request failure rate, so three retries shrink it to roughly the fourth power of that rate.
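A minimal sketch of the retry behaviour we have in mind, written in Python for illustration only. It is not the CF CLI's actual implementation, and `get_with_retry`, `send`, and `failure_after_retries` are hypothetical names. How a client would learn that the load balancer status was not response_sent_by_backend is left out, since we don't know whether that detail is visible in the response. The sketch simply retries any 502.

```python
import time
from typing import Callable, Tuple


def get_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, int]:
    """Call `send` (a hypothetical idempotent GET) and retry on HTTP 502.

    Returns (status, body, attempts). At most `max_retries` retries are
    made, i.e. at most `max_retries + 1` attempts in total.
    """
    attempts = 0
    while True:
        status, body = send()
        attempts += 1
        # Stop on any non-502 response, or once the retry budget is spent.
        if status != 502 or attempts > max_retries:
            return status, body, attempts
        # Linear backoff between attempts; 0 by default.
        sleep(backoff_seconds * attempts)


def failure_after_retries(per_call_failure: float, retries: int) -> float:
    """Chance that every attempt fails, assuming independent failures."""
    return per_call_failure ** (retries + 1)
```

With a 0.05% per-request failure rate and three retries, `failure_after_retries(0.0005, 3)` is on the order of 1e-13, again assuming independence. Restricting this to GETs keeps the retry safe, since repeating a read should not change server state.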

Feel free to ask us questions in #toolsmiths on Slack, as we've got 502 fever and a ton more data if you're interested ;-)

Thanks, Mike J PM@Toolsmiths

cf-gitbot commented 6 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/155668030

The labels on this github issue will be updated when the story is started.

ljfranklin commented 6 years ago

To pile on to this issue, could cf curl use the same retry logic as the other commands?

XenoPhex commented 6 years ago

Notes: The command in question is enable-service-access.

github-actions[bot] commented 3 days ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed.