cloudfoundry / nodejs-buildpack

Cloud Foundry buildpack for Node.js
http://docs.cloudfoundry.org/buildpacks/
Apache License 2.0
169 stars 382 forks source link

Supply step do not retry when error is connection reset by peer #626

Open sleungcy opened 1 year ago

sleungcy commented 1 year ago

What version of Cloud Foundry and CF CLI are you using? (i.e. What is the output of running cf curl /v2/info && cf version?

{
...
   "version": 0,
   "min_cli_version": null,
   "min_recommended_cli_version": null,
   "api_version": "2.203.0",
   "osbapi_version": "2.15",
}
cf version 7.5.0+0ad1d63.2022-06-04

What version of the buildpack you are using?

We have multiple applications, the below versions are used currently.
Version: 1.8.5, 1.8.7 and 1.8.12

If you were attempting to accomplish a task, what was it you were attempting to do?

Package staging randomly fails at supply/finalize step when downloading Go without attempting any retry.

What did you expect to happen?

The curl command already has retry mechanism, and it should retry if the attempts encounter any network connection problems.

What was the actual behavior?

If the error is "connection reset by peer", the curl command do not retry and immediately return error

see https://stackoverflow.com/questions/42873285/curl-retry-mechanism

Sample log output from running the same curl command

   2023-07-18T18:17:02.90+0000 [APP/TASK/eab4dfa6/0] ERR % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
   2023-07-18T18:17:02.90+0000 [APP/TASK/eab4dfa6/0] ERR Dload  Upload   Total   Spent    Left  Speed
   2023-07-18T18:17:03.07+0000 [APP/TASK/eab4dfa6/0] ERR 0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
   2023-07-18T18:17:03.74+0000 [APP/TASK/eab4dfa6/0] ERR 0  151M    0  142k    0     0   812k      0  0:03:10 --:--:--  0:03:10  809k
   2023-07-18T18:17:03.75+0000 [APP/TASK/eab4dfa6/0] ERR 33  151M   33 50.1M    0     0  59.3M      0  0:00:02 --:--:--  0:00:02 59.3M
   2023-07-18T18:17:03.75+0000 [APP/TASK/eab4dfa6/0] ERR curl: (56) OpenSSL SSL_read: Connection reset by peer, errno 104
   2023-07-18T18:17:03.75+0000 [APP/TASK/eab4dfa6/0] OUT Exit: 56, Elapsed: 1

To reproduce, create a sample app(doesn't matter what app), then create a run-task that perpetually download the go binary

cf run-task app-1 --command 'while true; do start_time=$(date +%s); curl https://buildpacks.cloudfoundry.org/dependencies/go/go_1.19_linux_x64_cflinuxfs3_7e231ea5.tgz --location --retry 15 --retry-delay 2 --output /tmp/go.tgz; echo -n "Exit: $?, "; end_time=$(date +%s); elapsed=$(( end_time - start_time )); echo "Elapsed: $elapsed"; sleep 10; done'

Watch the app log output for errors.

Can you provide a sample app?

N/A

Please confirm where necessary:

sleungcy commented 1 year ago

One quick way to fix this is to use --retry-all-errors instead of --retry https://stackoverflow.com/questions/10285700/curl-error-recv-failure-connection-reset-by-peer-php-curl#:~:text=One%20way%20we%20can%20avoid,number%20of%20worker_connections%20in%20nginx.

steinfurt commented 9 months ago

Note that --retry-all-errors only exists since curl v7.71.0 changelog. Therefore, it cannot be used in cflinuxfs3, but it should be available in cflinuxfs4.

ivanovac commented 9 months ago

A fix that would address this behaviour for cflinuxfs4 stack is proposed: PR Add option --retry-all-errors when downloading go version

radoslav-tomov commented 9 months ago

@ivanovac implemented a small go server, which simulates the "connection reset by peer" by sending RST to the client. With this we were able to confirm, that those types of transient TCP issues are ignored by the curl --retry mechanism and can be handled only by --retry-all-errors.