babashka / babashka.curl

A This library is mostly replaced by https://github.com/babashka/http-client
Eclipse Public License 1.0
121 stars 9 forks source link

Cannot retry when using `:as :stream` option #38

Closed bhb closed 2 years ago

bhb commented 3 years ago
(require '[babashka.curl :as curl])

;; For context, this takes around 8s and throws error with 500 status, as expected. 
;; This is just an example of how it works normally.
(curl/get "https://httpstat.us/500" {:raw-args ["--retry" "3"] :debug true})

;; #### Repro of bug ######
;; Actual: Takes around 300ms and throws error with 500 status
;; Expected: Should retry 3 times, which would take about 8s total
(curl/get "https://httpstat.us/500" {:as :stream :raw-args ["--retry" "3"] :debug true})

;; For comparison
;; curl -N --retry 3 https://httpstat.us/500
;; retries as expected
;; (-N is the option added here
;; https://github.com/babashka/babashka.curl/blob/e1f33ffa22728553bf2242db5e2a4ca349fab04b/src/babashka/curl.clj#L126 )    
borkdude commented 3 years ago

Thanks, I'll have a look soon. In the newest bb we now also support the JDK 11 Http Client. And otherwise you could shell out to curl directly as a workaround.

borkdude commented 3 years ago

@bhb I can't repro this. I added the options :silent false and :err :inherit for debugging. With those I get:

user=> (require '[babashka.curl :as curl])
nil
user=> (curl/get "https://httpstat.us/500" {:err :inherit :silent false :raw-args ["--retry" "3"]})
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    25  100    25    0     0     53      0 --:--:-- --:--:-- --:--:--    53
Warning: Transient problem: HTTP error Will retry in 1 seconds. 3 retries
Warning: left.
100    25  100    25    0     0    110      0 --:--:-- --:--:-- --:--:--   110
Warning: Transient problem: HTTP error Will retry in 2 seconds. 2 retries
Warning: left.
100    25  100    25    0     0     66      0 --:--:-- --:--:-- --:--:--    66
Warning: Transient problem: HTTP error Will retry in 4 seconds. 1 retries
Warning: left.
100    25  100    25    0     0    150      0 --:--:-- --:--:-- --:--:--   150
Execution error (ExceptionInfo) at babashka.curl/request (curl.clj:237).
babashka.curl: status 500
borkdude commented 3 years ago

@bhb Could you try the above too to see if the same happens for you, with the recent commit on master?

bhb commented 3 years ago

@borkdude What do you see if you add :as :stream to the args above?

Could you try the above too to see if the same happens for you, with the recent commit on master?

If I adapt your code above to add :as :stream, I get the following (apologies if I'm not doing the right thing to use the latest version, I think the following should work)

First, I tried the version packaged in Babashka:

;; bb --version
;; babashka v0.6.1

(require '[babashka.curl :as curl])
(curl/get "https://httpstat.us/500" {:as :stream :err :inherit :silent false :raw-args ["--retry" "3"]})

Then I tried to include the latest babashka.curl

(require '[babashka.deps :as deps])
(deps/add-deps '{:deps {io.github.babashka/babashka.curl {:git/sha "75389c5ae0a084b33529190d654b71e021239fa4"}}})
(require '[babashka.curl :as curl])
(curl/get "https://httpstat.us/500" {:as :stream :raw-args ["--retry" "3"] :debug true})

In both cases, I get a 500 immediately and I don't see the output you posted above.

borkdude commented 3 years ago

@bhb I see. I think the problem was that bb.curl throws as soon as it discovers that the status code is 500-ish. Can you try your request with :throw false? I suspect this will fix your issue.

bhb commented 3 years ago

@borkdude In my tests, using :throw false will return the map with the status code, but it still happens immediately i.e. without retrying.

To summarize, here are my results

(require '[babashka.deps :as deps])
(deps/add-deps '{:deps {io.github.babashka/babashka.curl {:git/sha "75389c5ae0a084b33529190d654b71e021239fa4"}}})
(require '[babashka.curl :as curl])

;; With no ':as :stream', this will retry 3 times, then throw (as expected)
(curl/get "https://httpstat.us/500" {:err :inherit
                                     :silent false
                                     :raw-args ["--retry" "3"]})

;; With no ':as :stream' and ':throw false' this will retry 3 times, then return map that contains status code (as expected)
(curl/get "https://httpstat.us/500" {:throw false
                                     :err :inherit
                                     :silent false
                                     :raw-args ["--retry" "3"]})

;;; Unexpected behavior below

;; With ':as :stream', this immediately throws a 500 without retry
(curl/get "https://httpstat.us/500" {:as :stream
                                     :err :inherit
                                     :silent false
                                     :raw-args ["--retry" "3"]})

;; With ':as :stream' and ':throw false', this immmediately returns a map with a status code (without retrying)
(curl/get "https://httpstat.us/500" {:as :stream
                                     :throw false
                                     :err :inherit
                                     :silent false
                                     :raw-args ["--retry" "3"]})
borkdude commented 3 years ago

@bhb

I would say the but-last expression is unexpected but an edge case: the :status key is filled in from the first http status code it finds in the headers and when this is 500-ish, it will throw by default. You can suppress this with :throw false. Let's assume that you'll have to use this for your use case.

Then with the last expression. It returns immediately, because this is what :stream is for: it won't wait until the process is finished, e.g. for consuming a stream of events.

If you def your last expression like (def response ...) and then do (slurp (:body resp)) you'll see this: "500 Internal Server Error500 Internal Server Error500 Internal Server Error500 Internal Server Error"

which I think indicates that there were three retries.

But perhaps this is not your use case and you just want a blocking request and want the bytes as a result? Could you clarify more context around your use case?

Perhaps an easier way for you to work with this is call curl manually using babashka.process/process:

 (def stream (:out (babashka.process/process "curl https://avatars.githubusercontent.com/u/64927540?s=400&u=644e8997a671220cb729260cfbf678d9cfddaa1b&v=4 --retry 3" {:err :inherit})))
(io/copy stream (io/file "icon.png"))

But let me know if you want a blocking bytes call because this is pretty easy to support with :as :bytes.

bhb commented 3 years ago

But perhaps this is not your use case and you just want a blocking request and want the bytes as a result? Could you clarify more context around your use case?

Great question - I probably should have started with this 😄 .

I am using babashka.curl to download a file that has been stored in the cloud. If possible, it would be ideal if I could just read this file directly into memory and process it without needing to write to a separate file and then re-read it.

Unfortunately sometimes I get transient errors like 500s or 503s when I try to download the file.

The two possible outcomes for my script are:

  1. The file is downloaded and I can proceed with processing. I don't care about the errors.
  2. After N retries, the file still has not been downloaded and I have to abort the entire script. In this case, I care about the error details (getting the last error is sufficient).

So yes, ideally there would be a blocking call that gets the entire file but will also allow me to pass --retry to curl so that the call doesn't fail if we get back e.g. a single 503.

I've worked around this by writing my own retry logic on top of babashka.curl, but it seems like it would be cleaner to just let curl take care of this.

Perhaps an easier way for you to work with this is call curl manually

That's a good idea, I can definitely do that!

bhb commented 3 years ago

But let me know if you want a blocking bytes call because this is pretty easy to support with :as :bytes.

Hm, maybe that's exactly what I need. The file I'm downloading is gzipped json, so I can't do anything meaningful with it until it's completely downloaded. I must have missed this option previously. My mistake!

borkdude commented 3 years ago

@bhb That option isn't yet there, it was a proposal. Will respond on the rest soon.

borkdude commented 3 years ago

@bhb OK, after reading your feedback, I think we can just support :bytes which is a blocking call to get the bytes from your downloaded file with support for retries.

bhb commented 3 years ago

@borkdude That sounds great. Thanks so much!

borkdude commented 2 years ago

I implemented the :as :bytes option now.