JuliaServices / CloudStore.jl

A simple, consistent, and performant API for interacting with common cloud storage abstractions
Other
16 stars 8 forks source link

Don't assume the only error in getObject is an ArgumentError #47

Closed Drvi closed 11 months ago

Drvi commented 11 months ago

There are two main codepaths for get -- we either do a multipart download for larger files or a single GET call for smaller ones. When we give the method a preallocated buffer, we also constrain the maximum size of the object we want to download (anything bigger than the buffer should fail). When the object is larger than the buffer, the error handling is different for the two codepaths:

  1. multipart download starts with a single HEAD call to learn the size of the object, and once we know it, we can compare it with the provided buffer, and throw an ArgumentError early.

  2. For small files, we don't do the extra HEAD request, but we let HTTP.jl handle the case internally in the single GET call and if the buffer turned out to be too small, we'd get a RequestError wrapping and ArgumentError back from HTTP.jl.

This means that depending on the size of the input buffer, we'd sometimes get an ArgumentError and sometimes a RequestError both communicating the same underlying problem -- the input buffer is too small.

My current understanding is that to make these exception types more predictable, we always threw an additional ArgumentError in the small file case, assuming that buffer size mismatch was the only possible source of error at the call site, but we've seen that other kinds of errors are possible and that the ArgumentError was hiding the actual problem in those cases.

I also think that we used to do a HEAD request even for small files in the past and when we stopped doing that for performance reasons, we forgot that the HEAD was shielding the GET from various errors like 403, 404 and so on.

In this PR, we no longer throw an additional ArgumentError in the small file case, instead we inspect the RequestError and if it contains a single ArgumentError we rethrow that. This is done because it's backward compatible -- we wanted to throw an ArgumentError on buffer size mismatch before so we'll continue to do so. I'm not sure if this is the right call though, and would appreciate some feedback/discussion on that.

codecov[bot] commented 11 months ago

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (0229c4c) 84.21% compared to head (6b52c0b) 84.32%.

Files Patch % Lines
src/get.jl 92.30% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #47 +/- ## ========================================== + Coverage 84.21% 84.32% +0.11% ========================================== Files 7 7 Lines 646 657 +11 ========================================== + Hits 544 554 +10 - Misses 102 103 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Drvi commented 11 months ago

Thanks @quinnj! Would you prefer for the two error paths to report a) What HTTP returns "Unable to grow response stream IOBuffer $(sizeof(out)) large enough for response body size: $(sizeof(in))" or b) What we currently do for multipart downloads "out ($(sizeof(out))) must at least be of length $(sizeof(in))" ? Changing from a) to b) here would require some pattern matching on the message in the ArgumentError we intercepted from HTTP.jl, which I'm not the biggest fan of.

quinnj commented 11 months ago

Yeah, I think going w/ a is fine and easier, right?