Open asterite opened 3 years ago
I was having the same thought process for it actually, so I'm all for it. body
and body_io
always felt weird and out of place. So also 👍 to calling it body
on both.
If wanted Streaming
could maintain body_io
as deprecated alias to body
for a while. Consumed
will not need body_io
as that was never working for its usecases in the first place and body
keeps working as expected.
The biggest breakage here will be people typing stuff as Response
which in this approach will loose its body
and body_io
properties, so people will have to update their type declarations to continue using it.
The alternative to subclassing is making body
be String|IO
and have people type check that... which still is a runtime assertion and barely better than body
and body_io
.
Third option I see would be to make body
always be IO
, a prefetched IO::Memory
instance for the non-streaming case, to not leak FDs
Another idea: let both have body
and body_io
:
body
and we could expose body_io
as an IO::Memory
over body
.body
consume the body_io
into a String, then replace body_io
with IO::Memory
over body
. The problem with this is if someone consumes a bit of the IO before calling body
.But maybe the downsides are not that bad. And we could keep a single Response
type.
But I still feel having two types is better.
Yeah, that seems like it's equally brittle to use as the current interface. I would prefer forcing users to always call .body.gets_to_end
if they want a string explicitly (building on the always IO
option outlined above).
Yeah, this part of the API is really shady and I'm all in for improving that. I'm not to sure about having two separate types, though. Feels a bit overhead and IMO might not be that user-friendly.
There's another solution I've already mentioned in https://github.com/crystal-lang/crystal/issues/8461#issuecomment-558258251: Yield the IO as a second argument to the block in the streaming scenario.
I'd like to inspect some real examples to see how this is used in the field. That should help us discuss an optimal improvement for this API.
Hm, yielding it as second argument still seems to have potential runtime errors or weird behavior when mixing using that IO with calling Response#body
at some point.
According to skimming the code available on Github, by far the most common scenario is a quick and dirty call which doesn't really need a response object and just retrieves the body as a string:
HTTP::Client.get("http://example.com").body
(more elaborate versions also include a .success?
check or - rarely - some crude redirect logic)
Maybe there should be a convenience method for this which also handles some HTTP protocol context (like status code).
If we had this, the majority of use cases (the simple ones) would be dealt with. Then there would actually be little reason to have a non-streaming variant of Reponse
... All more complex use case which still only want the body as string would just have to call gets_to_end
.
I think that might really be a great solution. Essentially, it just means removing Response#body
and providing an easy-to-use alternative for the most common use case.
Why there is one that consumes the body to a string, and not the other way around? It's not obvious at a first glance, so I think having a single behavior for both will be less confusing.
We could have:
getter body_io : IO
# Consummes the body IO
def body!
@body_io.gets_to_end
end
Why there is one that consumes the body to a string, and not the other way around? It's not obvious at a first glance, so I think having a single behavior for both will be less confusing.
We could have:
getter body_io : IO # Consummes the body IO def body! @body_io.gets_to_end end
That's https://github.com/crystal-lang/crystal/issues/9901#issuecomment-725412160 with different names, so 👎
If we want to do something for 1.0 I am inclined to:
That way:
If we want to tweak further the design I think is after 1.0.
The one-liner argument is not a good enough one, because we can do:
p HTTP::Client.get("http://example.com").body
p HTTP::Client.get "http://example.com", &.body_io.gets_to_end
# Perhaps even p HTTP::Client.get "http://example.com", &.body!
@j8r I don't understand your last comment. What point are you trying to make? Is it hat the yielding variant is only marginally longer and the "simple" one isn't necessary?
Yes, I mean that all could be streaming, and as user-friendly as non-streaming. Both kind of users - those needing more low level/performance and those just wanting something easy - will be happy then.
@bcardiff Sounds good to me. That would allow a later discussion about bigger changes, possibly in conjuction with #6011.
If we want to do something for 1.0 I am inclined to:
I'm not sure whether that really improves the situation, it just seems to shift the inconsistency away from the particular case that triggered this discussion, so I think I would then rather prefer to leave things as is. (Personally I think 1.0 is the perfect opportunity for a breaking change like this and starting with a clean API into the stable phase).
To me this is about getting rid of having both body
and body_io
, which is where all the confusion, weirdness and inconsistency stems from. I don't see benefit in changes that don't achieve that goal.
@bcardiff
make the streaming response raise on runtime when body is called
With this RFC I'm trying to avoid runtime errors and move them to compile-time errors.
@straight-shoota
If the most common case is fetching a response without a block and getting the body as a string then it shouldn't matter if we introduce two response types, because most don't have to deal with a specific type name. The body
property will continue working for them. The advantage is that when you use the streaming version you will get a compile-time error if you type body
instead of body_io
(same for the other way around).
@asterite
If the most common case is fetching a response without a block and getting the body as a string then it shouldn't matter if we introduce two response types
But what's the point in having two separate types when one of them is rarely used and can simply be emulated using the other one?
But what's the point in having two separate types when one of them is rarely used and can simply be emulated using the other one?
Type safety.
I know that's the goal for this RFC. My suggestion to remove Response#body
would be completely type safe. A second type for type safety is only required if there's a real need for a non-streaming response type. And that's what I'm questioning.
A second type for type safety is only required if there's a real need for a non-streaming response type
I think in most cases you want a non-streaming response because responses are small. However, in a few cases you'd like a streaming response, for example when downloading a large file, or when the response is meant to be streaming (like sending lines of JSON that you parse individually, and the connection is kept open).
The way we design it, for the streaming way you have to do it in a block so you can't forget to close the IO. In Go you always get an "IO" and you have to remember to close it, and we didn't like that.
So, we have two use cases and the way you use them in the same except that in one case you deal with a String, in the other you deal with an IO. We could extract that outside of the Response
object, but then it would be a bit weird:
response, body = HTTP::Client.get("...")
When you talk about an HTTP response body, the body really belongs to the response, it's not a separate entity. And it's just nice to be able to pass the response to other functions.
What I'm wondering is: if we split it into two types, in which way that produces friction?
I feel like the most simple breaking change we could do right now is removing body_io
, making body
return IO
always and let that be IO::Memory
for the non streaming case (avoiding Go's pitfall). This increases API usage verbosity for the moment, but would also provide a simple baseline for adding convenience API in a non-breaking manner, say for example HTTP.get_body(...) : String
.
@asterite
I totally agree to your assessment of the typical use cases. I already mentioned that in https://github.com/crystal-lang/crystal/issues/9901#issuecomment-725430190
My point is, that when you want the response as a string, chances are high you don't really care about the response object. The vast majority of use cases are literally just HTTP::Client.get("http://example.com").body
.
So why not make it easier for these use cases and just return the body (as String
) and not the Response
with cached body?
For example, we could end up with two API methods like this:
HTTP::Client.get("http://example.com") # returns String
HTTP::Client.get("http://example.com") do |response|
# here we consume response body io
end
If you just want a simple response body, you use the non-yielding method and get a string result. The method would also provide basic status code handling (for example raise or return nil
unless successful). That's more included features for robustness and simpler to use than the current API.
If you have more advanced use case, like responding to specific status codes, handling response headers etc. you use the yielding method and get a response object (which only provides a streaming IO). If you need the body as string, you have to consume the IO.
The vast majority of use cases are literally just HTTP::Client.get("http://example.com").body
Then the vast majority of use cases are wrong. Consuming the body without checking that the status is okay It seems like a latent bug to me.
So why not make it easier for these use cases and just return the body (as String) and not the Response with cached body?
Because a body without an HTTP status code can't be trusted.
Totally agree and that's why I mentioned that the method should check the status code and fail if it's not a success.
Ouch, I should finish reading before replying 😊
Guess what someone just asked in gitter? https://gitter.im/crystal-lang/crystal?at=5fb01edab4283c208a656378
I can't believe there's so much resistance to make this trivial change.
And I say it's trivial because I already have this implemented in my machine, I'm just waiting for an "okay" so I can send a PR.
I don't see any resistance to improving this. We just need to agree on the solution.
IMO having to separate types is completely unnecessary. We really need only a single type with IO. I think the best improvement we can do right now is to add IO::Memory
for non-streaming instance. So that's pretty much what @bcardiff and @jhass have been proposing as well. There's only different opinions on whether the IO should be named body_io
or body
(and subsequently what to do with the string-returning method).
Just adding a fallback for body_io
is the easiest change. It doesn't break anything. It doesn't improve much either, but it would fix the issue discussed on gitter.
Renaming body_io
to body
OTOH is a breaking change. It's easy to fix though (just insert to_s
). And when it's used in a place where either IO or string do, there's no need to change any code either. So maybe it's fine. However, it's not necessarily better than renaming to body
. IMO it doesn't matter whether the property ends up being named body
or body_io
. Staying with the latter won't break code that is already using the streaming interface. And it would allow to properly deprecate body
before removing/breaking its return type.
If you take a look at newer languages like Rust, Elm... even Haskell has this... you define all possible states a type can be (algebraic data type) and then there's no way to represent an invalid state. In my proposal, when you fetch a non streaming response, you can only read it as a string. If you are streaming it, you can only treat it as an IO. Of course you can explicitly go from one to the other (build an IO memory or read everything into a string) but as it's been avoided all the time here, implicit behavior is not good.
I sincerely can't see the downside of having two types other than hypothetical situations.
A very practical downside from the API design is that IMO a non-streaming type is simply not necessary at all. So we can eventually have just a single type with streaming IO. No type with buffered string body. It means we can't have a proper fix right now. That's a benefit of the two-type solution. But the final solution to have only one type sounds much better to me.
How can you make it so that users don't forget to close the IO? If you have a solution for that, I can concede.
I'd suggest to remove the overload that returns a Response
and only keep the one that yields the instance.
That would also free up the non-yielding overload for a convenience method that directly returns String
as suggested in https://github.com/crystal-lang/crystal/issues/9901#issuecomment-725430190. Note that it doesn't have to be that way. The convenience method needs extra discussion. But it would be an option.
I think we are going in circles here so I'll just stop. Sorry :-)
You're probably right.
I'll still add this to my last response because I might have been thinking to far ahead: We can obviously also keep the non-yielding overload and have it return a Response
instance. That one would just not provide access to the network IO, but the buffered body via IO::Memory
. So the data situation and the responsibility to close a response are exactly the same as now. What changes is just that you can't access the body as a string directly. Instead you always have to consume an IO, either memory or network depending on how the response is obtained.
Which is what I suggested in https://github.com/crystal-lang/crystal/issues/9901#issuecomment-725410164 and https://github.com/crystal-lang/crystal/issues/9901#issuecomment-725453347 already :P
Yes and I in https://github.com/crystal-lang/crystal/issues/9901#issuecomment-725430190 and @bcardiff in https://github.com/crystal-lang/crystal/issues/9901#issuecomment-725435327. It essentially just differs on whether to stay with body_io
or move to body
for the streaming IO.
Oh, I really like that. @jhass I think I missed that third option, sorry.
So, to see if I understand things correctly, the idea is:
HTTP::Client::Response
type whose body
is always an IO
IO
is directly connected to the socket, and it will be closed once the block endsbody
to an IO::Memory
over the string bodyThe nice thing about this approach is that you always deal with a response in the same way: it's an IO. The not-so-nice thing about this is that for the non-streaming version you have to do response.body.gets_to_end
to get a string, which has three "problems":
body
: but I'm fine withIO::Memory#to_s
and IO::Memory#gets_to_end
will create a duplicate string: however, we can make IO::Memory
know that it was initialized from a string, and in that case return the original string. There's still one more memory allocation for the IO::Memory
, but it's fine (it's just 80 extra bytes).body
is always an IO
one could pass it down to JSON.parse
or X.from_json
, or X.from_yaml
, and parsing an IO is slower than parsing a String: but you can always read the IO into a string if you want to avoid that performance issue.So I think I'm 👍 for this approach. It's a breaking change, but it's probably better to do it before 1.0 is released.
Yes, that's the idea.
IO::Memory
as an IO
ish view over a string, even a static one (think file embedding).JSON
and YAML
detect an IO::Memory
in "read only" mode and access the underlying string instead.Great :+1: Sounds like we're pretty much agreed then. It seems body
vs. body_io
is still to be settled.
body
is the better alternative because the _io
suffix is simply unnecessary. But the change to a bit more concise name has to match the cost of a breaking change.
Staying with body_io
has the benefit of not breaking any existing code using the streaming response. body
returning string (for non-streaming response) could be properly deprecated and continue to work for now. So effectively, we wouldn't need to break anything immediately.
IMO it's better to go into 1.0 with a deprecated method than a broken one.
I'm in favor of body_io
but going for body
is fine with me either.
I would strongly prefer body
, making body_io
a deprecated alias to it is fine for me, but not necessary. Turning Response#body
from returning a String
to returning an IO
is a breaking change already anyhow.
Context
I think it's a shame that having a compiled and type-safe language we are using the same type for non-streaming (consumed) and streaming responses:
The "okay" and "not okay" here mean that when it's "not okay" we get a runtime error. This is just a minor thing because you'll probably notice it right away and change it to use the other. However, it can be quite confusing to understand why you get that error. And also, we could catch this at compile-time.
Proposal
I propose we have this hierarchy:
String
body.IO
bodySome decisions to make
body
property in both cases? That means you can always usebody
(nice) but the type changes depending on whether it's in a block or not. The downsides are that it can be a bit confusing, and it's also a breaking change. Alternatively we can keep it asbody
andbody_io
so it's not a breaking change.