crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Ergonomics of changing `IO::Buffered` buffer_size #11270

Closed jgaskins closed 1 month ago

jgaskins commented 3 years ago

Feature Request

In my armature shard, you render into the HTTP::Server::Response buffer while processing the response. That is, there is no single render: you can render several times, so you can render separate templates for the site header, flash content, errors, main body content, and the site footer. Basically, every template is a partial.

Most of the time, you can render the site header and flash into the buffer and still change the status or headers while processing the main body content. The response buffer only needs to be large enough to accommodate what comes before that point, such as assets, navigation, etc.

Example:

```crystal
route context do |r, response, session|
  # Authenticating the request, buffer is empty
  if current_user_id = session["user_id"].try(&.as_s?)
    current_user = UserQuery.new.find_by_id(current_user_id)
  end

  # Render the site header and top nav
  render "app_header", to: response

  r.on "posts" do
    r.root do
      # ...
      r.post do
        params = r.form_params

        if valid_authenticity_token? params, session
          new_post = PostQuery.new.create(
            body: params["body"],
            # ...
          )

          # Sets status = :see_other and `location` header to the given path
          response.redirect "/posts/#{new_post.id}"
        else
          response.status = :bad_request
          response << "<p>Invalid authenticity token</p>"
        end
      end
    end
  end

  # The buffer does not need to be able to contain the footer, so it can have been flushed by now
  render "app_footer", to: response
end
```

The problem comes when you need to tune that buffer size. For example, if you're using Tailwind CSS, the site header/nav (from render "app_header" in the example above) is going to contain a lot of markup, easily surpassing the default 8KB buffer and causing it to flush, after which setting the status and headers has no effect.

One solution I often use is to set response.output.buffer_size, but the ergonomics there aren't amazing:

```crystal
begin
  response.output.as(IO::Buffered).buffer_size = 32 * 1024 # 32 KB
rescue ex : ArgumentError
  # Buffer size is already set
end
```

The typecast is needed because response.output is implemented with an IO::Buffered but exposed as an IO (presumably because GzipWriter is not an IO::Buffered). The bigger issue, though, is that you can't check whether it's safe to set the buffer size. If the client sends a Connection: keep-alive header, the HTTP::Server::Response is reused, the previously allocated buffer is still there, and response.output.buffer_size= raises an exception.

I don't know if there's a 100% solution to this. It's important to accommodate non-Buffered IO instances, so I think the type-casting will have to remain, but being able to ask whether it's safe to set this value, and/or making setting it to the current value a no-op (i.e. if I'm always setting it to 32KB, it won't raise an error because it's already 32KB), might go a long way toward improving the ergonomics here.

I would love to be able to say something like response.buffer_size = 32 * 1024 and have things just work, but I don't know if that's ideal since HTTP::Server::Response has to support non-Buffered IOs; maybe in those scenarios it could give a nice error message.

From what I can tell, avoiding the exception when setting the buffer size to the same value as the current buffer size would not realistically break any backward compatibility. The exception is raised because the buffer is already allocated and there are no further checks to reallocate, so in those cases it's protecting you from doing something unsafe that (AFAIK) can't be guaranteed at compile time. But if we're setting it to the same value, is the exception protecting you from anything?

straight-shoota commented 3 years ago

I'm not sure your use case can reasonably work with a generic buffered IO like the one used by HTTP::Server::Response. To me it doesn't seem like a problem with the API for setting the buffer size; it's more a question of which tools to use.

How can you be certain that 32kB, or whatever you set as the buffer size, will be large enough? What happens if you encounter very large content, bigger than the buffer size? I understand you're working on a framework, so you can't possibly know any upper limit on body sizes in advance. As I understand it, for this approach to work, you need some way to avoid the buffer running full and being sent prematurely.

I think this requires custom behaviour which must be provided by a custom buffer.

jgaskins commented 3 years ago

@straight-shoota The block of code in the original post that sets the buffer size has been running in a production app since I first asked about this on the forum. It does work reasonably; I just want to avoid having to handle the exception.

In this method, the use of ArgumentError implies the argument is the problem, but the conditional never checks the argument. The error message even explicitly says that changing the buffer size can't be done, yet I'm not actually changing the value; I'm setting it to the same value I set on the previous request. Returning early if value == @buffer_size would reflect the advertised behavior.
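To make the proposed semantics concrete, here is a minimal toy sketch (not stdlib code, and the class and its internals are purely illustrative) of a buffered-style setter that no-ops when it is assigned the size it already has:

```crystal
# Toy illustration of the proposed semantics: setting the buffer size to its
# current value is a no-op; a real change after allocation still raises.
class ToyBufferedIO
  getter buffer_size : Int32 = 8192
  @buffer : Bytes?

  def buffer_size=(value : Int32)
    return value if value == @buffer_size # same value: nothing to reallocate
    raise ArgumentError.new("Cannot change buffer size after buffer is allocated") if @buffer
    @buffer_size = value
  end

  def write(slice : Bytes)
    buffer = (@buffer ||= Bytes.new(@buffer_size)) # allocated lazily on first write
    # ... buffering logic elided ...
  end
end

io = ToyBufferedIO.new
io.buffer_size = 32 * 1024 # fresh, nothing allocated: fine
io.write Bytes.new(1)      # buffer gets allocated
io.buffer_size = 32 * 1024 # same value again: no-op instead of ArgumentError
```

With that behavior in IO::Buffered, the begin/rescue workaround above would be unnecessary when the same size is set on every request.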

The only other suggestion I had was a method that exposes that check, so you can know whether it's safe to set the buffer size.

As I understand it, for this approach to work, you need some way to avoid the buffer running full and being sent prematurely.

Not quite. I don't want an ever-expanding buffer. I absolutely want it to flush after it fills up to keep the memory footprint small (if I'm emitting 2MB of JSON, I don't want to hold all of it in memory); I just want to be able to tune when it does that based on observed behavior. As it stands today, that isn't possible without rescuing an exception.

straight-shoota commented 3 years ago

I absolutely want it to flush after it fills up to keep the memory footprint small

Then I apparently misunderstood your approach. But what would be the reasoning for increasing the buffer size, then?

This part of the original post confuses me:

Most of the time, you can render the site header and flash into the buffer and still change the status or headers while processing the main body content. The response buffer only needs to be large enough to accommodate what comes before that point, such as assets, navigation, etc.

That sounds like you absolutely don't want the buffer to flush as long as the handler might want to change status and headers.

And you can't really know what is "large enough" for the buffer, i.e. how big "assets, navigation etc." are. Even if it works well in your production app, it could easily break if a handler renders a huge navigation, for example, and then decides to change a header. As a user, I would certainly not want to use a system that can explode under my feet if I send more data than fits in the buffer.

asterite commented 3 years ago

I agree with @straight-shoota. The solution seems to be to buffer the entire output somewhere else, then move it to the response once you know nothing else will be written to it.

docelic commented 3 years ago

@jgaskins if it is imperative to flush in small chunks as you originally stated, while still preserving the option to modify headers, maybe you should just separate the code path that renders the partials from the code path that determines whether the headers/status will be modified (and if headers need to be modified, modify them before the rendering of partials starts).

Blacksmoke16 commented 3 years ago

If the response to the request is mutable up until the end of your framework's logic, it is kinda not ideal to be mutating the original HTTP::Server::Response instance during that process. Otherwise you may run into issues like https://github.com/crystal-lang/crystal/issues/8712.

Athena handles this by having a dedicated Response type that can be mutated all you want, and is then applied to the actual response object. I think this type of approach would be better than allowing HTTP::Server::Response's buffer to be changed.

jgaskins commented 3 years ago

The replies here seem to be more concerned with why I chose to do something than the fact that existing functionality in the standard library doesn’t match what it says it does. If you think my framework’s approach to rendering is fundamentally flawed, that’s fine, but let’s discuss that on the forum. I only explained what I’m working on in order to give context.

I’m trying to use existing stdlib functionality but avoid the exception when it doesn’t need to be raised. IO::Buffered#buffer_size=(value) exists and I would like to use it, but it cannot be done safely and I would like to change that.

What are the downsides of my suggestions? How would it be bad to adjust the behavior to match the method documentation and exception interface in a way that’s backwards-compatible?

Blacksmoke16 commented 3 years ago

the fact that existing functionality in the standard library doesn’t match what it says it does.

Where do the docs say you can resize the buffer of HTTP::Server::Response? Something worth pointing out is the fact that HTTP::Server::Response's output IO being a buffered IO is not publicly exposed, but more of an implementation detail. That's probably the main reason you're having so much trouble. There is a public method that allows you to override the output IO, which would allow you to set a buffered IO with your desired size.
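A rough sketch of that approach, assuming a hypothetical wrapper class (not part of the stdlib) that includes IO::Buffered and delegates the unbuffered operations to the original output:

```crystal
# Hypothetical wrapper: buffers writes to any IO with a buffer size chosen
# up front, before anything has been written.
class BufferedOutput < IO
  include IO::Buffered

  def initialize(@io : IO, buffer_size : Int32)
    # Safe here: no internal buffer has been allocated yet.
    self.buffer_size = buffer_size
  end

  def unbuffered_read(slice : Bytes)
    @io.read(slice)
  end

  def unbuffered_write(slice : Bytes) : Nil
    @io.write(slice)
  end

  def unbuffered_flush
    @io.flush
  end

  def unbuffered_close
    @io.close
  end

  def unbuffered_rewind
    @io.rewind
  end
end

# In a handler, before anything has been written to the response:
# context.response.output = BufferedOutput.new(context.response.output, 32 * 1024)
```

The original response output keeps its own default-sized buffer underneath, but nothing reaches it until the wrapper's larger buffer fills or is flushed.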

What are the downsides of my suggestions?

I think it would require a public API to be defined, and for the server response to be tightly coupled to a buffered IO, or at least implemented to only allow doing this when the internal output is buffered. Mainly since there is no guarantee that when you do context.response.output you get an IO::Buffered instance.

EDIT: I'm not 100% against being able to do it. But I think it's not as simple as it sounds and could easily be worked around by a diff design/approach.

jgaskins commented 3 years ago

Where do the docs say you can resize the buffer of HTTP::Server::Response? Something worth pointing out is the fact that HTTP::Server::Response's output IO being a buffered IO is not publicly exposed, but more of an implementation detail.

I mentioned HTTP::Server::Response as context for how the issue manifests in my application, but the issue is about IO::Buffered#buffer_size=(value) irrespective of the entire HTTP module.

If you have an object that is known to be an IO::Buffered instance (which you can check safely in multiple ways), the only public method that sets its buffer size is documented as raising an exception when changing it after the buffers are allocated, implying that calling io.buffer_size = io.buffer_size won't raise an exception (it's a safe operation even if the buffers have been allocated). There is also no public method that exposes that check, so there's no way to know whether it's safe to actually change it.

I'm not 100% against being able to do it. But I think it's not as simple as it sounds and could easily be worked around by a diff design/approach.

I assure you that GitHub issues were not my first stop after I noticed this. I’ve been thinking about and working around this for nearly a year now.

When you first started talking about this Athena framework you were creating, you were doing things so differently from the way everyone was building things in Crystal but the language didn’t have affordances for your approach yet. When you put in all that effort to make annotations and macros better, I didn’t try to convince you that “if you just do it this way you won’t have this problem” because that’s the vision you had for your framework, even though I didn’t understand it. And it turned out to be a pretty neat approach, which I acknowledged here. Please show the same respect now that I showed you then. If you want to discuss Armature’s design, I’m happy to talk about it at length, but let’s do it somewhere else.

Blacksmoke16 commented 3 years ago

I think there has been a partial miscommunication about what this issue is asking for, as we've kinda all been focusing on how you want to be able to change the buffer size of a server response, when in fact you just want to know whether it's safe to call that method. If all you want is to expose whether an IO::Buffered instance's buffer has been allocated, and to no-op when it's set to the same size, then I think that's a much more reasonable ask. Should also maybe update the PR title to reflect that.

If you want to discuss Armature’s design, I’m happy to talk about it at length, but let’s do it somewhere else.

For sure! If you ever want to chat thru something feel free to message me on Gitter/Discord or something!

jgaskins commented 3 years ago

Should also maybe update the PR title to reflect that.

Oh, that's really embarrassing, I didn't realize that's what I'd named it. I couldn't figure out why everyone kept focusing on the HTTP part, but that certainly explains that. 🙃

I'm gonna blame it on breakthrough COVID. It's made life really weird the past couple weeks.

asterite commented 3 years ago

Only because the discussion happened in the context of an http response:

https://github.com/golang/go/issues/7467

asterite commented 3 years ago

Would it help to also be able to configure this on the HTTP server? That way the server can be sure to set that size when it's safe to do so.

asterite commented 3 years ago

I also think we copied IO from Ruby. In Ruby IO is buffered and the buffer size isn't configurable. Some time ago it was suggested to make this configurable. I was against it but the change went through. So now we have more features to support compared to Ruby, and I guess it makes sense to be able to ask if the buffer size can be changed or not.

That said... if your app's behavior depends on being able to set this value, but you can't change it, what will your app do?

jgaskins commented 3 years ago

Would it help to also be able to configure this on the HTTP server? That way the server can be sure to set that size when it's safe to do so.

That'd be pretty useful, I think. Seems similar to the convention of certain TCP clients setting timeouts on the socket when it's created.

If your app's behavior depends on being able to set this value, but you can't change it, what will your app do?

It depends on the semantics. If the semantics are changed such that io.buffer_size = io.buffer_size never raises (AFAICT there does not exist a scenario where this is a dangerous assignment), all I would need to do is remove the extra handling currently in place that ignores the ArgumentError because I set it to the same value on every request.

If you mean the ability to change it from the default is removed entirely, it breaks how Armature works for all but the smallest HTML site headers, so my only choices would then be to take @straight-shoota's suggestion of implementing my own buffered IO or monkeypatch the method back in.

asterite commented 3 years ago

The only problematic case, I think, is shrinking the buffer when it already holds more data than the new size. Setting a bigger value should never be a problem. Or, well, we could try to support it.

straight-shoota commented 3 years ago

Setting a smaller value should technically not be that much of a problem either. You would just have to flush the current buffer and allocate a new buffer with the reduced size.

I don't see a use case for this, though.

As @jgaskins has made clear, his intention is only to set the buffer size once in a buffer's lifetime (before it is allocated) and keep that size when the buffer is reused for subsequent responses (with a keep-alive connection).

straight-shoota commented 3 years ago

@jgaskins After reading your OP again with the further explanations in mind, I'm a bit confused about the workaround you presented. Why don't you just check the buffer size and only set it if it's different from what you want it to be? I imagine that should be sufficient for your workflow: all Response instances and buffers are handled by the framework and thus all get the desired size, and they only have a different size if they're fresh and not yet allocated.
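In code, that check might look something like this (illustrative fragment only, meant to sit inside the route block from the original post, where `response` is the HTTP::Server::Response and 32 KB is just the value used earlier):

```crystal
desired = 32 * 1024

# A fresh response still has the default size and no allocated buffer, so the
# assignment is safe; a reused keep-alive response already has the desired
# size, so the branch is skipped and no ArgumentError is raised.
if (io = response.output).is_a?(IO::Buffered) && io.buffer_size != desired
  io.buffer_size = desired
end
```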

jgaskins commented 3 years ago

@straight-shoota I want to say I tried that at the time and it still raised, so I assumed something about the reuse of the Response was resetting the buffer size to the default value. But I tried it again just now and it does work.

Not sure what I did at the time; probably something silly like = instead of == in the conditional.