crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.34k stars 1.61k forks source link

HTTP::Client streaming body is all or nothing #9445

Open Daniel-Worrall opened 4 years ago

Daniel-Worrall commented 4 years ago

Client streaming by providing a block is difficult to work with since regular body consuming calls gets_to_end. I wished to get a percentage received during a request for progress, and this is what I ended up with.

There's also the issue that the body cannot be set if you have a Response instance without making a new one

HTTP::Client.get(url) do |response|
  length = response.headers.get("content-length").first.to_i
  body = String.build(length) do |str|
    while gets = response.body_io.gets(chomp: false)
      str << gets
      remaining = response.body_io.as(HTTP::FixedLengthContent).read_remaining
      percentage = ((length - remaining).fdiv length) * 100
    end
  end
  HTTP::Client::Response.new(response.status, body, response.headers, nil, response.version, nil)
end

HTTP::Client::Response has #consume_body_io, so I thought it may be possible to provide a similar method that takes a block. This below works in my use-case, but I'm unfamiliar with if this would be appropriate.

class HTTP::Client::Response
  def consume_body_io(&block)
    if io = @body_io
      @body = String.build do |str|
        while get = io.gets(chomp: false)
          str << get
          yield
        end
      end
      @body_io = nil
    end
  end
end
HTTP::Client.get(url) do |response|
  length = response.headers.get("content-length").first.to_i
  response.consume_body_io do
    remaining = response.body_io.as(HTTP::FixedLengthContent).read_remaining
    percentage = ((length - remaining).fdiv length) * 100
  end
  response
end
jhass commented 4 years ago

What's your target for the body? Where do you write or process it? Maybe there's a way to better calculate percentage in there.

Daniel-Worrall commented 4 years ago

For the full body, writing to a file. For the partial body, a progress bar on STDOUT

jhass commented 4 years ago

I'm sorry, I don't quite follow. What's a full and a partial body? What happens to the data of a "partial body", why would you only show some progress for it?

Daniel-Worrall commented 4 years ago

Sorry. By "full", I mean fully downloaded. When the pipe has streamed all the file into memory. Then I write it to a file.

By partial, I mean mid-stream. As it is collecting parts of the body.

My case is I am pulling very large files, 100s of MB, maybe GBs, so there is a bottleneck on pulling those files and I'd like to give feedback on download status.

jhass commented 4 years ago

Mmh, I think this should be solved with something like a yielding overload of IO.copy, something like File.open(target) {|file| IO.copy(response.body_io, file) {|pos| report_progress(pos, response.content_length) } }. Could be a shard as well.

Didn't this idea surface somewhere else recently? Does somebody remember?

HankAnderson commented 4 years ago

If you want to track downloaded progress you can do something this:

require "http/client"

url = "http://example.com"

HTTP::Client.get(url) do |response|
  length = response.headers.get("content-length").first.to_i64
  downloaded = 0_i64
  body = String.build(length) do |str|
    while gets = response.body_io.gets(chomp: false)
      str << gets
      downloaded += gets.bytesize
      remaining = length - downloaded
      percentage = ((length - remaining).fdiv length) * 100
    end
  end
  HTTP::Client::Response.new(response.status, body, response.headers, nil, response.version, nil)
end

Instead of gets you can also use read(Slice) to read a fixed amount from a slice so you get consistent progress, but that's not needed.

Is the summary of this issue "How do I track downloaded progress?". Then the above is the answer.

Also, there's really no need to open two issues for this. And the forums are probably a better place for these questions... there's nothing buggy or wrong with the standard library.

Daniel-Worrall commented 4 years ago

I'm already tracking download progress in a roundabout way. The issue is more geared towards that I feel there should be a feature in Client to handle yielding to a block as the body is received, and my use case is simply how I came across it. Streaming is supplied and supported, but it doesn't feel like a complete solution.

E: The two issues are distinct. They are feature requests, and this is the platform I thought would be appropriate for them. Let me know if I'm mistaken.

HankAnderson commented 4 years ago

I disagree. The solution above, or something similar to IO.copy, is just 3~5 lines. You may want to use different buffer sizes. You may want different things. This is not something for the standard library. You can already do what you want by composing existing APIs.

naqvis commented 4 years ago

Didn't this idea surface somewhere else recently? Does somebody remember?

yeah jhass, this was discussed on gitter iirc.

HankAnderson commented 4 years ago

Also, inspired by this answer for Go, you can do it for any IO like this (note that this is for reading files, but you can create Progress.new(response.body_io) in your use case):

class Progress < IO
  def initialize(@io : IO, @total : Int64, &@block : Int64, Int64 ->)
    @current = 0_i64
  end

  def read(slice : Bytes)
    read_bytesize = @io.read(slice)
    @current += read_bytesize
    @block.call(@current, @total)
    read_bytesize
  end

  def write(slice : Bytes) : Nil
    raise "Can't write"
  end
end

filename = "some filename"
total = File.size(filename).to_i64
File.open(filename) do |file|
  progress = Progress.new(file, total) do |current, total|
    puts "Downloaded #{current} of #{total}"
  end
  progress.gets_to_end
end

Maybe IO::Progress can be part of the standard library, I don't know. But as you can see, it's really easy to compose existing APIs to achieve what you want.