JuliaWeb / HTTP.jl

HTTP for Julia

early stop for streaming response #1173

Open johnnychen94 opened 2 months ago

johnnychen94 commented 2 months ago

For a very large response, sometimes we want to process the first few lines and then decide to close the stream gracefully.

using HTTP
using CodecZlib

# a 1 GB gzip log file provided by TUNA
# https://mirrors.tuna.tsinghua.edu.cn/news/release-logs/
url = "https://mirrors.tuna.tsinghua.edu.cn/logs/nanomirrors/mirrors.log-20231226.gz"

# purpose: read first line and stop the stream
HTTP.open(:GET, url) do io
    unzipped = GzipDecompressorStream(io)
    line = readline(unzipped)
    @info "first line: $line"
end

This currently doesn't work as expected because closeread(io) reads all of the remaining bytes, which is about 1 GB in this case.

Maybe we should add closeread(io, force=true) to drop the remaining bytes?
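
To make the proposed semantics concrete, here is a minimal sketch using only Base. It does not touch HTTP.jl: `MockBody`, `read_line!` and `close_body!` are hypothetical names invented for illustration, and the `force` keyword mirrors the suggestion above rather than any existing API. The toy body counts how many bytes are "transferred" so the two closing behaviours can be compared.

```julia
# Toy stand-in for a streamed response body (hypothetical, not HTTP.jl's Stream).
mutable struct MockBody
    data::IOBuffer     # bytes the "server" still has to send
    transferred::Int   # bytes pulled from `data` so far
end

MockBody(text::AbstractString) = MockBody(IOBuffer(text), 0)

# Read one line and count the bytes that had to be transferred for it.
function read_line!(body::MockBody)
    line = readline(body.data; keep=true)
    body.transferred += ncodeunits(line)
    return chomp(line)
end

# Hypothetical close: drain the remainder by default, drop it when force=true.
function close_body!(body::MockBody; force::Bool=false)
    if !force
        # graceful close: every remaining byte is still read and discarded
        body.transferred += length(read(body.data))
    end
    close(body.data)
    return body.transferred
end

payload = "first line\n" * "x"^1_000   # 11 bytes of interest + 1000 bytes of tail

graceful = MockBody(payload)
read_line!(graceful)
drained = close_body!(graceful)             # reads the whole payload

forced = MockBody(payload)
read_line!(forced)
dropped = close_body!(forced; force=true)   # stops after the first line
```

In this model `drained` equals the full payload size while `dropped` equals only the length of the first line, which is the difference a `force=true` option would be expected to make for the 1 GB download.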

My current "workaround" is to directly stop the internal io.stream...

julia> try
           HTTP.open(:GET, url) do io
               unzipped = GzipDecompressorStream(io)
               line = readline(unzipped)
               @info "first line: $line"

               ntr = io.ntoread
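               @info "bytes left to read: $ntr"

               # assumed reconstruction of the elided step: close the underlying connection
               close(io.stream)
               return line
           end
       catch e
           if e isa HTTP.RequestError
               @warn "HTTP request error" e
           end
       end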
[ Info: first line: - - [21/Apr/2024:00:20:35 +0800] "GET /debian-security/dists/buster/updates/InRelease HTTP/1.1" 304 0 "-" "-" "Debian APT-HTTP/1.3 (" - http
[ Info: bytes left to read: 729590719
┌ Warning: HTTP request error
│   e =
│    HTTP.RequestError:
│    HTTP.Request:
│    HTTP.Messages.Request:
│    """
│    GET /logs/nanomirrors/mirrors.log-20240422.gz HTTP/1.1
│    Host: mirrors.tuna.tsinghua.edu.cn
│    Accept: */*
│    User-Agent: HTTP.jl/1.9.3
│    Content-Length: 0
│    Accept-Encoding: gzip
│    [Message Body was streamed]"""Underlying error:
│    EOFError: read end of file
└ @ Main REPL[126]:16