For a very large response, sometimes we want to process the first few lines and then decide to close the stream gracefully.
```julia
using HTTP
using CodecZlib

# a 1 GB gzip log file provided by TUNA
# https://mirrors.tuna.tsinghua.edu.cn/news/release-logs/
url = "https://mirrors.tuna.tsinghua.edu.cn/logs/nanomirrors/mirrors.log-20231226.gz"

# purpose: read first line and stop the stream
HTTP.open(:GET, url) do io
    unzipped = GzipDecompressorStream(io)
    line = readline(unzipped)
    @info "first line: $line"
    closeread(io)  # blocks here while the rest of the body is read
end
```
This currently doesn't work as expected because `closeread(io)` reads all the remaining bytes before returning, which is about 1 GB in this case.

Maybe we should add `closeread(io, force=true)` to drop the remaining bytes?

My current "workaround" is to close the internal `io.stream` directly, which stops the download but surfaces as an `HTTP.RequestError`:
```julia
julia> try
           HTTP.open(:GET, url) do io
               unzipped = GzipDecompressorStream(io)
               line = readline(unzipped)
               @info "first line: $line"
               ntr = io.ntoread
               @info "bytes left to read: $ntr"
               close(io.stream)
               return line
           end
       catch e
           if e isa HTTP.RequestError
               @warn "HTTP request error" e
           end
       end
[ Info: first line: 120.232.215.238 - - [21/Apr/2024:00:20:35 +0800] "GET /debian-security/dists/buster/updates/InRelease HTTP/1.1" 304 0 "-" "-" "Debian APT-HTTP/1.3 (1.8.2.3)" - http
[ Info: bytes left to read: 729590719
┌ Warning: HTTP request error
│ e =
│ HTTP.RequestError:
│ HTTP.Request:
│ HTTP.Messages.Request:
│ """
│ GET /logs/nanomirrors/mirrors.log-20240422.gz HTTP/1.1
│ Host: mirrors.tuna.tsinghua.edu.cn
│ Accept: */*
│ User-Agent: HTTP.jl/1.9.3
│ Content-Length: 0
│ Accept-Encoding: gzip
│
│ [Message Body was streamed]"""Underlying error:
│ EOFError: read end of file
└ @ Main REPL[126]:16
```
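
For anyone who needs this today, the workaround can be wrapped in a small helper. This is only a minimal sketch, not part of HTTP.jl: `first_lines` is a hypothetical name, and it relies on `io.stream`, an internal field that may change between releases. It also assumes that closing the connection mid-body surfaces as an `HTTP.RequestError`, as in the output from HTTP.jl 1.9.3 shown earlier.

```julia
using HTTP
using CodecZlib

# Hypothetical helper (not an HTTP.jl API): return up to `n` decompressed lines
# from a gzip response, then drop the connection instead of draining the body.
function first_lines(url::AbstractString, n::Integer = 1)
    lines = String[]
    try
        HTTP.open(:GET, url) do io
            unzipped = GzipDecompressorStream(io)
            while length(lines) < n && !eof(unzipped)
                push!(lines, readline(unzipped))
            end
            # `io.stream` is an internal field; closing it abandons the unread bytes.
            close(io.stream)
        end
    catch e
        # Assumption: the forced close raises HTTP.RequestError. Swallow it only
        # if some lines were read; otherwise the request itself likely failed.
        if !(e isa HTTP.RequestError) || isempty(lines)
            rethrow()
        end
    end
    return lines
end
```

A call would look like `first_lines(url, 3)`. Because the connection is closed rather than drained, it presumably cannot be reused for later requests, which is the trade-off a `force=true` option would have to accept as well.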