JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Clarify Buffering rules for streams, especially PipeEndpoint #29960

Open chethega opened 5 years ago

chethega commented 5 years ago

Sorry if this is a duplicate or an RTFM fail. Currently, it appears that writes to PipeEndpoint are completely unbuffered, with apocalyptic performance implications:

$ strace julia -e 'p=open(`head`, "r+"); for i=1:10_000 write(p, "A") end; close(p)' 2>&1 >/dev/null | wc -l
30928

I think that a precise description of write buffering behavior belongs in the docs for all streams (if someone explains this to me I can make the PR, but I don't understand libuv).

This especially affects stdout, and hence all unqualified prints, when output is piped into another shell command instead of a file. There is some more discussion on Discourse.

With this behavior it is nearly impossible to write generic code that writes to an IO and performs well with both IOBuffer and PipeEndpoint.
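As a stopgap, one can batch the small writes in memory and hand them to the destination in a single call. The sketch below assumes that one large write to a PipeEndpoint costs far fewer system calls than many one-byte writes; I have not verified this with strace. `write_batched` is a hypothetical helper name, not something in Base.

```julia
# Hypothetical helper (not part of Base): collect many small writes in memory
# and pass them to the destination stream with a single write call.
function write_batched(dest::IO, pieces)
    buf = IOBuffer()
    for piece in pieces
        write(buf, piece)      # appends to the in-memory buffer
    end
    data = take!(buf)          # Vector{UInt8} holding everything written so far
    return write(dest, data)   # one write to dest; returns the byte count
end

# The destination can be any IO: an IOBuffer here, or a pipe such as
# the one returned by open(`head`, "r+") in the example above.
out = IOBuffer()
n = write_batched(out, ("A" for _ in 1:10_000))
```

This keeps the calling code generic over IO, but it pushes the buffering decision onto every caller, which is exactly what I would like to avoid.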

I am not sure whether this is a bug, a known limitation, or a conscious design decision. Pinging @vtjnash because of the streams.jl commit history.

Going forward, I think that the buffering strategy (buffer up to a system-default max_size, buffer until newline, or unbuffered) should be settable and readable for all streams, either during stream creation or after the fact, for all subtypes of IO. The unbuffered default sucks for everything that is not a TTY, and even there one might argue for line-buffering outside of debug builds.
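To make the proposal a bit more concrete, here is a rough sketch of the three strategies as a wrapper around an arbitrary IO. Everything in it (`BufferedSink`, `put_bytes!`, `flush_sink!`, the `mode` symbols, and the 64 KiB default) is made up for illustration and is not an existing or proposed Base API.

```julia
# Hypothetical wrapper, not an existing Base API: buffers writes to `dest`
# according to `mode` (:full, :line, or :none).
mutable struct BufferedSink{T<:IO}
    dest::T
    buf::IOBuffer
    mode::Symbol      # :full, :line, or :none
    max_size::Int     # flush threshold in bytes for :full and :line
end

BufferedSink(dest::IO; mode::Symbol=:full, max_size::Int=64 * 1024) =
    BufferedSink(dest, IOBuffer(), mode, max_size)

# Push everything buffered so far to the underlying stream in one write.
function flush_sink!(s::BufferedSink)
    data = take!(s.buf)
    isempty(data) || write(s.dest, data)
    return s
end

function put_bytes!(s::BufferedSink, str::AbstractString)
    if s.mode === :none
        write(s.dest, str)             # unbuffered: forward immediately
        return s
    end
    write(s.buf, str)
    full = position(s.buf) >= s.max_size
    saw_newline = s.mode === :line && ('\n' in str)
    # In :line mode the whole buffer is flushed, including any text
    # that follows the newline; a real implementation might split it.
    (full || saw_newline) && flush_sink!(s)
    return s
end
```

With `mode=:line`, a call like `put_bytes!(s, "abc")` stays in the buffer, and a later `put_bytes!(s, "def\n")` sends both pieces to `dest` in one write.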

StefanKarpinski commented 5 years ago

@Keno might also have useful feedback on this.