JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.13k stars 5.43k forks source link

Println performance bug in interactive terminal (I.e. much slower than c, perl, ruby, python) #43176

Open gmounie opened 2 years ago

gmounie commented 2 years ago

In brief: Julia println output speed seems oddly slow, or costly, in an interactive terminal compare to several other languages. The slow behavior is only occurring in an interactive terminal (tried several such as gnome-terminal, xterm, qterminal, sakura, emacs-vterm). No slowness are visible when the standard output is redirected.

1) In details (Linux Debian sid, julia 1.5.3, similar behavior on several x86_64 (Intel, AMD)) I wrote recursive Fibonacci computation with printf inside (Julia code below), in several languages for, e.g., demonstrating the synchronization in a terminal for my OS lecture. Julia computation performance is on par with C, C++, Rust, Fortran, Go, Crystal, without the println, but, with the println, Julia is much much slower (2.5 - 6.5 times slower) than even python3, perl5, ruby or node-js. Only Raku is slower but because the Raku 2021.09 computation are still very slow.

If the output is redirected to /dev/null, Julia is again on par with the fastest languages. The most comparable language I tried, "crystal" (ruby compilation with LLVM), does not have the same behavior.

2) Remarks: According to /usr/bin/time ('time" package in Debian) Julia seems to spend much more time in the kernel: twice the time compare to perl or python.

perf trace julia... records roughly 4M syscalls for fibo(30) compare to roughly 1M in C, perl or python, and 2M for crystal

perf record/report indicates VDSO gettimeofday at the top (3%)

3) My Fibonacci code:

import Printf: @printf

function fibo_println(n)
    if n < 2
        return n
    end
    val = fibo_println(n-1) + fibo_println(n-2)
    println(val)
    return val
end

n = 20
if length(ARGS) > 0
    n = parse(Int64, ARGS[1])
end

v, t = @timed fibo_println(n)
@printf(stderr, "julia fibo(%d)= %lld en %lld s\n", n, v, t)
mgkuhn commented 2 years ago

High I/O performance with lots of very short writes by the user requires user-space buffering, which most other languages do for that reason. In C, that's the difference between the write() function (a system call into the kernel) and the fwrite() function (a libc function that performs a copy into a user-space buffer, calling write() only whenever that buffer gets full, or in case of stdout whenever a line ends).

Julia still doesn't buffer its stdout by default. Each call to println(1) actually results in a whopping six (6) Linux/x86_64 system calls, each doing a context switch, saving all the registers, flushing TLBs, doing lots of security checks, etc. etc.:

$ strace julia -e 'for i=1:100; println(1); end'
[...]
write(13, "1", 1)                       = 1
epoll_wait(3, [], 1024, 0)              = 0
epoll_wait(3, [], 1024, 0)              = 0
write(13, "\n", 1)                      = 1
epoll_wait(3, [], 1024, 0)              = 0
epoll_wait(3, [], 1024, 0)              = 0
[...]

Yes, even the newline causes an (always very expensive) kernel context switch!

In comparison, the same operation in C (printf("%d\n", 1);) causes only one system call when writing to stdout:

$ echo 'main(){for(int i; i<100; i++){printf("%d\n",1);}}' | gcc -x c - && strace ./a.out
[...]
write(1, "1\n", 2)                      = 2
[...]

And just 0.01 system calls when writing into a file:

$ echo 'main(){for(int i; i<100; i++){printf("%d\n",1);}}' | gcc -x c - && strace ./a.out >/dev/null
[...]
write(1, "1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n"..., 200) = 200
[...]

See also #30044 (if you are not afraid of references to some strong language by Linus T. on the subject of single-byte I/O calls into the kernel; I guess that's how he keeps the kernel so fast ...).

This is obviously not very good. So what is the plan to get buffered I/O by default into Julia?

mgkuhn commented 2 years ago

A workaround is to first print into an IOBuffer:

$ strace julia -e 'out=IOBuffer(); for i=1:100; println(out, 1); end; write(stdout, take!(out))'
[...]
write(12, "1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n1\n"..., 200) = 200
[...]
ikirill commented 2 years ago

Another workaround is to use Base.buffer_writes(stdout) together with flush(stdout), I'm not sure why that function isn't exported.

Btw, if you enable buffer_writes(stdout), Julia doesn't flush stdout when it exits. I'm not sure that's intentional.

paulmelis commented 2 years ago

Related: https://github.com/JuliaLang/julia/issues/36639

stevengj commented 1 year ago

A workaround is to first print into an IOBuffer:

A better workaround is to use BufferedStreams.jl and write to bio = BufferedOutputStream(stdout), since that uses a bounded-size (128KiB) buffer.