JuliaLang / MbedTLS.jl

Wrapper around mbedtls

possible performance issue: mbedtls_gcm_update CPU utilization #254

Open vustef opened 2 years ago

vustef commented 2 years ago

I've written a simple example to test the performance of MbedTLS. It's unoptimized and probably incorrect in some aspects, but I hope it shows the issue that I'm facing when using MbedTLS through HTTP.jl.

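using MbedTLS
using Sockets
using Base.Threads

function tls_test(num_iters, concurrency)
    entropy = MbedTLS.Entropy()
    rng = MbedTLS.CtrDrbg()
    MbedTLS.seed!(rng, entropy)

    # 1 MiB payload, written once per connection.
    nbytes = 1024 * 1024
    buffer = Array{UInt8}(undef, nbytes)

    # Caps the number of connections in flight at `concurrency`.
    sem = Base.Semaphore(concurrency)
    @sync begin
        for i in 1:num_iters
            @spawn begin
                Base.acquire(sem)
                try
                    sock = connect("httpbin.org", 443)

                    ctx = MbedTLS.SSLContext()
                    conf = MbedTLS.SSLConfig()

                    MbedTLS.config_defaults!(conf)
                    MbedTLS.authmode!(conf, MbedTLS.MBEDTLS_SSL_VERIFY_REQUIRED)
                    MbedTLS.rng!(conf, rng)

                    function show_debug(level, filename, number, msg)
                        @show level, filename, number, msg
                    end

                    MbedTLS.dbg!(conf, show_debug)

                    # Load the default CA bundle so certificate verification can succeed.
                    MbedTLS.ca_chain!(conf)

                    MbedTLS.setup!(ctx, conf)
                    MbedTLS.set_bio!(ctx, sock)

                    MbedTLS.handshake(ctx)

                    # pointer(buffer) is the address of the array data (jl_value_ptr
                    # would point at the array object itself); keep the array rooted
                    # while the raw pointer is in use.
                    GC.@preserve buffer Base.unsafe_write(ctx, pointer(buffer), nbytes)
                    close(ctx)
                finally
                    Base.release(sem)
                end
            end
        end
    end
end

tls_test(4096, 512)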

On a machine with 8 cores and 1.5 GB/s of NIC throughput, this achieves a bit less than 200 MB/s (4096 writes of 1 MiB each in ~22s). CPU is at 100% for the whole run. mbedtls_gcm_update accounts for 40% of the profile, which means the CPU time spent in that function is ~70s (22s x 8 cores x 0.4). My assumption is that this function neither does network communication nor invokes anything that does, so this is pure processing.

So the throughput of mbedtls_gcm_update is effectively ~58 MB/s per core on this machine (4096 MB / ~70s). This means that while the machine's NIC can do 1.5 GB/s, mbedtls_gcm_update alone limits it to around ~464 MB/s across 8 cores under ideal conditions (no other CPU usage in the call stack), and it would take roughly 26 cores to saturate the NIC.
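For anyone who wants to check the arithmetic, here is a rough sketch of the estimate above. The helper name `gcm_budget` and its keyword arguments are made up for illustration; the default inputs are the approximate figures quoted in this issue, and MiB and MB are treated as interchangeable, as in the text.

```julia
# Back-of-the-envelope estimate of the throughput ceiling imposed by one hot function.
# `gcm_budget` is a hypothetical helper; the defaults are the approximate numbers
# reported in this issue, not new measurements.
function gcm_budget(; total_mb = 4096.0,      # 4096 writes of 1 MiB each
                      wall_seconds = 22.0,    # approximate wall-clock time
                      cores = 8,
                      cpu_share = 0.40,       # fraction of profile in mbedtls_gcm_update
                      nic_mb_per_s = 1500.0)  # 1.5 GB/s NIC
    observed_mb_per_s = total_mb / wall_seconds
    # CPU was fully busy, so total CPU time is wall time times cores.
    cpu_seconds = wall_seconds * cores * cpu_share
    per_core_mb_per_s = total_mb / cpu_seconds
    ceiling_mb_per_s = per_core_mb_per_s * cores
    cores_for_nic = ceil(Int, nic_mb_per_s / per_core_mb_per_s)
    return (; observed_mb_per_s, cpu_seconds, per_core_mb_per_s,
              ceiling_mb_per_s, cores_for_nic)
end

r = gcm_budget()
for (name, value) in pairs(r)
    println(name, " = ", round(value; digits = 1))
end
```

With the default inputs this gives about 58 MB/s per core, a ceiling of about 465 MB/s on 8 cores, and 26 cores to reach 1.5 GB/s. The estimate assumes the function scales linearly across cores and that nothing else in the call stack costs CPU, so the real ceiling is lower.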

For comparison, a similar test (at a slightly higher level of abstraction) using HTTP PUT requests in Go, on the same machine, achieves ~1.5 GB/s, at which point the NIC's throughput is the bottleneck.

Are there any ideas for how mbedtls_gcm_update could be optimized? Is this worth submitting as an issue at https://github.com/Mbed-TLS/mbedtls? I am not sure whether the same thing happens when the library is used directly, without the Julia wrapper.

PProf profile file: prof_ssl1.pb.gz

Here's a screenshot of the profile file opened using PProf: image

vustef commented 2 years ago

Hey @vtjnash, I see you mention a different performance issue here: #252. Do you have some performance numbers too?

vustef commented 2 years ago

Some more points from other people too, here: https://discourse.julialang.org/t/slow-http-jl-requests-when-ssl-is-enabled/36183/