golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.38k stars 17.71k forks source link

crypto/tls: customisable max TLS record size #20420

Open 925dk opened 7 years ago

925dk commented 7 years ago

Hi,

The default TLS record/chunk size is 16kB - and this is what golang uses (maxPlaintext in crypto/tls/common.go I think).

It would be useful if the max TLS record size was customisable via Config.

Use-case is we have memory restricted embedded devices (running mbed TLS) talking TLS to a golang server. The client TLS (receive) buffer needs - in our case - to be less than 16kB to fit in memory - and the server and the client has to agree on this max size.

For mbed TLS this is customisable using MBEDTLS_SSL_MAX_CONTENT_LEN.

ALTree commented 7 years ago

It would be useful if the max TLS record size was customisable via Config.

There was some general discussion about adding knobs in #14376 (although #14376 is about a different problem). I'll turn this into a proposal and we'll see what people think about this.

ALTree commented 7 years ago

cc @tombergan @bradfitz

iduhetonas commented 7 years ago

Specifically, the TLS Extension Definition is RFC 6066 "Maximum Fragment Length Negotiation", seen here: https://tools.ietf.org/html/rfc6066#page-8

rsc commented 7 years ago

ping @tombergan does this relate to any of your recent experiments with packet sizes?

tombergan commented 7 years ago

It relates but looks different. The feature I added was to automatically use a smaller packet size early in the connection. It looks like @925dk is requesting control over a protocol-level config (MBEDTLS_SSL_MAX_CONTENT_LEN). I think the question of whether or not to support that config is better directed to @agl.

iduhetonas commented 7 years ago

For clarification, this feature just negotiates the maximum size of a fragment (of a packet) between a client and a server. The default fragment size is 2^12, which means that memory-constrained devices must hold a 2^12-byte buffer.

For this to work with golang, the server need only to negotiate the max fragment size (2^9, 2^10, 2^11, 2^12), and then promise to never send a fragment larger than that size.

There's probably more details involved, but that's my basic understanding.

925dk commented 7 years ago

Negotiating the max size would be elegant, but a simple knob for setting it would also do. Either way works for me.

rsc commented 7 years ago

/cc @agl @FiloSottile

FiloSottile commented 7 years ago

Since there's a RFC 6066 extension for this use case already, and it doesn't seem harder to implement that than a Config knob, I would be for implementing the extension and not exposing any new knob.

Wondering if we should worry about DoS potentials of the lowest values, 2^9 and 2^10. That would be a lot of fragmentation.

@925dk What ballpark sizes does your application need?

925dk commented 7 years ago

@FiloSottile The lower ones 2^9 or 2^10.

tombergan commented 7 years ago

@FiloSottile FWIW, see this conversation. Google frontends have been using small record sizes (about 2^10) for the first 1MB of data sent on every connection for years with no known issues. The connection reverts to small records after 1 second of idle time. Given this experience, I wouldn't be concerned about DoS potential for a value of 2^10, unless the server is very highly constrained. I don't know of servers that regularly use packs of size 2^9, so I can't comment on that value.

FiloSottile commented 7 years ago

Yeah, I think we do that at the beginning of a connection already. This would allow an attacker to make it happen for a whole high-bandwidth connection, but I'm not too concerned either. A 2x bandwidth multiplier is not big in terms of intentional DoS. I'll run a benchmark to check the CPU load stays in the same order of magnitude, but then I'd be for implementing the RFC 6066 extension (and might just do it myself).

rsc commented 7 years ago

I think this is waiting on @FiloSottile's test but I don't see any objections being raised, so this seems likely to be accepted.

rsc commented 7 years ago

ping @FiloSottile

FiloSottile commented 7 years ago

Sorry, fell off my radar.

The cost of 2^9 is pretty intense, a 10x drop in throughput. Even 2^10 brings a 5x slowdown.

name                             old speed      new speed       delta
Throughput/MaxPacket/1MB-4       40.7MB/s ± 2%  253.9MB/s ± 1%  +523.33%  (p=0.036 n=5+3)
Throughput/MaxPacket/2MB-4       42.1MB/s ± 6%  307.5MB/s ± 9%  +631.19%  (p=0.036 n=5+3)
Throughput/MaxPacket/4MB-4       41.9MB/s ± 6%  356.6MB/s ± 3%  +751.13%  (p=0.036 n=5+3)
Throughput/MaxPacket/8MB-4       42.1MB/s ± 8%  378.6MB/s ± 8%  +798.86%  (p=0.036 n=5+3)
Throughput/MaxPacket/16MB-4      43.8MB/s ± 6%  425.9MB/s ± 1%  +873.11%  (p=0.036 n=5+3)
Throughput/MaxPacket/32MB-4      43.2MB/s ± 9%  417.4MB/s ± 9%  +865.31%  (p=0.036 n=5+3)
Throughput/MaxPacket/64MB-4      43.1MB/s ± 8%  442.2MB/s ± 1%  +925.67%  (p=0.036 n=5+3)
Throughput/DynamicPacket/1MB-4   41.1MB/s ± 3%  246.7MB/s ± 1%  +499.91%  (p=0.036 n=5+3)
Throughput/DynamicPacket/2MB-4   41.3MB/s ±10%  316.2MB/s ± 2%  +666.25%  (p=0.036 n=5+3)
Throughput/DynamicPacket/4MB-4   43.0MB/s ± 5%  366.9MB/s ± 1%  +752.23%  (p=0.036 n=5+3)
Throughput/DynamicPacket/8MB-4   43.4MB/s ± 7%  402.3MB/s ± 1%  +826.56%  (p=0.036 n=5+3)
Throughput/DynamicPacket/16MB-4  44.6MB/s ± 1%  422.0MB/s ± 1%  +846.46%  (p=0.036 n=5+3)
Throughput/DynamicPacket/32MB-4  44.3MB/s ± 2%  439.3MB/s ± 0%  +890.77%  (p=0.036 n=5+3)
Throughput/DynamicPacket/64MB-4  43.0MB/s ±13%  446.2MB/s ± 0%  +938.22%  (p=0.036 n=5+3)

Making the benchmarks unidirectional, which seems to match better a server scenario, halves the cost. While I'm not 100% confident in the benchmarks to be honest, as the CPU profile shows 80-90% time in the syscall package, I'm not sure all deployments would be happy with us introducing a client-initiated 5x resource multiplier by default.

It might be more prudent to add a MinimumRecordSize int Config option, which disables the extension when 0, and allows negotiation when set. (Or just skip the extension and introduce RecordSize int I suppose.) I can implement either, but for the decision I defer to @agl.

925dk commented 7 years ago

This (RFC 6066) has just been implemented in OpenSSL - https://github.com/openssl/openssl/commit/cf72c7579201086cee303eadcb60bd28eff78dd9. Just wanted to flag, if useful for you guys @agl @FiloSottile.

rsc commented 6 years ago

Thoughts, @agl @FiloSottile @tombergan?

ianlancetaylor commented 6 years ago

@FiloSottile Any followup on this?

FiloSottile commented 6 years ago

Based on discussion with @agl, we will support the extension with no config option (but not expose a way to request it). We will treat the performance degradation as a performance issue and try to improve the throughput there.

vertoforce commented 3 years ago

Hey just wanted to poke this thread and say I'd love this feature!

I'm a go dev so I'd be happy to help make it happen with some direction/help :)

maard commented 3 years ago

+1 to this request. Reasoning:

The "standard" ssh implementation in Linux also has the 16k buffer size. HPN-SSH is a series of patches to OpenSSL, which allow TCP window and the app's (ssh) buffer to grow during large transfers.

We (an ISP) need to transfer terabytes daily. Testing with HPN-SSH showed up to 6.5 times speed increase compared to the standard ssh (tcpdump showed TCP window growing to up to 12Mb during such transfers).

Re: fear of DoSing: it's only us who can DoS ourselves in such setup 😄

Without this feature we'd either have to give up on using Go on this project, or only use Layer-2 protected connections (not all of them are).

ianlancetaylor commented 2 years ago

CC @golang/security

Doesn't look like this happened for 1.19. Moving to backlog. Please recategorize as appropriate.

rayozzie commented 11 months ago

The team here at Blues Inc uses (and loves) golang for our cloud service - we're all-in since 2017 - and we use WolfSSL within our device-side firmware (under LwIP+FreeRTOS). FWIW I'd like to reiterate that for multitasked RTOS-based devices with even a moderate amount of memory, it's challenging to ensure that in all cases there's enough free heap to guarantee that 16KB allocations can succeed. As such, RFC6066 support would be tremendously valued so that we can reduce these mallocs to something more readily achievable, such as 8kb. @ianlancetaylor et al, we'd very much appreciate your consideration.

odeke-em commented 10 months ago

Moving to Go1.23 as there wasn't action for Go1.22

ianlancetaylor commented 4 months ago

Nothing is happening on this issue. Moving to backlog. Sorry.