golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: crypto/tls: support kernel-provided TLS #44506

Open howardjohn opened 3 years ago

howardjohn commented 3 years ago

Lots of background and an implementation, albeit from 3+ years ago: https://blog.filippo.io/playing-with-kernel-tls-in-linux-4-13-and-go/

Basically, Linux now supports handling TLS encryption in the kernel. The primary benefit is that sendfile/splice can then work with TLS connections. Currently, we have to choose between TLS and splice (or a custom TLS implementation, I suppose).

It would be great to have first-class support for this in Go.
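For context, here is a sketch of the plain-TCP path that kTLS would unlock, assuming Linux and using only the standard library. An `io.Copy` from an `*os.File` into a `*net.TCPConn` is a case the runtime can hand to sendfile(2)/splice(2), whereas the same copy into a `*tls.Conn` has to read, encrypt, and write every byte in user space. The helper names (`serveFile`, `roundTrip`) are illustrative only.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
)

// serveFile copies a file into a plain TCP connection. On Linux the standard
// library can service this io.Copy with sendfile(2)/splice(2), so the file
// bytes need not pass through user space. If conn were wrapped in a
// *tls.Conn, every byte would instead be read, encrypted, and written by Go.
func serveFile(conn *net.TCPConn, path string) (int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(conn, f)
}

// roundTrip writes payload to a temp file, serves it over a loopback TCP
// connection with serveFile, and returns what the peer received.
func roundTrip(payload string) (string, error) {
	tmp, err := os.CreateTemp("", "payload-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString(payload); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	received := make(chan string, 1)
	go func() {
		c, err := ln.Accept()
		if err != nil {
			received <- ""
			return
		}
		defer c.Close()
		b, _ := io.ReadAll(c)
		received <- string(b)
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	_, copyErr := serveFile(c.(*net.TCPConn), tmp.Name())
	c.Close() // closing signals EOF to the reader goroutine
	got := <-received
	return got, copyErr
}

func main() {
	got, err := roundTrip("hello, sendfile")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```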

seankhliao commented 3 years ago

cc @FiloSottile

ShivanshVij commented 3 years ago

I would love to have this happen as well! It's a major use case for L7 load balancers written in golang, and could transparently provide significant performance boosts for a lot of systems (including Kubernetes)

FiloSottile commented 3 years ago

Can we get some benchmarks and numbers for the performance improvement? My patch linked above might be a good starting point. It's a lot of complexity and it would have to be justified by very good numbers.

jim3ma commented 2 years ago

Hi, all

I have updated kernel TLS support based on @FiloSottile's original code. It now supports more ciphers, such as AES_GCM_256, AES_CCM_128 and CHACHA20_POLY1305.

Code: https://github.com/jim3ma/go/tree/dev.ktls.1.16.3.

I also fixed some kernel issues I ran into while coding: https://github.com/torvalds/linux/commit/974271e5ed45cfe4daddbeb16224a2156918530e, https://github.com/torvalds/linux/commit/d8654f4f9300e5e7cf8d5e7885978541cf61326b

In my simple tests, enabling kernel TLS reduced the time cost by about 30%.

totallyunknown commented 2 years ago

I made some real-world tests with one of our internal applications (CDN node specialised in delivering video segments for DASH and HLS streams).

I compared https vs http, vs http + sendfile and ktls + sendfile.

Most of the TLS stuff is working, except TLS 1.3 with Chrome and k6. k6 reports tls: oversized record received with length 62464.

With kTLS, latency increased, but this may also be due to the different Go versions used.

The kTLS implementation reduces overall CPU usage by around 10%. We'll deploy the Nvidia ConnectX-6 (200 Gbit/s) in our latest hardware setup, and we hope to use TLS NIC offloading in the future.

https://docs.google.com/spreadsheets/d/1XaiFczae9GLixu__8y2kuKPsw7RGqW9vMDkYxuTLx28/edit#gid=0
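For reference, the k6 error mentioned above is crypto/tls's record-size check (k6 is written in Go). Every TLS record starts with a 5-byte header whose last two bytes are the ciphertext length, and RFC 8446 caps a TLS 1.3 record at 2^14 + 256 bytes (RFC 5246 allows 2^14 + 2048 for TLS 1.2). A length of 62464 is far above either limit, so one plausible reading is that the kTLS sender emitted an oversized record or the byte stream lost record alignment. The sketch below only illustrates the check, not the cause; the names are illustrative.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	maxPlaintext       = 1 << 14             // 16384 bytes of plaintext per record
	maxCiphertextTLS13 = maxPlaintext + 256  // RFC 8446, section 5.2
	maxCiphertextTLS12 = maxPlaintext + 2048 // RFC 5246, section 6.2.3
)

// recordLength returns the length field of a TLS record header:
// 1 byte content type, 2 bytes legacy version, 2 bytes big-endian length.
func recordLength(hdr []byte) (int, error) {
	if len(hdr) < 5 {
		return 0, errors.New("short TLS record header")
	}
	return int(binary.BigEndian.Uint16(hdr[3:5])), nil
}

// oversized reports whether a record length exceeds the protocol limit,
// mirroring the kind of check behind "tls: oversized record received".
func oversized(n int, tls13 bool) bool {
	if tls13 {
		return n > maxCiphertextTLS13
	}
	return n > maxCiphertextTLS12
}

func main() {
	// 0x17 = application_data, 0x0303 = legacy version, 0xF400 = 62464.
	hdr := []byte{0x17, 0x03, 0x03, 0xF4, 0x00}
	n, err := recordLength(hdr)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, oversized(n, true))
}
```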

jrfastab commented 2 years ago

@totallyunknown If the latency issue is related to the kernel implementation (once the Go side is ruled out), we can look at kernel-side improvements. We've been using the OpenSSL implementation lately, so I'll check there as well, but I don't recall extra latency the last time I collected metrics. Having a Go implementation would be very useful on my side as well. FWIW, I'm one of the kTLS maintainers on the kernel side, so we shouldn't have trouble getting improvements there as needed, and I'm happy to help where I can to move this forward.

jim3ma commented 2 years ago

> I made some real-world tests with one of our internal applications (CDN node specialised in delivering video segments for DASH and HLS streams).
>
>   • Kernel 5.13.12
>   • Curve: prime256v1

Which version did you test? I have updated some Go code for HTTP with kTLS.

totallyunknown commented 2 years ago

@jim3ma Your branch: https://github.com/jim3ma/go/tree/dev.ktls.1.16.3

jim3ma commented 2 years ago

> @jim3ma Your branch: https://github.com/jim3ma/go/tree/dev.ktls.1.16.3

Okay, I will merge some optimized code into this branch tomorrow.

kkkygytb commented 2 years ago

Excuse me, how is the implementation going?

kolinfluence commented 1 year ago

Hi, this is a long-awaited feature, because crypto/tls is so much slower. Please enable this. Thanks.

VirrageS commented 11 months ago

@jim3ma are there any plans to introduce the changes into the Go code?

jim3ma commented 11 months ago

> @jim3ma are there any plans to introduce the changes into the Go code?

Sorry, I've been busy with work. I will rebase the kTLS code onto the latest branch and test it again.

zyxkad commented 3 months ago

any updates?

ouvaa commented 3 months ago

@jim3ma I'm curious about the updates too.

I've been following https://github.com/0-haha/gnet-tls-go1-20/ (ref: https://github.com/panjf2000/gnet/issues/534).

@FiloSottile I've been watching kTLS progress for Go since you started the blog in 2021. This is arguably the last big performance penalty Go pays in benchmarks.

Once this lands, I believe it will be one of Go's greatest milestones ever.

ShivanshVij commented 3 months ago

I did some rough benchmarks late last year where I had Golang call into rust's TLS library via CGO to do the handshake and then handed off the established TCP connection to Golang.

I found that the performance (throughput/latency on sustained traffic) ended up being about the same as golang's built-in TLS or slightly worse.

I'm not sure why, to be honest; maybe I did something wrong? I would like to see some numbers, ideally from someone else, on the actual performance of the kTLS implementation in the Linux kernel.

ouvaa commented 3 months ago

@ShivanshVij do you have the code, so we can help debug? But kTLS is better for sure.

zyxkad commented 3 months ago

> I did some rough benchmarks late last year where I had Golang call into rust's TLS library via CGO to do the handshake and then handed off the established TCP connection to Golang.

As I understand it, kTLS doesn't work magically: it's meant for zero-copy, so you have to hand a file descriptor to a syscall such as sendfile.

ShivanshVij commented 3 months ago

Yep, the implementation was really straightforward.

Start a TCP listener, wait for a TCP connection to be accepted, read N bytes from it and pass them to rustls via cgo. If more bytes were needed, the rustls library would signal that; otherwise it would give us some bytes to write back to the connection, which we did in Go by blindly writing the byte slice into the net.Conn.

Once the handshake was complete, we'd pull out the required kTLS secrets from the handshake in rustls, and then do the required syscalls in Go to tell the kernel that the fd that backed the net.Conn is a kTLS fd.

After that, future reads/writes on the net.Conn would result in proper TLS encryption/decryption without any userspace overhead.

kanocz commented 3 months ago

One more thing: many higher-end network cards have crypto acceleration that can be accessed through the kTLS API, so by supporting kTLS in Go we would be able to offload encryption to the network card. So please don't compare only software encryption in Go against software encryption in the kernel; that comparison is not very relevant for many production environments.

kolinfluence commented 2 months ago

I've used gnet's kTLS and other kTLS versions, but I found that going through cgo with many goroutines seems to crash: e.g. 1,000,000 goroutines calling into cgo did not seem possible, and you can probably do 80k at most. So I'm not sure whether kTLS will be usable in that setting, or whether it will also crash when making syscalls through cgo.

rsc commented 2 months ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rhysh commented 2 months ago

It looks like rustls (for Rust) makes kTLS possible by allowing access to the key material after the handshake completes. Could that be the right stance for Go as well, to allow use of kTLS without the crypto/tls maintainers needing to take on ownership of all of the moving parts?

The QUIC support in crypto/tls is in a similar position, where crypto/tls does the initial handshake and then hands the key material over to its caller.

From what I can tell, the discussion at https://github.com/rustls/rustls/issues/198 led to https://docs.rs/rustls/latest/rustls/struct.ExtractedSecrets.html, which in turn enables users to provide their own kTLS wiring. (There's support in crypto/tls already for a Config.KeyLogWriter — but as the rustls maintainers also discovered, that format doesn't include all of the information that the kernel needs to continue the symmetric encryption.)

totallyunknown commented 1 month ago

We aim to achieve 400 Gbit/s of network throughput serving HTTP with Go, but are currently constrained by memory bandwidth. With the AMD Rome generation, we can reach 165 Gbit/s of network traffic with memory bandwidth fully utilized, as shown by AMD uProf. To overcome this, we need zero-copy techniques like sendfile, which in turn require kTLS support in Go, to eliminate the memory bandwidth constraint.

rsc commented 4 weeks ago

@rolandshoemaker and @FiloSottile to work out an API. It sounds like we should work on an API where Go keeps the handshake and then hands off the key so the kernel can do the record layer.

rsc commented 3 weeks ago

@FiloSottile and I discussed this, and we wonder if this can be done without any new secret-sharing API at all: if kTLS is good enough, then Go should arrange to use it by default, right? We'd probably also need to add ReadFrom and WriteTo methods to the tls.Conn implementations so that io.Copy goes straight to sendfile, but no new TLS-related API would be needed.

Is there a flaw in this thinking?

Are there Go or Rust kTLS implementations already that are worth looking at to understand the kernel interaction details? We spent a while reading linux/tls.h but it's not terribly well documented.

And are there other operating systems with kTLS that we should look at?

4xoc commented 3 weeks ago

I believe it would be the right thing to get kTLS going as a default on supported systems. Having a secret-sharing API might be useful for some developers, though, maybe as something one can meddle with explicitly.

Maybe this helps with the kernel interaction.

Looking at FreeBSD would probably be a good idea. The implementation seems quite mature.

astrolox commented 3 weeks ago

Nginx has had support since around 2021, although I think it just delegates the hard work to OpenSSL. It still might be worth a look: https://hg.nginx.org/nginx/rev/65946a191197

rsc commented 3 weeks ago

> I believe it would be the right thing to get kTLS going as a default on supported systems. Having a secret sharing API might be useful for some developers though, maybe something one can meddle with explicitly.

Meddlers can always use reflect and unsafe. No need to add API for them.

sprappcom commented 2 weeks ago

Here are some unverified and broken kTLS implementations on my radar: https://github.com/0-haha/gnet-tls-go1-20/blob/dev/ktls_linux.go https://github.com/soluble-ai/go-ktls/blob/master/ktls.go

When's the ETA for this? I've been watching this thread since 2021. :D

@totallyunknown's 165 Gbit/s on a 400 Gbit/s line is really weak. I'm hoping for the performance too.

@rsc would it be possible for meddlers to use it without allocations per op too? That would be heaven.

Speaking of zero allocs/op... I really wish the arena feature were fully supported as non-experimental.

harshavardhana commented 2 weeks ago

> https://github.com/soluble-ai/go-ktls/blob/master/ktls.go

This is not kernel TLS. It looks like it stores a TLS secret as a Kubernetes secret.

rolandshoemaker commented 2 weeks ago

Possibly on the roadmap for 1.24 if we have the time.

totallyunknown commented 2 weeks ago

> @totallyunknown 's doing 165GBits/s on a 400Gbits/s line is really weak. i'm hoping for the performance too.

@sprappcom 400G is the future goal. 165 Gbit/s is with 2x100G (NVIDIA ConnectX-6 + AMD Rome).

sprappcom commented 2 weeks ago

@totallyunknown OK, 82.5% is impressive. Mine can only do 60% on a laptop.

rsc commented 2 weeks ago

It sounds like people are on board for "no new API", implementation on by default once it works, with a GODEBUG like tlskernel=0 to turn off.

Do I have that right?

howardjohn commented 2 weeks ago

I have a few concerns about on-by-default:

There is a meaningful difference between data being written to the kernel in plaintext vs encrypted, from a debugging, tooling, and even security POV. (I would not claim there are legitimate threat vectors here, but some people are quite paranoid, and I am not an expert, so I suspect others might have concerns.)

kTLS may work on a wide-ish range of Linux versions which we can check against, but it doesn't necessarily work well on all of them. https://people.kernel.org/kuba/tls-1-3-rx-improvements-in-linux-5-20 for instance shows there are some very recent critical performance improvements. This makes it tricky to know what is the right bar to implicitly turn this on. Is it the oldest Linux version that supports the features indicated? The oldest one we measured as "fast enough"? What if changes to Go or Linux change whether it is "fast enough", or different kernel or hardware configurations do? For instance, kTLS may be "not fast enough" on Linux 5.0 in general, but I may have a NIC that supports kTLS offload making it suitable even on that version.

I won't claim to be a TLS expert; it just feels like there are a tremendous number of variables to consider. I could see that, in many years, there may be enough clarity on real-world use cases, production testing, etc. that we get to the point where we could turn it on by default. However, I don't think that is likely to happen for many years, and even when it does, I would think it is controversial enough to warrant first-class configuration rather than just a GODEBUG.

FWIW, around a year ago I did some performance testing of Go+kTLS with some fork I found (sorry, I forget a lot of details at this point). The performance was pretty rough, and, surprisingly, even worse when using splice which is what should be the big win. I wouldn't put too much weight on that given the vague claims + age + unofficial implementation, but something to keep an eye on.

Jorropo commented 1 week ago

@howardjohn, about which version is the first good-enough one: I don't see why we shouldn't default to using it on Linux versions where the kernel's software implementation matches or beats Go's, because at worst performance is similar, and you can now use sendfile and friends. 5.0 being slow is not a good reason to arbitrarily slow down 5.20 (or whatever the threshold is). We can add a ternary option, GODEBUG=tlskernel=always, for users on 5.0 who happen to have compatible hardware.

We already conditionally use linux features based on the kernel's version in the codebase: https://github.com/golang/go/blob/fed2c11d67dbe6d8179cd411b4bb7761d034e9d2/src/internal/poll/copy_file_range_linux.go#L18
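A minimal sketch of such a version gate, assuming the kernel release string is available (e.g. from uname(2) or `/proc/sys/kernel/osrelease`). The minimum version here is a placeholder, not a measured threshold; the release discussed as "5.20" in the post linked earlier shipped as Linux 6.0. `kernelAtLeast` is a hypothetical helper, not the standard library's internal version check.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Placeholder minimum; a real threshold would come from benchmarking.
const (
	minKTLSMajor = 6
	minKTLSMinor = 0
)

// parseKernelRelease extracts the major and minor numbers from a release
// string such as "5.13.12-generic" or "6.8-rc3".
func parseKernelRelease(release string) (major, minor int, ok bool) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, false
	}
	var err error
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, false
	}
	// Drop any non-numeric suffix on the minor field ("8-rc3" -> "8").
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	if minor, err = strconv.Atoi(minorStr); err != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// kernelAtLeast reports whether release is at least wantMajor.wantMinor.
// Unparseable releases are treated as too old, so kTLS stays off.
func kernelAtLeast(release string, wantMajor, wantMinor int) bool {
	major, minor, ok := parseKernelRelease(release)
	if !ok {
		return false
	}
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	for _, r := range []string{"5.13.12", "6.1.0-13-amd64"} {
		fmt.Println(r, kernelAtLeast(r, minKTLSMajor, minKTLSMinor))
	}
}
```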

These old benchmarks showed worse latency even when throughput was significantly higher. However, this is already a problem in the TLS protocol due to AEAD framing tradeoffs, and here is our strategy against it: https://github.com/golang/go/blob/52ce25b44e8c21f62e95b12497db3036c5bd27c3/src/crypto/tls/conn.go#L879-L894 If this happens again, since handshakes will still happen in userland Go, we don't need to switch to kTLS right away: we can handle the first 1 MB (or X bytes) in userland to keep a healthy time-to-first-byte.

FiloSottile commented 1 week ago

The fact that deciding when to enable kTLS is hard is not a reason to delegate the choice to the user, but the opposite: it means we should do the research, measure the performance, weigh the tradeoffs, and make the judgement calls, so our users don't have to. We're in the business of building a TLS stack, our users are in the business of writing Go programs.

More concretely, yes, I think we should figure out which Linux version has "good enough" kTLS, and require that. Users who want kTLS can upgrade their kernel, and as time passes that will be less and less of a problem. I am not too concerned about weird combinations of old kernels and powerful NICs; if they are common, we'll hear about it and reassess.

sprappcom commented 1 week ago

kTLS is for extreme users, so just go extreme, in this order (seriously, I doubt more than 100 projects will adopt this; I've been watching this thread since 2021):

  1. "future proof" Linux compatibility (I'm into Linux, and its user base for this is the largest and, of course, the most important)
  2. security (I think it's just an API, so there's no need to worry that much about kTLS security, since the Linux side should have taken care of it; just letting us upgrade to the latest kernel will do)

Please get it out by Go 1.24, at least as an experimental feature. Thanks in advance; I understand this is really tough to get right.

@rsc @FiloSottile GOEXPERIMENT=ktls

Those who want edge cases can fork it and build the package they want for their own OS.

rsc commented 1 week ago

Let's keep the GODEBUG name starting with tls like all the other tls names, so tlskernel=1 not ktls=1. GODEBUG not GOEXPERIMENT because it is a runtime decision.

It sounds like we all agree not to add new API other than the GODEBUG. With the proposal being just to add the GODEBUG and work on an implementation, I think this is moving towards acceptance. Do I have that right?

rsc commented 1 week ago

Have all remaining concerns about this proposal been addressed?

The proposal is to develop transparent kTLS support behind GODEBUG=tlskernel=1. A future proposal can discuss the conditions under which it should be enabled by default.
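For illustration, a sketch of reading the proposed setting from the environment. Inside the standard library this would go through `internal/godebug` rather than hand parsing, and `tlskernel` is only the name proposed in this thread; `godebugValue` and `tlsKernelEnabled` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// godebugValue returns the value of key in a GODEBUG-style string such as
// "http2client=0,tlskernel=1". In this sketch a later entry overrides an
// earlier one.
func godebugValue(godebug, key string) (string, bool) {
	var val string
	found := false
	for _, kv := range strings.Split(godebug, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(kv), "=")
		if ok && k == key {
			val, found = v, true
		}
	}
	return val, found
}

// tlsKernelEnabled reports whether the proposed tlskernel setting is "1".
func tlsKernelEnabled() bool {
	v, _ := godebugValue(os.Getenv("GODEBUG"), "tlskernel")
	return v == "1"
}

func main() {
	fmt.Println("tlskernel enabled:", tlsKernelEnabled())
}
```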