containers / image

Work with containers' images
Apache License 2.0
862 stars 377 forks source link

http.Transport configuration enforces http1.1 and causes issues with istio #1139

Open jhart1685 opened 3 years ago

jhart1685 commented 3 years ago

This code https://github.com/containers/image/blob/master/pkg/tlsclientconfig/tlsclientconfig.go#L100-L105

Creates an http transport with a custom dialler, which means that by default, Go will not allow the connection to be upgraded to http2 (this is explained in the documentation of the ForceAttemptHTTP2 option).

This, coupled with the DisableKeepAlives: true option, means that the resulting connection is susceptible to being chopped quickly by the server. See https://github.com/istio/istio/issues/30577 for background of this.. But essentially, if you are doing (for example) a skopeo copy of an image between two registries, and the source registry is hosted behind an istio ingress, the image read is likely to get terminated early with an EOF.

I've seen other issues in this repo around EOFs, improving diagnostics and logging of them for example. But the frequency of these EOFs can be much reduced by implementing one of the following options:

vrothberg commented 3 years ago

Thanks for reaching out and the great summary, @jhart1685!

I browsed a bit around and had a look. Docker is also disabling the keep alives while containerd does not. I need to look a bit closer to check whether changing that in c/image would have other implications.

@mtrmac @rhatdan WDYT?

mtrmac commented 3 years ago

I’ll try to check in more detail later — just as a very general point, the pkg/tlsclientconfig was created for compatibility, but we mostly have neither the interest nor the expertise to fine-tune TCP and similar low-level layers, and especially to keep up with the changing best practices.

We probably can’t just blindly drop everything in there; overall, it does seem attractive to decrease the deviation from defaults when we don’t have a specific reason to keep deviating.

jhart1685 commented 3 years ago

Hi @vrothberg @mtrmac - Thanks for looking. I take your point about fine-tuning TCP not being the core function of this project. Likewise, it's not my area of expertise either.

But in this case, we've tracked down one of the causes of the EOFs that make Skopeo unusable in certain situations. It looks like a relatively small modification here (albeit one still to be made carefully and with due diligence wrt testing etc.) could make the client much more tolerant of different server configurations, and provide a big win in terms of reliability.

mtrmac commented 3 years ago

Oh I’m absolutely not saying that we shouldn’t change the code in that package ever; I mean that it would be nice to avoid making such choices if we are not in a position to maintain them well, i.e. tend to remove code from there if possible / reasonable.

mtrmac commented 3 years ago

This code https://github.com/containers/image/blob/master/pkg/tlsclientconfig/tlsclientconfig.go#L100-L105

Creates an http transport with a custom dialler, which means that by default, Go will not allow the connection to be upgraded to http2 (this is explained in the documentation of the ForceAttemptHTTP2 option).

Just for the record, that would happen anyway because we always set TLSClientConfig. So we would have to affirmatively opt into HTTP/2. (And the http package documentation doesn’t say why TLSClientConfig disables HTTP/2…)

This, coupled with the DisableKeepAlives: true option, means that the resulting connection is susceptible to being chopped quickly by the server. See istio/istio#30577 for background of this..

… and reading https://github.com/istio/istio/issues/30577#issuecomment-772120680 / https://github.com/istio/istio/issues/30577#issuecomment-772705815 , we are talking about an unreliable workaround to a heuristic, not(?) some fundamental property of TCP.


It’s seriously unclear to me why Docker and containerd did / did not make the choices they did. The decision to disable keepalives seems to come from https://github.com/moby/moby/pull/797/commits/cff3b37a61b8da883437eec799b7b852d22538f0 , which does not document any rationale, at best the containing PR vaguely suggests a direction.

mtrmac commented 3 years ago

Linking for context: #1083 .

jhart1685 commented 3 years ago

Just for the record, that would happen anyway because we always set TLSClientConfig. So we would have to affirmatively opt into HTTP/2. (And the http package documentation doesn’t say why TLSClientConfig disables HTTP/2…)

Hmm - yes, could be that my simplified recreate moved too far away from this code and introduced that different dimension.

… and reading istio/istio#30577 (comment) / istio/istio#30577 (comment) , we are talking about an unreliable workaround to a heuristic, not(?) some fundamental property of TCP.

Yes. I think the server-side configuration to mitigate this is an unreliable workaround. The client modifications being discussed here are more like defensive coding, to harden the code against a variety of server types and configurations.

mtrmac commented 1 year ago

As of #1867, nothing in c/image disables HTTP keep-alive; so, per the initial motivation,

But the frequency of these EOFs can be much reduced by implementing one of the following options:

that should now work better.


HTTP/2 is still effectively opted out because we set DialContext.