rustls upgrade in 2.10 is a breaking change?

algesten / ureq

A simple, safe HTTP client

Apache License 2.0

1.67k stars 172 forks source link

rustls upgrade in 2.10 is a breaking change? #765

Open jerrinot opened 2 months ago

jerrinot commented 2 months ago

Hello,

I believe ureq 2.10 introduces a breaking change that should be reflected in the library's versioning. Why do I think this is a breaking change?

ureq exposes Rustls ClientConfig in its API, making Rustls's public API a part of ureq's public API.
ureq 2.10 updated Rustls from 0.22 to 0.23.
According to cargo semver, the update from Rustls 0.22 to 0.23 is interpreted as a breaking change.

Example of the breakage:

We maintain a library that depends on both ureq and rustls.
Our library uses Rustls both directly and through ureq.
In our Cargo.toml, we specify ureq = "2.9" and rustls = "0.22".
Our library is published on crates.io.

After the release of ureq 2.10, our library users automatically receive ureq 2.10 as it is considered compatible with 2.9. As a result, our library users end up with two different versions of rustls:

Version 0.23 as a transitive dependency of ureq,
Version 0.22 as declared in our own Cargo.toml.

This leads to a scenario where the library instantiates ClientConfig from rustls 0.22 and attempts to pass it to AgentBuilder.tls_config(), which expects a client config from rustls 0.23, resulting in compatibility issues.

     |                                                       ---------- ^^^^^^^^^^ expected `rustls::client::client_conn::ClientConfig`, found `ClientConfig`
     |                                                       |
     |                                                       arguments to this method are incorrect
     |
     = note: `ClientConfig` and `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types

I am not very well versed in Rust ecosystem, but I believe that if you expose a 3rd party library in your API and this library declares a breaking change then your library should declare it too.

algesten commented 2 months ago

Yeah. I don't know what to do there. Re-exporting rustls like this has proven to be a bad idea. I guess re-exporting anything which is semver 0.x.x is not good, since 0.x means patch versions are allowed to be breaking.

I really don't want to bump major version ureq for this, since I believe many users will not be affected in the same way as yourself.

I can only apologise and promise to do better in ureq 3.x, which I'm working actively on right now (https://github.com/algesten/ureq/pull/762). I will add to the list for 3.x to be super conservative regarding re-exporting dependencies.

I am sorry!

jerrinot commented 2 months ago

hello @algesten,

thanks for a super quick reply. No need to apologize - you owe me nothing! I am looking forward for ureq 3.x!

FWIW, I suspect that exposing a third-party library API within your own API is always a bit too close to running with scissors, as it means you are not fully in control of your library's API surface. However, perhaps that's just me, coming from the Java world and being overly cautious or conservative 🤷 👴

algesten commented 2 months ago

thanks for a super quick reply. No need to apologize - you owe me nothing! I am looking forward for ureq 3.x!

Thanks!

FWIW, I suspect that exposing a third-party library API within your own API is always a bit too close to running with scissors, as it means you are not fully in control of your library's API surface. However, perhaps that's just me, coming from the Java world and being overly cautious or conservative 🤷 👴

Yeah, it's a bit rock and hard place, because I really don't want to build API and abstractions for all the ins-and-outs of TLS. rustls, native-tls heck even openssl are all 0.x. I seen some tls-api crate but not sure there is a "certain" well supported way forward there.

Narsil commented 2 months ago

It seems to break something else without this specific configuration.

https://github.com/huggingface/text-generation-inference/actions/runs/9856277290/job/27212925674#step:13:453 no process-level CryptoProvider available -- call CryptoProvider::install_default() before this point

I haven't understood yet what exactly is causing the issue, but I don't link it's due to dependency.

But keepign ureq==2.9 doesn't trigger the issue. Has something changed in the default crypto providers ? (The linked repo depends on hf-hub which depends on tokio/reqwest and ureq at the same time (For sync/async support)

Thanks a lot for this lib ! I put this comment here mostly so that others might find it through google.

xangelix commented 2 months ago

It seems to break something else without this specific configuration.

https://github.com/huggingface/text-generation-inference/actions/runs/9856277290/job/27212925674#step:13:453 no process-level CryptoProvider available -- call CryptoProvider::install_default() before this point

I haven't understood yet what exactly is causing the issue, but I don't link it's due to dependency.

But keepign ureq==2.9 doesn't trigger the issue. Has something changed in the default crypto providers ? (The linked repo depends on hf-hub which depends on tokio/reqwest and ureq at the same time (For sync/async support)

Thanks a lot for this lib ! I put this comment here mostly so that others might find it through google.

same issue with the ehttp crate, including the once_cell poisoned error linked

jerrinot commented 2 months ago

@Narsil @xangelix the issue you observe is very likely a consequence of updating rustls to 0.23.

See the Rustls Release Notes:

Support for process-wide selection of CryptoProviders. See the documentation. Note that callers of ClientConfig::builder(), ServerConfig::builder(), WebPkiServerVerifier::builder() and WebPkiClientVerifier::builder() must now ensure that the crate's features are unambiguous or explicitly select a process-level provider using CryptoProvider::install_default(). Otherwise, these calls will panic with:

no process-level CryptoProvider available -- call CryptoProvider::install_default() before this point

There is a related discussion at https://github.com/rustls/rustls/issues/1938

Narsil commented 2 months ago

@jerrinot Thanks for the link I saw that issue, but it seems slightly odd that every binary now has to change the code to choose a rustls implementation for the provider.

If that's the case then the semver breaking might be a bit more serious, no ?

algesten commented 2 months ago

Thanks for the link I saw that issue, but it seems slightly odd that every binary now has to change the code to choose a rustls implementation for the provider.

Open to suggestions I can do that would improve the situation. ureq PR that bumped the dep was https://github.com/algesten/ureq/pull/753 I was following the recommendation by the rustls people linked in https://github.com/rustls/rustls/issues/1913

I assumed ureq users would like to keep on top of the latest rustls changes, but I guess we could downgrade the dep to 0.22 to not have ureq be the first lib that require people to tackle the whole ring vs awc-lc-rs deal.

algesten commented 2 months ago

Oh and I documented my feelings on the subject here: https://github.com/algesten/ureq/issues/751#issuecomment-2072865041

cpu commented 2 months ago

Hi :wave:

I'm interested in trying to help here, but also find myself entering the thread with trepidation. I don't want to drag heated discussion to your repo by trying to weigh in, but I also want to support folks figuring out what will work best for their applications. Just wanted to put that up front as a vulnerable disclaimer from a human being :-)

it seems slightly odd that every binary now has to change the code to choose a rustls implementation for the provider.

There's a bit more nuance here, as mentioned in the docs you quoted (emphasis added by me):

Note that callers of ClientConfig::builder(), ServerConfig::builder(), WebPkiServerVerifier::builder() and WebPkiClientVerifier::builder() must now ensure that the crate's features are unambiguous or explicitly select a process-level provider using CryptoProvider::install_default().

If the feature configuration of your project is such that only one of rustls/ring or rustls/aws-lc-rs (the default) are enabled then the interfaces that work on the basis of there being a default provider will operate without code change. You only need to explicitly register a process-wide default if you're using the interfaces that require a default and the features are ambiguous.

Oh and I documented my feelings on the subject here: https://github.com/algesten/ureq/issues/751#issuecomment-2072865041

I think it would be helpful to separate the challenges of the crypto provider interface (mainly, the nuance of the process-wide default) from the choice to make aws-lc-rs the default provider. My thinking here is that even if ring were the default the situation of an ambiguous process-wide default in the presence of crates enabling both ring and aws-lc-rs would remain. (I'd really like to avoid rehashing 751 over here so it's helpful in my mind if that choice isn't on topic).

Open to suggestions I can do that would improve the situation.

I think there's two things we could think about:

I think the changelog update on the ureq side could include more detail about this change in model by Rustls. I'm specifically thinking about lifting some text/guidance from the rustls docs on the process default provider. Would you be open to me drafting a PR suggesting some text to that could be added there?
I think as noted by the OP of this thread, the major Rustls version increase would have been better done in a major ureq version increase (since the Rustls API is re-exported through ureq). That ship might have sailed, but in the future I think this would help set expectations for downstream projects. Edit: also wanted to add there are options we could hash out to avoid the re-export, I think there are a few projects to draw inspiration/prior-art from.

algesten commented 2 months ago

I'm interested in trying to help here, but also find myself entering the thread with trepidation. I don't want to drag heated discussion to your repo by trying to weigh in, but I also want to support folks figuring out what will work best for their applications. Just wanted to put that up front as a vulnerable disclaimer from a human being :-)

Yeah. Sorry. I am too quick to jump into my feelings. Let's focus on making it better :)

If the feature configuration of your project is such that only one of rustls/ring or rustls/aws-lc-rs (the default) are enabled then the interfaces that work on the basis of there being a default provider will operate without code change. You only need to explicitly register a process-wide default if you're using the interfaces that require a default and the features are ambiguous.

Does that mean it would be better if ureq did not default to ring, to not confuse the default choice made by the user?

Oh and I documented my feelings on the subject here: #751 (comment)

I think it would be helpful to separate the challenges of the crypto provider interface (mainly, the nuance of the process-wide default) from the choice to make aws-lc-rs the default provider.

Yeah. Sorry. I meant to give context, which would be better to do without emotion.

My thinking here is that even if ring were the default the situation of an ambiguous process-wide default in the presence of crates enabling both ring and aws-lc-rs would remain. (I'd really like to avoid rehashing 751 over here so it's helpful in my mind if that choice isn't on topic).

This is similar to a situation we had in ureq when deciding how to handle native-tls. Because ureq can be used in multiple libraries, we wanted to avoid the confusion that can arise by diamond dependencies. We settled on rustls being the primary default, and it remains the default even if the user enables native-tls. You can only get native-tls by enabling it in a manually configured Agent

Following from that logic, is there a way we can force rustls to use ring even when the user changes the default, and make the aws-lc variant of rustls similar to native-tls, i.e. it must be explicitly configured in Agent?

I understand that would proclude the user from configuration to make their entire app FIPS-compliant, but I suspect that set of users is much smaller than the set of users being affected by this situation.

Open to suggestions I can do that would improve the situation.

I think there's two things we could think about:

I think the changelog update on the ureq side could include more detail about this change in model by Rustls. I'm specifically thinking about lifting some text/guidance from the rustls docs on the process default provider. Would you be open to me drafting a PR suggesting some text to that could be added there?

Yeah a PR for this would be great. But maybe there is some fix along the lines I outlined above I should do first?

I think as noted by the OP of this thread, the major Rustls version increase would have been better done in a major ureq version increase (since the Rustls API is re-exported through ureq). That ship might have sailed, but in the future I think this would help set expectations for downstream projects. Edit: also wanted to add there are options we could hash out to avoid the re-export, I think there are a few projects to draw inspiration/prior-art from.

That's great. I'm currently building our ureq 3.x and what I learned here is to reduce this re-exporting business, especially of 0.x crates. I'd be very grateful for any pointers to prior art.

cpu commented 2 months ago

Yeah. Sorry. I am too quick to jump into my feelings. Let's focus on making it better :)

To be explicit I've found your posts to be very reasonable, it's just a tough subject and not everyone is engaging in the most constructive manner. The energy pools for that kind of topic are quickly drained (as I'm sure you can relate!).

Does that mean it would be better if ureq did not default to ring, to not confuse the default choice made by the user?

I think our guidance for libraries like ureq is to take rustls as a dependency with no default features enabled, and then let consumers either take their own direct dep on rustls to activate a specific feature flag for a backend, or to have ureq expose its own optional features that enable the relevant rustls features. That's the model hyper-rustls uses as one example.

If you're using the builder methods that assume the crypto provider default that puts all of the control into the downstream project's hands. I'm not sure if this is better across all dimensions, but one option.

is there a way we can force rustls to use ring even when the user changes the default, and make the aws-lc variant of rustls similar to native-tls, i.e. it must be explicitly configured in Agent?

Let me refresh my understanding of the ureq API and how it interacts with the rustls API and get back to you with a better answer. I think it might be workable, but it may depend on how much the crate relies on users passing in their own rustls client configurations vs them being constructed in ureq. It might be a situation where trying a draft PR and seeing where the rough edges are is easier than thinking it through on first principles.

I understand that would proclude the user from configuration to make their entire app FIPS-compliant, but I suspect that set of users is much smaller than the set of users being affected by this situation.

That's fair, I think it's reasonable to prioritize different things here.

That's great. I'm currently building our ureq 3.x and what I learned here is to reduce this re-exporting business, especially of 0.x crates. I'd be very grateful for any pointers to prior art.

Sounds good. I think both tonic and hyper have found a way around this issue but I don't have the specifics in hand.

I appreciate the conversation :-) Sorry for a half-reply promising better answers, it's the end of my day and I didn't want to leave this hanging.

algesten commented 2 months ago

it's the end of my day and I didn't want to leave this hanging

hey. i'm on holiday and knee deep in a gin & tonic. I'm not very productive ;)

cpu commented 2 months ago

I did some more thinking on this. Here's some information to work with. As expected there are a handful of trade-offs to consider and no one solution that works for everyone (c'est la vie :face_exhaling: ).

Following from that logic, is there a way we can force rustls to use ring even when the user changes the default, and make the aws-lc variant of rustls similar to native-tls, i.e. it must be explicitly configured in Agent?

I think the crate::default_tls_config() fn that's applied to the AgentBuilder when the tls feature is enabled could be written to force the use of *ring* with the current Cargo.toml config of ureq if that's what you thought was best for your user base. I think doing so would remove a bit of friction w.r.t the crypto provider process default.

The rtls.rs default_tls_config() fn would need to change from using ClientConfig::builder() (which assumes a process default) to instead use ClientConfig::builder_with_provider() passing in rustls::crypto::ring::default_provider() (which you know is available, because the ring feature is on). That would explicitly use *ring*, no matter the process default (including if one isn't set, avoiding the issue some users are reporting above). You'll have to set a couple more builder fields as well (notably protocol version) but could crib the defaults being used by builder().

Users that are passing in their own Arc<rustls::ClientConfig> with AgentBuilder::tls_config(..) still have to take some care in this arrangement and would need to either:

Also explicitly use ClientConfig::builder_with_provider() when building their config or:
Use ClientConfig::builder() but contend with the process default, which means:
- Adding a call to CryptoProvider::install_default() prior to any config construction, or:
- making sure nothing in their project's deps activate the rustls aws-lc-rs feature, because in combination with ureq activating the ring feature unconditionally, it makes the feature set ambiguous.

Hopefully users providing their own customized config are a bit more "plugged in" to the Rustls API changes over breaking releases and so could contend with the above better than the folks that are relying on default interfaces. Does that make sense?

The main potential issues I see across the board stem from ureq activating the rustls ring feature with its tls feature. That choice takes away a downstream project's ability to build without ring (though I imagine LTO would remove it if not used?), or to use the default crypto provider mechanism without explicitly installing a process default if they want to use aws-lc-rs (because with both features activated a call to CryptoProvider::install_default() becomes required). Perhaps these things aren't deal-breakers in this context? You can certainly still use aws_lc_rs with ureq by taking your own rustls dep, activating that feature, and using AgentBuilder::tls_config to provide your own config.

Countering those issues, I do see an upside here that ureq activating ring and updating crate::default_tls_config() to use it explicitly via ClientConfig::builder_with_provider() means that a user can activate the tls feature, use the default config, and not have to make any choices about the nitty gritty of the crypto providers or add any calls to install one. I can see that being an advantage for some users for sure.

Sorry for the essay length reply :laughing: I wanted to try and lay out all the pros/cons as I see them as impartially as possible. WDYT?

cpu commented 2 months ago

I think the changelog update on the ureq side could include more detail about this change in model by Rustls. I'm specifically thinking about lifting some text/guidance from the rustls docs on the process default provider. Would you be open to me drafting a PR suggesting some text to that could be added there?

Yeah a PR for this would be great.

Done: https://github.com/algesten/ureq/pull/767

I think the crate::default_tls_config() fn that's applied to the AgentBuilder when the tls feature is enabled could be written to force the use of ring with the current Cargo.toml config

Rolled this into #767 for consideration.

laniakea64 commented 1 month ago

I think the crate::default_tls_config() fn that's applied to the AgentBuilder when the tls feature is enabled could be written to force the use of ring with the current Cargo.toml config

Rolled this into #767 for consideration.

Hi, ran across this while playing with reqwest and ureq for a couple local Rust projects. These local programs use the OS cert store via feature flags (ureq feature native-certs). Being on Linux and not negatively affected by aws-lc-rs build requirements, thought it maybe better if my programs follow rustls devs' recommendation https://github.com/rustls/rustls/issues/1913#issuecomment-2071760244 .

IIUC with ureq 2.10.0 this was straightforward, all that was needed was to depend on rustls with only default features and add one line of code like cargo-upgrades did: https://gitlab.com/kornelski/cargo-upgrades/-/commit/d890d43432610221d593216a68f0c14d7df4ccd7#b24749917179fb5e3e613ed2a703fcdcc6cdf9da_53_58

However, with ureq 2.10.1 it seems the only way to use aws-lc-rs is to basically duplicate a bunch of ureq internal code?

Would it be possible for ureq to default this to the application-set default rustls provider/backend if set, and only fall back to ring when that's not set? I'm not very experienced in Rust, but looks to me like this is exactly what reqwest does? - https://github.com/seanmonstar/reqwest/blob/5715050256882dc9daf90403c1c2d58e7cd78eaf/src/async_impl/client.rs#L565-L579

Or is there a reason why such approach wouldn't work in ureq?

Thanks - and many thanks for ureq!

algesten commented 1 month ago

Maybe it wasn't clear by the above discussion.

ureq first priority is will just work out of the box, and the aws-lc-rs backend does not just work, it requires additional dependencies on windows.
secondary priority is that ureq should be a pure rust solution. rustls itself is great, and I hope there will be a pure rust backend for it in the future.

Making ureq ship every possible TLS backend (or variation of backend) is not a goal (the less such variations we have to maintain, the better). For now there are two TLS variants

"pure(ish) rust", work out of the box, rustls+ring
native-tls for those that must use a platform supplied TLS

ureq 3.x is making the transports (and TLS wrapping) easier to plug in. I'm hoping this will alleviate the problems people are experiencing.

Or is there a reason why such approach wouldn't work in ureq?

I'm sure all sorts of solutions could work, but I'm uneasy about making more big changes in the 2.x tree for feature flags, fallbacks, etc since I had enough of accidental breaking changes in non-major versions.

If you want to contribute a better solve than defaulting to ring, you can open a PR in the 3.x branch. I'm about to promote that branch to a public beta on crates.io.

algesten commented 1 month ago

Locking this thread since I'd want discussions about potential other solves to be in new issues/PR.