algesten / ureq

A simple, safe HTTP client
Apache License 2.0
1.71k stars 176 forks source link

Feedback on attempting to port to 3.0.0-rc2 #870

Open sosthene-nitrokey opened 6 days ago

sosthene-nitrokey commented 6 days ago

Hi,

We are using ureq in our PKCS#11 module, thank you for this reliable and simple library.

We recently had to fork ureq in our use to implement missing functionality we wanted to have, mainly TCP keepalive and automatically discarding old connections.

We hope the 3.x.x branch allowing separate "connectors" and "transport" and the fact that it already supports a max age for idle connections would permit us to migrate back to upstream.

I attempted to port our implementation to the 3.0.0-rc2, here are the challenges I encountered:

Config fields are private

This prevents implementing some of the functionality of the upstream TcpConnector in our own implementation (for example there is no way to read the nodelay configuration of the config, preventing of implementing the same functionality in other crates. Looking at the main branch, it looks like this has been fixed, so this may be outdated.

There would be benefits to not boxing the connectors

In the connector traits, the transports are boxed, preventing chained connectors to make use of specificities of the previous transport. Ideally for example, it would be possible to use the specific type of the previous transport. For example a TCP keepalive connector could just wrap or come after a TcpTransport, get its TCP stream, and enable Keepalive on it.

The way I would do it would use associated types for the transports, and make chained and not chained connectors part of the trait.

pub trait Connector: Debug + Send + Sync + 'static {
    type Transport: Transport;

    fn boxed(self) -> AnyConnector
    where
        Self: Sized,
    {
        Box::new(self)
    }
    fn connect(&self, details: &ConnectionDetails) -> Result<Self::Transport, Error>;
}

pub trait AnyConnectorTrait: Debug + Send + Sync + 'static {
    fn connect(&self, details: &ConnectionDetails) -> Result<Box<dyn Transport>, Error>;
}

impl<C: Connector> AnyConnectorTrait for C {
    fn connect(&self, details: &ConnectionDetails) -> Result<Box<dyn Transport>, Error> {
        Ok(Box::new(self.connect(details)?))
    }
}

pub type AnyConnector = Box<dyn AnyConnectorTrait>;

impl Connector for Box<dyn AnyConnectorTrait> {
    type Transport = Box<dyn Transport>;
    fn connect(&self, details: &ConnectionDetails) -> Result<Self::Transport, Error> {
        (**self).connect(details)
    }
}

pub type AnyTransport = Box<dyn Transport>;

pub trait MiddleConnector<T: Transport>: Debug + Send + Sync + 'static {
    type Transport: Transport;

    fn connect(&self, details: &ConnectionDetails, chained: T) -> Result<Self::Transport, Error>;
}

With an implementation on tuples of many sizes through a macro:

impl<C, M1, M2> Connector for (C, M1, M2)
where
    C: Connector,
    M1: MiddleConnector<C::Transport>,
    M2: MiddleConnector<M1::Transport>,
{
    type Transport = M2::Transport;

    fn connect(&self, details: &ConnectionDetails) -> Result<Self::Transport, Error> {
        self.2
            .connect(details, self.1.connect(details, self.0.connect(details)?)?)
    }
}

You could still keep the ChainedConnector for one Connector + a chain of MiddleConnector<AnyTransport> that could also be boxed.

For connectors that may or may not "chain" like the SocksConnector followed by the TcpConnector, I would instead implement the SocksConnector as a:

struct SocksConnector<C: Connector<Transport = TcpStream>> {
    inner: C,
}

impl<C: Connector<Transport = TcpStream>> Connector for SocksConnector<C> { 
    type Transport = TcpStream;
    fn connect(&self, details: &ConnectionDetails) -> Result<Self::Transport, Error> {
        if !details.use_socks {
            return self.inner.connect(details);
        }
        // Current socks connector logic
    }
}

This type safety would allow for example a keepalive connector, which is much simpler that re-implementing all the functionality of the TcpConnector just to add that. It makes connectors much more composable.

struct KeepaliveConnector;

impl MiddleConnector<TcpTransport> for KeepaliveConnector {
    type Transport = TcpTransport;
    fn connect(
        &self,
        details: &ConnectionDetails,
        chained: TcpTransport,
    ) -> Result<Self::Transport, Error> {
        let stream: &std::net::TcpStream = chained.tcp_stream();
        // Configure Keepalive on the TcpStream;
        Ok(chained)
    }
}

The upstream connectors and transports are not public

This goes with the previous point. Building a custom chain of connectors could be simpler if the upstream connectors were public. I understand that doing so would be a semver hazard, so maybe they should be in a separate crate (ureq-utils maybe?), to allow building on top of them, but allow the connectors to be more frequently updated.

The typestate variants are not public

The types WithoutBody and WithBody are not public (same for the Config scopes), which means it's not possible to write function that expect or return a RequestBuilder<WithoutBody>.

If you do not want these types to be public for semver reasons, would it be possible to still expose a type alias: type RequestWithBody = RequestBuilder<WithBody>,and type RequestWithoutBody = RequestBuilder<WithoutBody>, so that they be used in functions signatures?

Thank you for the work on 3.0.0, this looks like it will certainly help ureq be more flexible!

If you want I can create dedicated issues (or even PRs) for the parts you think are worth it.

algesten commented 6 days ago

@sosthene-nitrokey Thank you so much for taking the time to test this thoroughly. This is such great feedback that I can guarantee we need to address most of it and make an rc3.

I have some personal health issues right now preventing me from deeply focusing on coding. I welcome all help with addressing these concerns via PRs.

Config fields are private

Agree we need to address this. The main branch only did a builder, there config fields themselves are pub(crate). For semver's sake, I think we should add accessor functions on Config for the properties, not expose them directly.

There would be benefits to not boxing the connectors

I like where this is going. It never felt great to box everything up the way it is now. I'd also been thinking one could explore a completely generics-based chain that is only abstracted/boxed on the top level, but that might be a bit of a hairier change.

The upstream connectors and transports are not public

Yeah, ureq 2.x suffered badly from overexposing stuff that just had to break semver (like re-exporting other crates). Thus I been very reluctant to make all of the internal workings public, thinking that if someone needs the TcpConnector for their local needs, they can just copy-paste the code.

In rc1 and still in rc2, I allowed myself exceptions to semver. Maybe it's a terrible idea, but we could potentially "carve out" exceptions to API stability by clearly marking it as such. Maybe the entire transport module can be considered unstable at this point even if we proceed making the rest of the 3.0.0 release?

The typestate variants are not public

When I started building 3.x, my entry point was that "The http-crate is the public API". RequestBuilder is therefore in my mind an enhancement to the http-crate request::Builder - it even derefs to that. At one point I was even considering making these enhancements as extension traits.

It might be that I need to shift my thinking here. I would be open to a PR exporting these typestate variants under #[doc(hidden)] since I don't want them cluttering the API at the same time as not disallowing the use case you mention.

If you want I can create dedicated issues (or even PRs) for the parts you think are worth it.

Again, thank you so much for taking the time. I will not be able to look into these things myself right now, but I do promise to get to them eventually. I can however review PRs, so I welcome any help you can spare!

sosthene-nitrokey commented 4 days ago

Yeah, ureq 2.x suffered badly from overexposing stuff that just had to break semver (like re-exporting other crates). Thus I been very reluctant to make all of the internal workings public, thinking that if someone needs the TcpConnector for their local needs, they can just copy-paste the code.

I fully understand that concern, we are dealing with similar issues in our own libraries, which is why I suggested having a sibling crate (ureq-utils, ureq-io?) that would contain these implementation, but would not be re-exported by the main ureq crate (only using it without exposing it through the default connector parameters) have more frequent breaking change updates. The issue would be that the Connector trait would also have to be in its own crate that is as stable as ureq to avoid circular dependencies.

In rc1 and still in rc2, I allowed myself exceptions to semver.

I don't know if I would call this an exception to semver. Saying that a specific usage is a bug and its behaviour can change is something that is widely understood as being totally compatible with semver (as long as the types don't change and it's properly documented).

Maybe the entire transport module can be considered unstable at this point even if we proceed making the rest of the 3.0.0 release?

I don't like this idea. Splitting it out in its own crate would be make more sense.

Worst case, it could also be possible (even with the current design), to just copy-paste the connectors in another crate, which could even be done by a third party that needs it (we would probably go this way).

When I started building 3.x, my entry point was that "The http-crate is the public API". RequestBuilder is therefore in my mind an enhancement to the http-crate request::Builder - it even derefs to that. At one point I was even considering making these enhancements as extension traits.

There is a bit of a semver hazard in using a Deref implementation like that no? It could be very possible that the http crate end up adding a query or query_pairs method, which could lead to some confusing error messages no? I also notice that that query does not actually modify the request builder it defers to, this might be confusing to some users. But overall I agree that converging on http as the API is a good idea.

Thank for being open to the feedback!

algesten commented 4 days ago

I don't like this idea. Splitting it out in its own crate would be make more sense.

I just recalled that rustls used to have a similar idea, so there's at least some precedence, granted they are still on semver 0.x.

My trouble with another crate is that I've been on the war path against increasing the dep count. In my view, maybe the main differentiating feature of ureq vs reqwest/static, is dep count.

To reduce dep count, I've been considering rolling hoot back into ureq. Maybe a way forward is to rename hoot -> ureq-proto and roll in the Transport definitions into that crate?

There is a bit of a semver hazard in using a Deref implementation like that no? It could be very possible that the http crate end up adding a query or query_pairs method, which could lead to some confusing error messages no?

True. I hadn't thought about the risk of them hijacking one of our names. If we do not Deref, we need to copy in like 7 fns we currently get via the Deref:

If the http crate adds a way to mutate the query parameters, we probably still need to bump major, since I prefer to delegate to their way of doing it instead of maintaining parallel ways.

I also notice that that query does not actually modify the request builder it defers to, this might be confusing to some users.

This is because I can't rebuild the query part of the uri without incurring the cost of allocation - something we try to avoid. Not ideal though.

sosthene-nitrokey commented 1 day ago

I do not understand why there is such a widespread concern about dep count in the Rust ecosystem. IHMO the real benchmark is:

Separating a single crate into multiple sub-crates does not change any of these metrics (especially if the crates live in the same git repo, even if they have different release cadence). And regarding the overall issue, I think it is worse to have multiple implementation of the same thing through copy-pasted code than having one more dependencies that implements a common pattern an is reused multiple times.

In our specific case, the reason we use ureq instead of reqwest is not related to minimizing dependencies, but it's related to the fact that tokio automatically spawns background threads and thus deals poorly with calling `fork.

algesten commented 1 day ago

I do not understand why there is such a widespread concern about dep count in the Rust ecosystem.

I hear you, but I see it slightly differently.

Maybe a way forward is to rename hoot -> ureq-proto and roll in the Transport definitions into that crate?

Is this an OK compromise?

In our specific case, the reason we use ureq instead of reqwest is not related to minimizing dependencies, but it's related to the fact that tokio automatically spawns background threads and thus deals poorly with calling `fork.

Very interesting! Maybe this should be brought out in the docs? It's also very good input, because there been talks about "happy eyeballs" and other things that might require background threads to be spawned.

sosthene-nitrokey commented 1 day ago

Is this an OK compromise?

Yes, that looks good!

Very interesting! Maybe this should be brought out in the docs?

What would that look like? Something like "ureq does not spawn background threads, and if it ever does in the future, it will be done behind a feature flag, so this can be relied upon if default-features is turned off." seems the least intrusive but also giving strong guarantees.

It's also very good input, because there been talks about "happy eyeballs" and other things that might require background threads to be spawned.

Would multithreading be the best implementation? the happy-eyeballs crate does not use threads and uses non-blocking sockets instead. BTW with the transport abstraction this also means it can be implemented by downstreams instead of requiring an implementation in ureq proper! At least for prototyping/testing it's interesting.

algesten commented 5 hours ago

Is this an OK compromise?

Yes, that looks good!

Cool. I'll rename the hoot crate today. ureq-proto. Then we can make a separate PR moving the transport definitions there.

What would that look like? Something like "ureq does not spawn background threads, and if it ever does in the future, it will be done behind a feature flag, so this can be relied upon if default-features is turned off." seems the least intrusive but also giving strong guarantees.

Yeah, that sounds almost perfect. I wouldn't even hide it behind feature flags, but say that the default configuration would never spawn background threads. The configuration options that will cause threads are clearly documented as such. This happens already today, but it's not brought out enough in the doc.

https://github.com/algesten/ureq/blob/main/src/resolver.rs#L147

Would multithreading be the best implementation? the happy-eyeballs crate does not use threads and uses non-blocking sockets instead.

Yes, with async everything is possible.

BTW with the transport abstraction this also means it can be implemented by downstreams instead of requiring an implementation in ureq proper! At least for prototyping/testing it's interesting.

This is an important reason I decided to rewrite to ureq 3. People come with PRs and suggestions that are really helpful, but I don't necessarily want to support them all myself. A ureq-happyballs crate could be maintained by someone else.