BurntSushi / jiff

A date-time library for Rust that encourages you to jump into the pit of success.
The Unlicense
1.64k stars 26 forks source link

add more integration with `std::time::Duration`? #21

Closed BurntSushi closed 3 weeks ago

BurntSushi commented 1 month ago

There are two different use cases to untangle here.

The first use case is something like this:

I use Rust's standard library std::time::Duration type, and I'd like to use it directly to add and subtract elapsed/absolute time to Jiff's datetime types. I would also like to be able to compute the absolute durations between datetime types using APIs like Zoned::since, but with std::time::Duration as the result returned.

The second use case is something like this:

I like Jiff's high level Span data type for representing both calendar and absolute durations, but this necessarily requires a lot of overhead when performing operations. For example, adding a Span to a Timestamp is not just a simple ~single integer addition. The implementation collapses all allowed units (hours and lower) into a single integer nanosecond (an i128), then converts the Timestamp into an i128 as a number of nanoseconds, then adds them and finally converts the integer number of nanoseconds as an i128 back to Timestamp's internal 96-bit integer representation. All of this overhead means that adding a simple count of integer seconds to a Timestamp is not as fast as I expect it to be.

The second use case is perhaps related to #17, but it is somewhat intertwined here, because one (but not the only) obvious way to address that is to satisfy the first use case: we provide alternative APIs that use std::time::Duration (hereafter just called Duration). This works because a Duration is itself just a 96-bit integer. And it always represents absolute time. That is, it is logically an integer number of nanoseconds. Therefore, adding a Duration to something like a Timestamp should be a very fast operation. Certainly faster than the existing Span focused APIs that we have today.

At present, I believe it is correct to say that Duration appears in the public API exactly zero times. This was not an accident, because I was unsure of how to integrate it. I think there are two challenges:

The first is that Duration is unsigned. There is some discussion in the issue tracking the addition of Duration (and SystemTime) to the standard library, and the related RFC has this to say:

Finally, many APIs that take a Duration can only do something useful with positive values. For example, a timeout API would not know how to wait a negative amount of time before timing out. Even discounting the possibility of coding mistakes, the problem of system clock time travel means that programmers often produce negative durations that they did not expect, and APIs that liberally accept negative durations only propagate the error further.

I was on libs (now libs-api) at the time, and I'm not sure if this was a mistake or not. Maybe the world really does need both unsigned and signed durations. The main issue with an unsigned duration type in Jiff is that it isn't nearly as flexible. For example, if you do ts1.until(ts2) and ts1 refers to a point in time after ts2, then Jiff will happily return a negative span. But what happens if you want to return a Duration? What is the result? One choice is to just return the duration between them such that ts1.until(ts2) and ts2.until(ts1) are equivalent. The other is to return an error. I would probably lean towards "return an error"... This doesn't come up as much with things like ts.checked_add(duration) since there is a ts.checked_sub(duration), and because it's up to caller in this case to apply the right operation. It isn't on Jiff to interpret the duration as anything other than unsigned.

The second problem with Duration integration is that it's not clear how to, well, actually integrate it. Does this mean Jiff should grow checked_add_duration and since_duration methods? I think so... And I think this is probably a requirement to satisfying the second use case above, even if we use a different duration type. I do personally find the names with a _duration suffix quite clunky, but perhaps it's fine because these will be considered "niche" APIs IMO.

It would perhaps be possible to satisfy the first use case without satisfying the second while simultaneously not adding a bunch of clunky _duration APIs to all of the datetime types. This could be accomplished by providing APIs to convert between a Span and a Duration. So callers would still need to use Span, but they would at least have a paved path for using Duration.

I think the main alternative point in the design space here is for Jiff to define its own absolute duration type that is signed. We'd still need things like checked_add_duration and since_duration, but the awkwardness with an unsigned duration type would go away. And then once we have our own absolute duration type, we could define conversions to and from std's unsigned Duration type. And we could deal with the peculiarities of signedness in a very focused and deliberate way.

FeldrinH commented 1 month ago

I don't know how much you've already considered this, but for me the biggest overhead with Span is not the performance overhead but the mental overhead. For example having to deal with potential errors and edge cases caused by the fact that Span can contain calendar units and having to keep in mind that some functions treat 90 seconds and 1 minute 30 seconds as different spans.

BurntSushi commented 1 month ago

I don't know how much you've already considered this, but for me the biggest overhead with Span is not the performance overhead but the mental overhead. For example having to deal with potential errors and edge cases caused by the fact that Span can contain calendar units and having to keep in mind that some functions treat 90 seconds and 1 minute 30 seconds as different spans.

Have you tried writing any code with Jiff yet? If so, would it be possible to share that code and your journey in figuring out how to write it?

It's worth noting that you can only get non-uniform units in your spans if you specifically put them there or request them.

having to keep in mind that some functions treat 90 seconds and 1 minute 30 seconds as different spans.

I think it is only the PartialEq and Eq impls that do this, and I think I'm going to have to remove them for this reason. (There is an open issue about it.) Is there another place where they are treated differently that I'm missing?

FeldrinH commented 1 month ago

Have you tried writing any code with Jiff yet? If so, would it be possible to share that code and your journey in figuring out how to write it?

Nothing substantive, just some small experiments. I've mostly been reading API docs and thinking about how this might work for some projects I've previously worked on. Sidenote: My opinions are probably somewhat by the fact that most of the use cases I'm thinking of work with timestamps, so I would rarely want or need to use calendar units in spans.

It's worth noting that you can only get non-uniform units in your spans if you specifically put them there or request them.

True, but since this isn't enforced at the type level I have to manually keep track of how the span was created and if it might have some kind of problematic units (e.g. calendar units when working with timestamps) in it. I suspect this could get quite tricky if the span is created from user supplied data and/or in a different component of the system.

I think it is only the PartialEq and Eq impls that do this, and I think I'm going to have to remove them for this reason. (There is an open issue about it.) Is there another place where they are treated differently that I'm missing?

The field accessors (get_seconds, get_minutes, etc) as well as serialization and formatting functions. Though these seem mostly useful for introspection and moving data around, so offhand I can't think of any cases where they could cause trouble like Eq and PartialEq can.

BurntSushi commented 1 month ago

I suspect this could get quite tricky if the span is created from user supplied data and/or in a different component of the system.

I think it should be okay? Jiff won't panic in these cases, it will return an error. So if you're adding a Span to a Timestamp and the Span came from user input, you'd just use it like normal and bubble the error up to the user. It will give an error message if, for example, the span contains any non-zero year units.

The field accessors (get_seconds, get_minutes, etc) as well as serialization and formatting functions. Though these seem mostly useful for introspection and moving data around, so offhand I can't think of any cases where they could cause trouble like Eq and PartialEq can.

I'm not sure the getters really apply here because they don't give you a Span. They are giving you its components.

And yes, indeed, PT120s and PT2m are two different serializations of the same elapsed duration, and they would come from two distinct Span values in memory that also represent the elapsed duration of time. I don't think this can lead to issues though. And it is permitted by the ISO 8601 standard.

BurntSushi commented 1 month ago

Nothing substantive, just some small experiments. I've mostly been reading API docs and thinking about how this might work for some projects I've previously worked on. Sidenote: My opinions are probably somewhat by the fact that most of the use cases I'm thinking of work with timestamps, so I would rarely want or need to use calendar units in spans.

Aye. Any details you can share about your journey would be helpful. Thanks!

FeldrinH commented 1 month ago

I think it should be okay? Jiff won't panic in these cases, it will return an error. So if you're adding a Span to a Timestamp and the Span came from user input, you'd just use it like normal and bubble the error up to the user. It will give an error message if, for example, the span contains any non-zero year units.

Maybe it will be okay. Hard to say without more practical experience. Depending on what you're building you might not have a user to bubble the error to (e.g. if you're doing some kind of background processing on a server or a daemon service).

BurntSushi commented 1 month ago

Yeah but there should always be an error channel of some kind. Most of Jiff's API fallible, e.g., due to overflow.

FeldrinH commented 1 month ago

Yeah but there should always be an error channel of some kind. Most of Jiff's API fallible, e.g., due to overflow.

Still, it would be good to catch errors early. A span with calendar units is effectively invalid in some contexts (e.g. working with timestamps) and in those cases the ability to enforce at compile time that calendar units aren't present seems like a useful thing that would reduce mental overhead.

BurntSushi commented 1 month ago

I think adding support for std::time::Duration would fix that right? That's what this issue is for. :-)

But I am really looking for more concrete experiences as opposed to design philosophy. Compile time errors are great, but they must be considered in the context of trade offs.

FeldrinH commented 1 month ago

I think the main alternative point in the design space here is for Jiff to define its own absolute duration type that is signed.

I personally favour defining a signed duration type, because of the clunkiness and complexity that an unsigned duration type brings, but I agree that without concrete experience it is hard to properly evaluate the tradeoffs.

dvdsk commented 1 month ago

Adding my own (limited) experience porting my hobby sensor data platform to jiff.

I have a tui app showing a graph of sensor values from history (can be a year can be a minute). That history length is currently a core::time::Duration. The app fetches data from a custom data-store. The API for that looks like this:

fn get_data(&mut self, 
  start: jiff::Timestamp, 
  end: jiff::Timestamp, 
  reading: protocol::Reading, 
  n: usize,
) -> Result<(Vec<jiff::Timestamp>, Vec<f32>), Error>;

Now I would like to use that from a function that gets an core::time::Duration. For that I would ideally call it like this:

api.get_data(
  jiff::Timestamp::now() - length, // length is a core::time::Duration
  jiff::Timestamp::now(),
  reading.clone(),
  300,
)

Alternatively Span::from(length) would work too. I see no way to subtract a core::time::Duration from a Timestamp right now.

So currently I am converting to millis then to Span, not very pretty:

api .get_data(
  jiff::Timestamp::now()
   - jiff::Span::default().milliseconds(length.as_millis() as i64),
  jiff::Timestamp::now(),
  reading.clone(),
  300,
);

Now this could be solved by switching every duration out for a Span. To me there are two disadvantage of doing that:

I hope this was helpful, if there is any way I can assist let me know!

* Off topic, does the cat tax apply to docs?

BurntSushi commented 1 month ago

Yeah I think #21 will help here. And adding a TryFrom<Duration> for Span should be straight-forward and not have any footguns. A TryFrom<Span> for Duration is somewhat more questionable, but justifiable.

Off topic, does the cat tax apply to docs?

cauchy

:-( https://x.com/burntsushi5/status/1708169161459855608

martintmk commented 1 month ago

The second problem with Duration integration is that it's not clear how to, well, actually integrate it. Does this mean Jiff should grow checked_add_duration and since_duration methods? I think so... And I think this is probably a requirement to satisfying the second use case above, even if we use a different duration type. I do personally find the names with a _duration suffix quite clunky, but perhaps it's fine because these will be considered "niche" APIs IMO.

I think having the APIs similar to:

let duration: Duration = timestamp2.since_duration(timestamp1).unwrap("the difference was negative");
let duration: Duration = timestamp2.until_duration(timestamp1).unwrap("the difference was negative");

let duration: (i8, Duration) = timestamp2.since_duration_signed(timestamp1);
let duration: (i8, Duration) = timestamp2.until_duration_signed(timestamp1);

let duration = Duration::from_secs(1);

let timestamp: Timestamp = timestamp.checked_add_duration(duration).unwrap("overflow");
let timestamp: Timestamp = timestamp.saturating_add_duration(duration);

let timestamp: Timestamp = timestamp.checked_sub_duration(duration).unwrap("overflow");
let timestamp: Timestamp = timestamp.saturating_sub_duration(duration);

Should cover majority of use-cases where simpler "Span" might be enough. For services/libraries working with UTC timestamps, above is really what's needed and we don't need to deal with complexities of relative units in the Span.

Difference between two timestamps. Usually, negative difference means that something is expired, and do no-op. For cases where we are interested in negative values, one can just use since_duration_signed or until_duration_signed.

BurntSushi commented 1 month ago

above is really what's needed and we don't need to deal with complexities of relative units in the Span.

Can you share some code that is more complex than you would like when dealing with Timestamp and Span? There shouldn't be any need to deal with relative units there.

My main gripe is that I'm not a huge fan of vomiting a new group of four methods for every single possible duration type supported.

martintmk commented 1 month ago

Can you share some code that is more complex than you would like when dealing with Timestamp and Span? There shouldn't be any need to deal with relative units there.

If we omit the relative spans, I think current API is clean and understandable. It's just that we use std::time::Duration on lot of places and would be nice to use it directly without the overhead and roundtripping through Span.

My main gripe is that I'm not a huge fan of vomiting a new group of four methods for every single possible duration type supported.

Do you foresee any other types for which this would have to be done?

BurntSushi commented 1 month ago

If we omit the relative spans, I think current API is clean and understandable. It's just that we use std::time::Duration on lot of places and would be nice to use it directly without the overhead and roundtripping through Span.

Right, okay, that's what I'm trying to understand. I laid out what I believe are two buckets of user stories: the first is "I already use Duration everywhere and I want to keep using it for ergonomic reasons." The second is, "Span has too much overhead and my performance requirements demand something cheaper." The former specifically calls for integration with std::time::Duration, but the latter actually does not! So it's important to understand where you are here. To be clear, you could be in both camps.

It's not the "relative" aspect of Span that is tripping you up here, since a Timestamp only deals with uniform units on Span. So you should never have to deal with any relative aspects of it. Even if a Span only contains uniform units, it would likely be a problem for you.

Do you foresee any other types for which this would have to be done?

Yes. I mentioned Jiff defining its own signed duration type in the OP. A (i8, std::time::Duration) is pretty unwieldy. We're in a tough spot here because we want integration with std, but std doesn't have a signed duration. So we could make our own signed duration, add one group of additional methods for the signed duration, and then add conversions to and from std::time::Duration on the signed duration type.

BurntSushi commented 1 month ago

OK, I've taken my first crack at integration with std::time::Duration by introducing conversion routines between it and Span. I suspect we still need to do more (like on the datetime arithmetic APIs), but there are thornier questions there. This, by comparison, was pretty straight-forward I think. The new APIs are:

These routines aren't difficult for callers to implement on their own, but they're subtle enough with enough failure scenarios that it's definitely worth it for Jiff to provide them for you. And I designed them in such a way that I believe they are impossible to misuse.

martintmk commented 1 month ago

Right, okay, that's what I'm trying to understand. I laid out what I believe are two buckets of user stories: the first is "I already use Duration everywhere and I want to keep using it for ergonomic reasons." The second is, "Span has too much overhead and my performance requirements demand something cheaper." The former specifically calls for integration with std::time::Duration, but the latter actually does not! So it's important to understand where you are here. To be clear, you could be in both camps.

I think we are in both camps, with the first one being the bigger concern :)

My main gripe is that I'm not a huge fan of vomiting a new group of four methods for every single possible duration type supported.

I looked at the implementation of how Spans are added to Timestamp: https://github.com/BurntSushi/jiff/blob/3c3b810dbd82cc37fb75e895d91ad94d7c136240/src/timestamp.rs#L1271

Basically, we could see it as anything that can produce seconds/nanoseconds pair could be added to Timestamp. Can the signature of checked_add (and saturating_add, saturating_sub, checked_sub) be changed to something akin to:

pub fn checked_add(self, span: impl Into<TimestampSpan>) -> Result<Timestamp, Error>

Where TimestampSpan is something similar to:

pub struct TimestampSpan(i64, i32);

Then both Span and Duration can implement From for TimestampSpan:

pub struct TimestampSpan(i64, i32);

impl From<Duration> for TimestampSpan {
    fn from(duration: Duration) -> Self {
        let seconds = duration.as_secs();
        let nanos = duration.subsec_nanos();
        TimestampSpan(seconds as i64, nanos as i32)
    }
}

Now one can use the same set of methods for adding/subtracting both Span and Duration with Timestamp. Alternative is to just use Into<Span>, however that would remove the possibilities of some optimalizations in the future, since Span might be too heavy for these operations on the Timestamp.

Wdyt?

BurntSushi commented 1 month ago

@martintmk Yes, I've been thinking about that. (Although I would probably use std::time::Duration for it?) The problem is that doesn't work for the since/until methods, which return a Span.

martintmk commented 1 month ago

Although I would probably use std::time::Duration for it?

Wouldn't std::time::Duration be constraining since it does not support negative values?

The problem is that doesn't work for the since/until methods, which return a Span.

I checked the usage and it does support rounding directly when calling since/until. Can this be simplified and anyone who wants to round can just do it on the resulting Span?

BurntSushi commented 1 month ago

I talked about signed durations further up. This is part of the problem of the API design here.

I checked the usage and it does support rounding directly when calling since/until. Can this be simplified and anyone who wants to round can just do it on the resulting Span?

I don't see how that would help here? Otherwise, see: https://docs.rs/jiff/latest/jiff/_documentation/design/index.html#what-are-the-zonedifference-and-zonedround-types-for

BurntSushi commented 1 month ago

Here is what I'm currently leaning towards:

I think this will cover everything. It provides nearly maximal integration with std and also provides a way to do "just add nanoseconds please" without going through the chunkier Span type that should help with perf.

So... What do we call the signed duration type? I suppose the obvious choice is SignedDuration, but it's kind of a mouthful in my opinion. But maybe that's the best name since it's the least likely to lead to confusion. Another name I kind of like is Elapsed, but it doesn't really help folks understand why it exists, its relationship with std::time::Duration or how it's different from Span. Plus, my hope is that most folks will use a Span most of the time.

martintmk commented 1 month ago

Adding a new signed duration type that represents absolute time. Basically, an exact copy of std::time::Duration, but signed instead of unsigned.

Personally, I love this idea and it's exactly what's was missing to me. Span is powerful, but also comes with lot's of gotchas.

Adding support for this new type, and std::time::Duration, to the existing add/subtract APIs on all datetime types. Today, these methods accept a concrete Span. But I think I can make them polymorphic such that they accept any of Span, std::time::Duration or the new signed duration type I mentioned above. This will mean adding a new public target type for each datetime type like ZonedArithmetic. But we already have a bunch of those, so I don't think it's a big deal.

Perfect!

I think this will cover everything. It provides nearly maximal integration with std and also provides a way to do "just add nanoseconds please" without going through the chunkier Span type that should help with perf.

Should also cover a lot in #17

I suppose the obvious choice is SignedDuration, but it's kind of a mouthful in my opinion. But maybe that's the best name since it's the least likely to lead to confusion.

I like SignedDuration, it explains its purpose pretty well. It's just that, a signed duration. The Elapsed name gives me the vibe of it only being positive, on glance to seems the same thing as Duration.

BurntSushi commented 1 month ago

Yeah, I think I am likely to go with SignedDuration. If folks have any better names, please chime in.

I'll probably also change the Timestamp APIs too. So Timestamp::from_duration will accept a SignedDuration and Timestamp::as_duration will return a SignedDuration. And then drop Timestamp::from_signed_duration. And I'll make Span::to_duration return a SignedDuration. I'll leave the TryFrom impls between std::time::Duration and Span, but add corresponding impls for SignedDuration as well.

So we'll add direct integration with std::time::Duration where feasible, but otherwise insist on the caller using conversion routines between std::time::Duration and SignedDuration.

I think everything above should work. But it will a be a breaking change. I might try to get a slightly tweaked form into a semver compatible release with some deprecations.

but also comes with lot's of gotchas.

I hopefully think the only gotchas are perf related. Otherwise, it should be very hard to misuse.