BurntSushi / jiff

A date-time library for Rust that encourages you to jump into the pit of success.
The Unlicense
1.73k stars 31 forks source link

`intz` name is confusing #28

Open vultix opened 3 months ago

vultix commented 3 months ago

First off, thank you for making this! I'm hopeful this can lead to some consolidation of the datetime libraries in rust.

Reading through the documentation, I was consistently reading .intz as .int_z. I assumed this naming referred to some standard I was unfamiliar with. It took a bit for it to click that the correct reading is .in_tz

I like how terse the current name is, but feel it worthwhile to potentially introduce an extra underscore to make the meaning a bit more clear.

vultix commented 3 months ago

Some options I can thing of:

All of these names seem quite ambiguous with the to_zoned function, requiring reading documentation to understand the difference. I think my preferred api would be to consolidate both functions into a single method:

pub fn zoned(self, tz: impl IntoTimeZone) -> Result<Zoned, Error>
BurntSushi commented 3 months ago

I would love .in(...) personally, but in is a keyword, so that's a no-go.

I don't mind tz(...). That is my favorite among the ideas you've suggested.

I did indeed consider collapsing intz() and to_zoned() down into one polymorphic routine. I believe the problem I ran into there is that Timestamp::to_zoned is infallible, and if it were polymorphic and could accept a string, then it would need to be fallible. Which I didn't want to do. I felt like instead, establishing an API pattern that intz takes a string and to_zoned takes a TimeZone would be a better way to go. But I'm not 100% set on that. Maybe there is another way to make a single method work.

rben01 commented 2 months ago

I am not a fan of tz because it sounds to me like it should function similarly to Zoned::time_zone, i.e., it should be a getter. I do prefer in_tz to intz for the same reason as OP (my first thought was a shorthand for “internationalize”?).

However given the fact that method names are shared throughout the crate, I think the following inconsistencies arise:

I propose the following (Result being short for Result<Zoned, Error>):

  1. Timestamp::with_tz(&str) -> Result, because we are attaching a time zone to something without one, without changing the moment in time it corresponds to
  2. Date[Time]::in_tz(&str) -> Result, because a DateTime only corresponds to an instant in time when it's in a tz. Grammatically, this is analogous to Date::at, as “in” and “at” both refer to an object’s (physical or temporal) location
  3. Zoned::to_tz, because we are changing the tz of something that already has one
  4. Timestamp::try_with_zone(TimeZone) -> Result
  5. Date[Time]::in_zone(TimeZone) -> Zoned
  6. Remove TimeZone::to_zoned unless it can be generic over Date and DateTime.

On the one hand this is a lot of function names to remember, but on the other hand they “make sense” and don't suggest similarities that aren't really there. Also I am not sure about how easy it is to remember that it's _tz(&str) but _zone(TimeZone) or how that could be reinforced (or changed to a better convention).

Alternatively, regarding this point:

if it were polymorphic and could accept a string, then it would need to be fallible

perhaps std::convert::Infallible or ! could be used as the error type of the Result so that at least the infallibility was part of the type and, eventually, could be safely unwrapped with Result::into_ok.

BurntSushi commented 2 months ago

I don't think there's much value in using these names to capture very subtle differences between the operations. If there were multiple distinct "with time zone" operations for each type, then that might make sense, but I feel pretty strongly that "attach a time zone to this date/datetime/timestamp" should all have the same name. I used intz for a time zone name and to_zoned for a TimeZone. The Zoned type does have slightly different names because it is itself Zoned.

Timestamp::try_with_zone(TimeZone) -> Result

Attaching a jiff::tz::TimeZone to a jiff::Timestamp is always infallible.

Remove TimeZone::to_zoned unless it can be generic over Date and DateTime.

I suppose it could accept a Into<DateTime>, but I don't see this as a particularly important point IMO. I'm not sure why it would be removed if it couldn't be generic though?

perhaps std::convert::Infallible or ! could be used as the error type of the Result so that at least the infallibility was part of the type and, eventually, could be safely unwrapped with Result::into_ok.

I can see how that's sometimes necessary, but I find that very yucky personally.

nicoburns commented 2 months ago

One thing I have struggled with in other datetime libraries is differentiating between:

  1. A function that takes an unzoned datetime and interprets as being in the specified zone.
  2. A function that takes a zoned datetime and converts it to being in another time zone (applying an offset)

It looks like jiff also makes this tricky to differentiate by having a method named intz do (1) when called on a civil::DateTime and (2) when called on a Zoned or Timestamp. This is somewhat ameliorated by Timestamp and civil::DateTime being separate types, but I would still prefer different methods names if possible.

In the other library where I have had this issue (moment.js) I resorted to evaluating code in a repl to ensure that I had the correct behaviour. I think in this case I would able to determine correct behaviour from the types, but that would still make reading code where the types are not present (common in Rust!) difficult. I can particularly imagine this being an issue in code review where IDE hints are often not easily available.

I wonder if a better API for this might actually be a constructor on Zoned like Zoned::new but taking a civil::DateTime rather than a Timestamp.


Aside: I believe there is a typo in the [DateTime::to_zoned](https://docs.rs/jiff/latest/jiff/civil/struct.DateTime.html#method.to_zoned:

In the common case of a time zone being represented as a name string, like Australia/Tasmania, consider using DateTime::to_zoned instead.

^ I think this should point to DateTime::intz instead.

BurntSushi commented 2 months ago

@nicoburns Do you think different names can really capture the full subtlety of the differences in those operations? The names suggested above all just seem like different variations of the same general concept to me: to_tz, in_zone, with_tz and so on give me a sense of arbitrariness with the name selection. I don't see them helping folks to understand the nuanced difference from the name itself. I can see how they might be post hoc rationalized in a way, but I think it's a stretch.

The intent really is that it's the types that help you figure out what to do. Maybe some prose docs would help. In particular, the way I conceptualize time zones is that they provide functions to go back-and-forth between exact time and inexact time in a particular geographic region. (Where "inexact time" is synonymous with "local," "civil," "wall," "clock," "naive," "plain" and so on.) civil::DateTime -> Timestamp and civil::DateTime -> Zoned are going from inexact to exact, while Zoned -> civil::DateTime is going from exact to inexact. And crucially, the the former might have ambiguity (so it's really civil::DateTime -> zero or more of Timestamp) while the latter is never unambiguous.

The other operation you mention, which is changing the time zone of exact time, is better conceptualized as, "changing the function that takes exact time to inexact time from one geographic region to another."

In all of these operations, the crucial parts are 1) what the operands are, 2) the input operand and 3) the output operand. Once you have those puzzle pieces in play, that tells you the rest. So I guess in theory, we could name "exact -> inexact" differently from "inexact -> exact." But I worry that will be an incredibly subtle difference that requires already knowing the right conceptual model first to understand why the names are different. This is why I'm more in favor of using the same name everywhere: the name reflects the fact that a time zone is being used to implement one of the following transformations: inexact ->{tz} exact, exact ->{tz} inexact or exact{tz1} -> exact{tz2}.

To be clear, I'll list the methods for each of the above transformations on the main datetime types:

I think in this case I would able to determine correct behaviour from the types, but that would still make reading code where the types are not present (common in Rust!) difficult.

If you're reading code and you see abc.intz("America/New_York")?, then abc could be one of four different types: Zoned, Timestamp, civil::DateTime or civil::Date. If each of those types had a different name for intz and you knew off the top of your head the mapping between those names to their corresponding types, you would be able to determine the type and thus what kind of transformation was taking place. I am kinda skeptical that this would be easy for someone to do with passing familiarity with Jiff, but putting that aside, what would you do with this information at this point in code review? Is there a mistake you'd look for? One thing I can think of is that perhaps you care about the disambiguation strategy, and that only applies for inexact ->{tz} exact transformations. But this is pretty rare I think. I'm struggling to think of other reasons why you would want to know which operation was being performed without being able to know the types.


For awareness on my current thinking, I am very on board with changing intz to something else. in_tz seems to be popular, although I still don't like it personally.

vultix commented 2 months ago

Do you think different names can really capture the full subtlety of the differences in those operations?

I definitely feel this point. There have been many libraries I've used where the names for these operations are ambiguous at best.

The intent really is that it's the types that help you figure out what to do.

I like this in principle, but rust type inference can make it difficult to know what the types are when reading code, which can be especially pernicious in code review. Ideally we could have our cake and eat it too. Short, descriptive, unambiguous, intuitive names for each operation, with type information cementing the intention of the names.

In the interest of exploring the design space, I'll enumerate some options for each of the four operations I care about. I'll include extremely verbose names and extremely terse names.

1. Dropping the timezone of a Zoned, to get a civil DateTime

Example: 2023-05-13 11:30 UTC -> 2023-05-13 11:30

Possible names:

2. Specifying a timezone for a civil DateTime

Example: 2023-05-13 11:30 -> 2023-05-13 11:30 UTC

I also feel strongly that the same name be used for attaching a timezone to a date/datetime/timestamp.

Possible names:

3. Converting a Zoned to another timezone, keeping same timestamp

Example: 2023-05-13 11:30 UTC -> 2023-05-13 06:30 EST

Possible names:

4. Replacing the timezone of a Zoned, keeping same local time

Example: 2023-05-13 11:30 UTC -> 2023-05-13 11:30 EST

Possible names:


I think these are my favorites:

  1. Zoned::datetime or Zoned::to_civil These are both simple and intuitive
  2. DateTime::tz & Date::tz & Timestamp::tz This is very common, and associating a timezone isn't especially ambiguous, so a short name is ideal. This plays especially well with the date().time().tz() builder pattern.
  3. Zoned::with_tz_same_timestamp I don't think this operation is as common, but when you need it it's very important you choose the correct operation.
    A more verbose but descriptive name seems appropriate here.
  4. Zoned::with_tz_same_local Similar to 3, this is a less common operation, and it's important to get right when you do need it.

Sorry for the huge writeup! I don't think there's any one right answer here, but this feels like the best set of tradeoffs to me. Thoughts?

BurntSushi commented 2 months ago

Thank you for that write-up! I am somewhat a fan of DateTime::tz. It in theory could be confused for a getter, although a getter for a time zone on those types doesn't make sense. So it really can't be. But still, I wonder if it could lead to confusion. But I do like it. (I really wish we could use in here. But it's a keyword. Another possibility is of, but I'm not sure that works well.)

As for Zoned::with_tz_same_local, the first thing there is it would need to be civil, not local. (Or if a different name is chosen in #12, then that name.) The second thing is that I very intentionally left this operation off of Zoned. It mimics Temporal's API, and I think needing to go through civil time explicitly and then back to Zoned makes it especially clear, on its own, that you're asking for a different time zone while trying to keep the local time the same. I expect that's why Temporal's API is designed that way as well.

As for Zoned::with_tz_same_timestamp, I'm not a huge fan personally. I think that if we have Zoned::with_tz_same_local, then yes, we need a longer name for Zoned::intz as a useful parallel. A Zoned is really just an instant (in a specific geographic region). It is only also a civil datetime because literally every instant in any time zone has a corresponding and unambiguous civil datetime. So maybe we need to emphasize this more in the docs.

Also keep in mind that there is both a Zoned::intz (the convenient one that takes an IANA string ID) and a Zoned::with_time_zone (that takes a jiff::tz::TimeZone).

vultix commented 2 months ago

I'd also considered that DateTime::tz could be thought of as a getter, but as you said, that really doesn't make any sense for any of those types, so I don't think it'd be a problem in practice.

If we decide not to have both Zoned::with_tz_same_civil and Zoned::with_tz_same_timestamp, and instead only have the latter option, I think I'd still want a more verbose name. I find it highly likely that a developer will accidentally reach for Zoned::intz intending to perform the former operation. There's nothing in the type system itself protecting us here, and this can be easily missed in a code review.

thumbert commented 2 months ago

If you are going to move away from intz, I favor with_tz. I don't really buy the argument that people read it as int_z when the both the previous tokens and the following token in the line indicates clearly that we're dealing with date times.

BurntSushi commented 2 months ago

I don't really buy the argument that people read it as int_z when the both the previous tokens and the following token in the line indicates clearly that we're dealing with date times.

I think it's more just an eye-reaction response, since folks are very very used to int being a thing, and maybe less used to tz being a thing. So the brain has trouble parsing it in the way it's intended. My guess is that if someone thinks about it, they'll get it, though.