Open BurntSushi opened 3 months ago
Loving this! Nushell does this type of thing and just calls them durations in the repl.
I've also written something kind of like what nushell does as an output for jiff of a date diff command that I'm writing for nushell.
❯ '2019-05-10T09:59:12-07:00[-07:00]' | dt diff (dt now)
P5y3m12dT26m1.5374272s
5yrs 3mths 1wks 5days 26mins 1secs 537ms 427µs 200ns
The first output line is just for debugging.
Personally, I'd like to see jiff have more synonyms for datetime. Not saying this is perfect but this is where I've landed so far. (maybe I should use some upper case letters too to differentiate things like months and minutes)
pub fn get_unit_from_unit_string(unit_name: String) -> Result<Unit, LabeledError> {
    // Accept SQL-style abbreviations, singular and plural names, and a few
    // very short variants for each unit.
    match unit_name.as_str() {
        "year" | "years" | "yyyy" | "yy" | "yr" | "yrs" => Ok(Unit::Year),
        "month" | "months" | "mth" | "mths" | "mm" | "m" | "mon" => Ok(Unit::Month),
        "day" | "days" | "dd" | "d" => Ok(Unit::Day),
        "week" | "weeks" | "ww" | "wk" | "wks" | "iso_week" | "isowk" | "isoww" => Ok(Unit::Week),
        "hour" | "hours" | "hh" | "hr" | "hrs" => Ok(Unit::Hour),
        "minute" | "minutes" | "mi" | "n" | "min" | "mins" => Ok(Unit::Minute),
        "second" | "seconds" | "ss" | "s" | "sec" | "secs" => Ok(Unit::Second),
        "millisecond" | "ms" | "millis" => Ok(Unit::Millisecond),
        "microsecond" | "mcs" | "us" | "micros" => Ok(Unit::Microsecond),
        "nanosecond" | "ns" | "nano" | "nanos" => Ok(Unit::Nanosecond),
        _ => Err(LabeledError::new(
            "please supply a valid unit name to extract from a date/datetime. see dt part --list for list of abbreviations.",
        )),
    }
}
Also note that I'm forcing weeks here, but some may find it odd, as we've discussed prior. For my code here, I'm dictating a standard abbreviation, but when parsing (above) I try to be more forgiving and allow anything that could reasonably be considered as an abbreviation.
fn create_nushelly_duration_string(span: jiff::Span) -> String {
    let mut span_vec = vec![];
    if span.get_years() > 0 {
        span_vec.push(format!("{}yrs", span.get_years()));
    }
    if span.get_months() > 0 {
        span_vec.push(format!("{}mths", span.get_months()));
    }
    // If the span carries no weeks but has more than 6 days, fold whole
    // weeks out of the days. Otherwise use the span's own weeks and days.
    let (weeks, days) = if span.get_weeks() == 0 && span.get_days() > 6 {
        (span.get_days() / 7, span.get_days() % 7)
    } else {
        (span.get_weeks(), span.get_days())
    };
    if weeks > 0 {
        span_vec.push(format!("{}wks", weeks));
    }
    if days > 0 {
        span_vec.push(format!("{}days", days));
    }
    if span.get_hours() > 0 {
        span_vec.push(format!("{}hrs", span.get_hours()));
    }
    if span.get_minutes() > 0 {
        span_vec.push(format!("{}mins", span.get_minutes()));
    }
    if span.get_seconds() > 0 {
        span_vec.push(format!("{}secs", span.get_seconds()));
    }
    if span.get_milliseconds() > 0 {
        span_vec.push(format!("{}ms", span.get_milliseconds()));
    }
    if span.get_microseconds() > 0 {
        span_vec.push(format!("{}µs", span.get_microseconds()));
    }
    if span.get_nanoseconds() > 0 {
        span_vec.push(format!("{}ns", span.get_nanoseconds()));
    }
    span_vec.join(" ")
}
We "borrowed" liberally from chrono-humanize-rs as inspiration and kind of rolled our own, but we also support a date humanize command.
❯ '2019-05-10T09:59:12-07:00' | date humanize
5 years ago
❯ (date now) - 2019-05-10T09:59:12-07:00
275wk 5day 22hr 34min 20sec 879ms 101µs
So, any duration automatically is expressed in a nushell-humanized nomenclature. I'm not satisfied with it, or our other datetime handling, which is why I'm here in this repo. 😄
I think friendly or human, humantime, humanize all express viable naming options.
@fdncred Thanks for the feedback! I'm not sure about some of those unit designators. Something like 5 yyyy reads very weird to me hah. But there are some in there that probably make sense to add, like wks.
Out of curiosity, how does nushell deal with locale? That is honestly my biggest hesitation with something like this. ISO 8601's duration format and Temporal's ISO 8601 datetime format are interchange formats. They are "human readable," but their primary purpose is in the exchange of data in an agreed upon format between different systems. Otherwise, Temporal punts internationalization to another TC39 group. This is also why, AIUI, Temporal doesn't support something like strptime. And AIUI, strftime is insufficient for correct internationalization. But I added strptime and strftime because they are just so incredibly useful. That's also why I want to add this new "friendly" format as well, because in practice, folks just want an easy way of accepting and printing durations that humans can easily interact with. Assuming you're fine assuming English and a Western/Gregorian understanding of date-keeping.
On the flip side, I don't necessarily want to tie one hand behind my back and resist offering useful APIs for English speaking folks using the Gregorian calendar just because they aren't universally applicable. And on the other flip side, I don't want to go down the enormous rabbit hole of internationalization either. It's just not a problem I want to spend the next ~months/years of my life working on. And still yet, I don't want to do a "half baked" solution where you can just configure the strings used for "year" or whatever.
One thing I tend to fall back on here is that ISO 8601 itself encodes the notion of a Gregorian calendar and Western time keeping. So by doubling down on English unit names, we aren't really doing anything more than what ISO 8601 already assumes. But this "friendly" format is stretching it a little bit by allowing spelled out words like "year" instead of just the Y designator.
Something like 5 yyyy reads very weird to me hah
lolz! I mainly like just covering the sql abbreviations, singular and plurals, and then some really abbreviated variations
Out of curiosity, how does nushell deal with locale?
As we've discussed prior, I'm not a fan of how nushell handles datetime. However, I tend to think about it this way. When a user is expressing a date/datetime in the repl, they're usually subconsciously thinking about their own locale/time zone. People don't usually think about time in anyone else's time zone. However, when they do, they can provide the offset. Once they provide an offset, it's much clearer what they're saying. So, we either assume local, or assume their provided offset.
internationalization
Big ugh! Lots of work there. For nushell, rightly or wrongly, we've standardized on English. We have contributors across the world, and no one really complains about it. We do get some strange issues from Chinese folks from time to time, but that's usually due to bytes/chars/grapheme stuff.
I think your ISO 8601 argument is valid and I personally see "friendly" easily falling into that without issue or complaint.
I have two pieces of unsolicited friendly advice to you on your "half baked" comments. (not that your comments are half baked but the comments where you mention half-baked, lol) They have kind of become my mantra with nushell.
As we've discussed prior, I'm not a fan of how nushell handles datetime. However, I tend to think about it this way. When a user is expressing a date/datetime in the repl, they're usually subconsciously thinking about their own locale/time zone. People don't usually think about time in anyone else's time zone. However, when they do, they can provide the offset. Once they provide an offset, it's much clearer what they're saying. So, we either assume local, or assume their provided offset.
Oh, sorry, I meant locale as in internationalization. In Temporal's case, there is nothing in its API that lets you print or parse strings like July or Saturday. That's where Jiff diverges from Temporal, although only a very small corner of Jiff does this. The vast majority of Jiff is "independent" of internationalization problems and this was very much intentional. But the strtime APIs are an example of where we start heading into areas that internationalization might have a role to play. And this "friendly" format doubles down and extends it more.
And oh yeah, I've been uttering "don't let perfection be the enemy of the good" for a long time now. I'm confident I won't fall into that trap. But I also want to be sensitive to bias and implicit power structures. If I'm building a datetime library and that datetime library gets popular, but it specifically encourages English in places and eschews all other languages and calendar systems, then how big of a problem is that? Anyway, I don't mean to say that this must prevent me from moving forward with formats like this, but that it's something that's on my mind that I weigh as a trade-off.
I'd love advice from an internationalization expert to be honest. In particular about the trade-offs involved. Like, Jiff won't be the first library to offer a "human friendly" duration format like this. Others have done it before in different ways. How big of an issue has it been if it doesn't support internationalization? And if we did want to support internationalization, is it sufficient to "just" provide a way to override the unit designators with different strings? (I think it isn't, but I don't really know for sure.) Or is that something that helps a lot and doesn't hurt? Or is it a half-baked thing that actually ends up hurting more than it helps?
Oh, sorry, I meant locale as in internationalization.
Oops. My fault. I have a fixation on datetime atm. With nushell, I think the only thing we do with localization is determine what type of separator to use for thousands and decimals. Everything else is just English, iirc. If crates we use support strftime things with localization we don't interrupt it.
How hard is it? I'm no expert, but just figuring out the locale and changing the minimal things we did for decimals and thousands separators was a pain. Edge cases just drive me crazy sometimes. You think you have it right, but some funky thing breaks your software. Ugh!
How hard is it?
That's part of the problem. My understanding is that full and correct support (in accordance with relevant Unicode specs) is an enormous undertaking. And so it's best left to crates like icu to do it. But there are in theory things I could do that aren't as complicated, like allow the caller to substitute in their own unit designator strings. But I don't know if that's helpful or actively harmful.
I was just going to ask about humantime support but I see you are already on it :) Thanks.
PS: love the round(Unit) BTW.
Andrew, you may wish to take inspiration from hifitime here because our parser of durations is no-std (using the lexical crate), and supports rounding based on the unit: https://github.com/nyx-space/hifitime/blob/master/src/duration/parse.rs#L15 .
Note that I'm currently working to support Durations down to the zeptosecond (1e-21) for hifitime, so this functionality is bound to be expanded upon.
@ChristopherRabotin That appears to be just about a subset of the grammar I wrote above. I see only a few main differences making it not a proper subset:
- :, your format interprets an integer, I think, as a number of seconds. I intentionally decided to not support that in the grammar above due to the ambiguity it can cause. The [-+]H:M:S(.[0-9]+)? is supported in the grammar above though.
- 1.5 hours 30.5 minutes isn't allowed by the grammar above.
Otherwise, my grammar above also supports units of years, months and weeks.
The closest analog to hifitime's Duration type is probably Jiff's SignedDuration, and in that context, Jiff specifically forbids the concept of "days" because days are non-uniform units in the context of time zones. To express "days," Jiff requires you to use a Span, which is more like a bag of units instead of one single absolute duration.
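To make the "bag of units" versus "single absolute duration" distinction concrete, here is a minimal sketch. UnitBag and to_absolute_seconds are hypothetical names made up for illustration, not Jiff's or hifitime's API, and the caller supplies the length of each civil day by hand (for example 23 or 25 hours across a DST transition) instead of consulting a time zone database.

```rust
/// Hypothetical "bag of units": each field is kept exactly as written.
/// This is an illustration only, not Jiff's `Span`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UnitBag {
    days: i64,
    hours: i64,
}

/// Resolve a bag to absolute seconds. The length in hours of every civil
/// day covered by `bag.days` must be supplied by the caller, because a day
/// is not a fixed number of hours once time zones are involved.
fn to_absolute_seconds(bag: UnitBag, day_lengths_in_hours: &[i64]) -> Option<i64> {
    if bag.days < 0 || day_lengths_in_hours.len() as i64 != bag.days {
        return None;
    }
    let day_hours: i64 = day_lengths_in_hours.iter().sum();
    Some((day_hours + bag.hours) * 3_600)
}
```

With this sketch, the bag "1 day 2 hours" resolves to a different number of seconds depending on whether the day in question is 23, 24 or 25 hours long, and "1 day" and "24 hours" stay distinct values even when they happen to cover the same amount of time.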
Ref - https://github.com/piperRyan/rust-postgres-interval/
I wonder whether other parse/format options are in scope for this library, so that we can parse string literals in other formats, such as PostgreSQL's, and print in those formats.
Well from that README, that format at least is supported by the grammar I posted above.
If one needs a parser specific to something like PostgreSQL, I feel like that's niche enough that you should write your own or publish a crate dedicated to it.
humantime has format_duration also.
The SignedDuration default print format is ISO 8601, which is a bit hard to read for most users.
I can convert a SignedDuration to a std Duration and then call humantime's format_duration. But since this issue discusses supporting a new duration format inspired by humantime, I wonder whether we will add both parsing and formatting, or only parsing. And if both, is there an existing method that invokes the related logic now?
UPDATE - I noticed that the debug format of SignedDuration seems "human readable", while the Display format isn't (looks like reverse?).
use std::str::FromStr; // brings SignedDuration::from_str into scope
println!("{}", jiff::SignedDuration::from_str("PT1m").unwrap()); // PT1m
println!("{:?}", jiff::SignedDuration::from_str("PT1m").unwrap()); // 60s
Well, I understand that the Debug output is only meant to be read, while the Display impl may follow the invariant parse(format(v)) == v, and we don't support parsing things like 60s now.
@tisonkun You marked your comment as resolved, but to elaborate...
Yes, Display and FromStr are indeed intended to be duals of one another for both SignedDuration and Span. My intent is that those will continue to use ISO 8601 durations (technically, Temporal ISO 8601 extension) for the purposes of interoperability. While I agree it is not easy to read, it is not just a de facto standard but the standard interchange format for durations. That elevates it to a point where it really ought to be the default I think.
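The Display/FromStr duality described here can be sketched with a toy type. Seconds below is a hypothetical stand-in, not Jiff's SignedDuration, and its PT<n>S notation is deliberately reduced to whole seconds. The point is only that Display and FromStr round-trip, while Debug is free to print something friendlier that the parser does not accept.

```rust
use std::fmt;
use std::str::FromStr;

/// Toy whole-second duration, used only to illustrate the invariant.
/// It is not Jiff's `SignedDuration`.
#[derive(Clone, Copy, PartialEq, Eq)]
struct Seconds(u64);

impl fmt::Display for Seconds {
    // Interchange form: always `PT<n>S`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "PT{}S", self.0)
    }
}

impl fmt::Debug for Seconds {
    // Human-oriented form: `<n>s`. Not accepted by `FromStr` below.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}s", self.0)
    }
}

impl FromStr for Seconds {
    type Err = String;

    // The dual of `Display`: accepts exactly what `Display` emits.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let digits = s
            .strip_prefix("PT")
            .and_then(|rest| rest.strip_suffix('S'))
            .ok_or_else(|| format!("expected PT<n>S, got {:?}", s))?;
        digits.parse::<u64>().map(Seconds).map_err(|e| e.to_string())
    }
}
```

In this sketch, Seconds(60).to_string() yields PT60S and parsing that string gives back Seconds(60), whereas the Debug output 60s is rejected by the parser, which mirrors the behavior observed above.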
The "friendly" format outlined here is just something I made up. Well, I mean, I didn't invent the idea from whole cloth. But I did my best to formalize a grammar that should parse most English human readable durations, including those emitted by humantime. And yes, it is intended for jiff::fmt::friendly to have both a printer and a parser. In fact, I'm just about done implementing the printer.
My loose plan at this point is:
- A jiff::fmt::friendly module that largely follows the structure of other jiff::fmt modules. That is, a bit more verbose to use, but it's the lower level API that provides the most configuration knobs and flexibility.
- The Debug impls for SignedDuration and Span use jiff::fmt::friendly. Indeed, making Span's Debug impl not use ISO 8601 is an explicit goal in order to resolve #60. The status quo is that multiple distinct Span values in memory can have the same Debug output, and this is really bad for programmer ergonomics when debugging.
- The FromStr and Display impls for SignedDuration and Span remain as they are today, for the reasons stated above.
- New "friendly" APIs on SignedDuration and Span. For example, Span::parse_friendly and Span::to_friendly, or something like that.
Another thing I'm thinking about is:
- Have FromStr on SignedDuration and Span accept either an ISO 8601 duration or a "friendly" duration. I think this is possible because ISO 8601 durations have to start with a P. A "friendly" duration has to start with either a sign or a digit. So there should be no ambiguity.
- Emit the "friendly" format from the alternate Display impl, e.g., {:#}.
If I did those two things, then we wouldn't need any new "friendly" APIs on Span and SignedDuration. But I'm not fully sold on that approach yet. It does feel a bit subtle.
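The "no ambiguity" argument for a combined FromStr can be sketched as a dispatch on the first significant character. DurationSyntax and detect_syntax are hypothetical names, not part of Jiff. One assumption beyond the text above: the sketch skips a single leading sign before looking for the P, so that a signed ISO 8601-style input is still routed to the ISO parser.

```rust
/// Which parser an input should be handed to. Hypothetical type for
/// illustration; Jiff does not expose this.
#[derive(Debug, PartialEq, Eq)]
enum DurationSyntax {
    Iso8601,
    Friendly,
}

/// Decide between the two syntaxes without parsing the whole input.
/// Assumption: one optional leading `+` or `-` is skipped first.
fn detect_syntax(input: &str) -> Option<DurationSyntax> {
    let unsigned = input
        .strip_prefix(|c: char| c == '+' || c == '-')
        .unwrap_or(input);
    match unsigned.chars().next()? {
        // ISO 8601 durations begin with the `P` designator.
        'P' | 'p' => Some(DurationSyntax::Iso8601),
        // A "friendly" duration begins with a digit (after any sign).
        c if c.is_ascii_digit() => Some(DurationSyntax::Friendly),
        // Anything else belongs to neither format.
        _ => None,
    }
}
```

Because the decision only needs the first one or two characters, a real FromStr impl could make it up front and then run exactly one of the two parsers, so error messages would still come from the parser that matches what the user meant.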
Looking forward to this new feature and to being able to remove our conversion functions and humantime from our dependencies. Will this format be added to the serde module too?
Yeah servicing the use cases that humantime does (but with more correctness) is definitely the intent.
I haven't quite figured out how I'm going to do serde integration yet. But yeah, I do expect we should do something there. The Deserialize case will be easy because there isn't a ton of configuration there. But the Serialize case is harder because there's tons of different ways to express a duration given the above grammar. I suppose we should at least provide a way to use the default format (whatever that ends up being, I guess probably something that humantime can parse).
This issue is partially motivated by #60, but the bigger picture here is that the ISO 8601 duration kinda sucks. We should obviously still support it, and it will continue to be the "default" used when serializing a Span/SignedDuration, but I think we can do better than that. The wide use of crates like humantime demonstrates this.
I hunted around for a specification or seeming standard that Jiff could use, but one just doesn't exist. However, there's definitely a "common sense" sort of format that has organically developed in at least the Go and Rust ecosystems (and probably others). For example, the following are all valid time.Duration values from Go's standard library time package:
300ms
-1.5h
2h45m
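As a rough illustration of what accepting inputs of this shape involves, here is a minimal Rust sketch that parses the three examples into signed nanoseconds. parse_go_style is a hypothetical name. It is not Go's, humantime's or Jiff's implementation, it only knows the h, m, s, ms, us/µs and ns designators, and it rejects a bare number with no unit.

```rust
/// Parse a Go-style duration such as "300ms", "-1.5h" or "2h45m" into
/// signed nanoseconds. Hypothetical helper for illustration only.
fn parse_go_style(input: &str) -> Result<i128, String> {
    // One optional leading sign applies to the whole duration.
    let (sign, mut rest) = match input.strip_prefix('-') {
        Some(r) => (-1i128, r),
        None => (1i128, input.strip_prefix('+').unwrap_or(input)),
    };
    if rest.is_empty() {
        return Err("empty duration".to_string());
    }
    let overflow = || "duration overflows i128 nanoseconds".to_string();
    let mut total: i128 = 0;
    while !rest.is_empty() {
        // Integer part, then an optional fractional part.
        let int_len = rest.bytes().take_while(u8::is_ascii_digit).count();
        let (int_str, after_int) = rest.split_at(int_len);
        let (frac_str, after_num) = match after_int.strip_prefix('.') {
            Some(r) => r.split_at(r.bytes().take_while(u8::is_ascii_digit).count()),
            None => ("", after_int),
        };
        if int_str.is_empty() && frac_str.is_empty() {
            return Err(format!("expected a number at {:?}", rest));
        }
        // The unit designator runs up to the next digit or '.'.
        let unit_len = after_num
            .find(|c: char| c.is_ascii_digit() || c == '.')
            .unwrap_or(after_num.len());
        let (unit, remaining) = after_num.split_at(unit_len);
        let nanos_per_unit: i128 = match unit {
            "ns" => 1,
            "us" | "µs" => 1_000,
            "ms" => 1_000_000,
            "s" => 1_000_000_000,
            "m" => 60 * 1_000_000_000,
            "h" => 3_600 * 1_000_000_000,
            _ => return Err(format!("unknown unit {:?}", unit)),
        };
        let int: i128 = if int_str.is_empty() {
            0
        } else {
            int_str.parse().map_err(|_| overflow())?
        };
        // Keep at most 18 fractional digits so the arithmetic stays exact.
        let frac_digits = &frac_str[..frac_str.len().min(18)];
        let frac: i128 = if frac_digits.is_empty() {
            0
        } else {
            frac_digits.parse().map_err(|_| overflow())?
        };
        let scale = 10i128.pow(frac_digits.len() as u32);
        let component = int
            .checked_mul(nanos_per_unit)
            .and_then(|n| n.checked_add(frac * nanos_per_unit / scale))
            .ok_or_else(overflow)?;
        total = total.checked_add(component).ok_or_else(overflow)?;
        rest = remaining;
    }
    Ok(sign * total)
}
```

The sketch uses integer arithmetic for the fractional part rather than floating point, so 1.5h comes out as exactly 5,400,000,000,000 nanoseconds. Returning a signed integer is what lets it accept -1.5h, which an unsigned target type cannot represent.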
humantime almost supports these, but since it parses into a std::time::Duration (which is unsigned), negative durations aren't supported. And it doesn't support fractional units:
But humantime is a bit more flexible than what Go supports. For example, this program runs fine:
What are the advantages of the above kinds of durations? I think there are two:
1 second 1000ms should be perfectly valid. But you can't express that in ISO 8601 durations.
So here's my first stab at a grammar for this new duration format:
I also need help picking a name for this format. "human" is not a bad name, but the above is specifically English and doesn't take localization into account, so calling it something as broadly applicable as "human" seems a bit narrow minded. Right now, my pick is "friendly." The "friendly" duration format. I don't love it though.