Open BurntSushi opened 3 weeks ago
Loving this! Nushell does this type of thing and just calls them durations in the repl.
I've also written something kind of like what nushell does as an output for jiff of a date diff command that I'm writing for nushell.
❯ '2019-05-10T09:59:12-07:00[-07:00]' | dt diff (dt now)
P5y3m12dT26m1.5374272s
5yrs 3mths 1wks 5days 26mins 1secs 537ms 427µs 200ns
The first output line is just for debugging.
Personally, I'd like to see jiff have more synonyms for datetime. Not saying this is perfect but this is where I've landed so far. (maybe I should use some upper case letters too to differentiate things like months and minutes)
pub fn get_unit_from_unit_string(unit_name: String) -> Result<Unit, LabeledError> {
    let unit = match unit_name.as_ref() {
        "year" | "years" | "yyyy" | "yy" | "yr" | "yrs" => Ok(Unit::Year),
        "month" | "months" | "mth" | "mths" | "mm" | "m" | "mon" => Ok(Unit::Month),
        "day" | "days" | "dd" | "d" => Ok(Unit::Day),
        "week" | "weeks" | "ww" | "wk" | "wks" | "iso_week" | "isowk" | "isoww" => Ok(Unit::Week),
        "hour" | "hours" | "hh" | "hr" | "hrs" => Ok(Unit::Hour),
        "minute" | "minutes" | "mi" | "n" | "min" | "mins" => Ok(Unit::Minute),
        "second" | "seconds" | "ss" | "s" | "sec" | "secs" => Ok(Unit::Second),
        "millisecond" | "ms" | "millis" => Ok(Unit::Millisecond),
        "microsecond" | "mcs" | "us" | "micros" => Ok(Unit::Microsecond),
        "nanosecond" | "ns" | "nano" | "nanos" => Ok(Unit::Nanosecond),
        _ => {
            return Err(LabeledError::new(
                "please supply a valid unit name to extract from a date/datetime. see dt part --list for list of abbreviations.",
            ))
        }
    };
    unit
}
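The idea of using upper case letters to tell months and minutes apart could look roughly like the sketch below. It is only an illustration: `SketchUnit` and `parse_designator` are made-up names, not part of jiff or nushell, and the choice of `M` for month and `m` for minute is an assumption.

```rust
/// Hypothetical unit type for this sketch; it is not jiff's `Unit`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchUnit {
    Month,
    Minute,
}

/// Resolve a unit designator.
///
/// Single-letter designators are case-sensitive (`M` = month, `m` = minute),
/// while longer names and abbreviations are matched case-insensitively.
fn parse_designator(s: &str) -> Option<SketchUnit> {
    // Check the case-sensitive single letters first.
    match s {
        "M" => return Some(SketchUnit::Month),
        "m" => return Some(SketchUnit::Minute),
        _ => {}
    }
    // Everything else is unambiguous, so case can be ignored.
    match s.to_ascii_lowercase().as_str() {
        "month" | "months" | "mth" | "mths" | "mon" => Some(SketchUnit::Month),
        "minute" | "minutes" | "min" | "mins" => Some(SketchUnit::Minute),
        _ => None,
    }
}
```

With this split, an ambiguous spelling such as `mm` is rejected rather than silently mapped to one of the two units.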
Also note that I'm forcing weeks here, but some may find it odd, as we've discussed prior. For my code here, I'm dictating a standard abbreviation, but when parsing (above) I try to be more forgiving and allow anything that could reasonably be considered as an abbreviation.
fn create_nushelly_duration_string(span: jiff::Span) -> String {
    let mut span_vec = vec![];
    if span.get_years() > 0 {
        span_vec.push(format!("{}yrs", span.get_years()));
    }
    if span.get_months() > 0 {
        span_vec.push(format!("{}mths", span.get_months()));
    }
    // if we have more than 6 days, show weeks
    let days_span = span.get_days();
    if days_span > 6 {
        let weeks = span.get_weeks();
        if weeks == 0 {
            let (weeks, days) = (days_span / 7, days_span % 7);
            span_vec.push(format!("{}wks", weeks));
            if days > 0 {
                span_vec.push(format!("{}days", days));
            }
        } else if span.get_days() > 0 {
            span_vec.push(format!("{}days", span.get_days()));
        }
    } else if span.get_days() > 0 {
        span_vec.push(format!("{}days", span.get_days()));
    }
    if span.get_hours() > 0 {
        span_vec.push(format!("{}hrs", span.get_hours()));
    }
    if span.get_minutes() > 0 {
        span_vec.push(format!("{}mins", span.get_minutes()));
    }
    if span.get_seconds() > 0 {
        span_vec.push(format!("{}secs", span.get_seconds()));
    }
    if span.get_milliseconds() > 0 {
        span_vec.push(format!("{}ms", span.get_milliseconds()));
    }
    if span.get_microseconds() > 0 {
        span_vec.push(format!("{}µs", span.get_microseconds()));
    }
    if span.get_nanoseconds() > 0 {
        span_vec.push(format!("{}ns", span.get_nanoseconds()));
    }
    span_vec.join(" ").trim().to_string()
}
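One thing to watch in the function above: as written, a span whose weeks field is already non-zero appears never to get its weeks printed, since `wks` is only pushed in the `weeks == 0` branch. A minimal sketch of folding existing weeks and overflow days together is below. It uses plain integers instead of `jiff::Span`, assumes non-negative inputs, and the function names are hypothetical.

```rust
/// Fold whole weeks out of `days` and add them to any weeks already present.
/// Assumes both inputs are non-negative.
fn weeks_and_days(weeks: i64, days: i64) -> (i64, i64) {
    (weeks + days / 7, days % 7)
}

/// Render only the week/day portion, using the same suffixes as above.
fn format_weeks_days(weeks: i64, days: i64) -> String {
    let (w, d) = weeks_and_days(weeks, days);
    let mut parts = Vec::new();
    if w > 0 {
        parts.push(format!("{}wks", w));
    }
    if d > 0 {
        parts.push(format!("{}days", d));
    }
    parts.join(" ")
}
```

For the 12 days in the debugging output earlier, this gives `1wks 5days`, and a span that already carries 2 weeks and 3 days renders as `2wks 3days` instead of losing the weeks.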
We "borrowed" liberally from chrono-humanize-rs as inspiration and kind of rolled our own, but we also support a date humanize command.
❯ '2019-05-10T09:59:12-07:00' | date humanize
5 years ago
❯ (date now) - 2019-05-10T09:59:12-07:00
275wk 5day 22hr 34min 20sec 879ms 101µs
So, any duration automatically is expressed in a nushell-humanized nomenclature. I'm not satisfied with it, or our other datetime handling, which is why I'm here in this repo. 😄
I think friendly or human, humantime, humanize all express viable naming options.
@fdncred Thanks for the feedback! I'm not sure about some of those unit designators. Something like 5 yyyy reads very weird to me hah. But there are some in there that probably make sense to add, like wks.
Out of curiosity, how does nushell deal with locale? That is honestly my biggest hesitation with something like this. ISO 8601's duration format and Temporal's ISO 8601 datetime format are interchange formats. They are "human readable," but their primary purpose is in the exchange of data in an agreed upon format between different systems. Otherwise, Temporal punts internationalization to another TC39 group. This is also why, AIUI, Temporal doesn't support something like strptime. And AIUI, strftime is insufficient for correct internationalization. But I added strptime and strftime because they are just so incredibly useful. That's also why I want to add this new "friendly" format as well, because in practice, folks just want an easy way of accepting and printing durations that humans can easily interact with. Assuming you're fine assuming English and a Western/Gregorian understanding of date-keeping.
On the flip side, I don't necessarily want to tie one hand behind my back and resist offering useful APIs for English speaking folks using the Gregorian calendar just because they aren't universally applicable. And on the other flip side, I don't want to go down the enormous rabbit hole of internationalization either. It's just not a problem I want to spend the next ~months/years of my life working on. And still yet, I don't want to do a "half baked" solution where you can just configure the strings used for "year" or whatever.
One thing I tend to fall back on here is that ISO 8601 itself encodes the notion of a Gregorian calendar and Western time keeping. So by doubling down on English unit names, we aren't really doing anything more than what ISO 8601 already assumes. But this "friendly" format is stretching it a little bit by allowing spelled out words like "year" instead of just the Y designator.
Something like 5 yyyy reads very weird to me hah
lolz! I mainly like just covering the sql abbreviations, singular and plurals, and then some really abbreviated variations
Out of curiosity, how does nushell deal with locale?
As we've discussed prior, I'm not a fan of how nushell handles datetime. However, I tend to think about it this way. When a user is expressing a date/datetime in the repl, they're usually subconsciously thinking about their own locale/time zone. People don't usually think about time in anyone else's time zone. However, when they do, they can provide the offset. Once they provide an offset, it's much clearer what they're saying. So, we either assume local, or assume their provided offset.
internationalization
Big ugh! Lots of work there. For nushell, rightly or wrongly, we've standardized on English. We have contributors across the world, and no one really complains about it. We do have some strange issues sometimes from Chinese folks, from time to time, but that's usually due to bytes/chars/grapheme stuff.
I think your ISO 8601 argument is valid and I personally see "friendly" easily falling into that without issue or complaint.
I have two pieces of unsolicited friendly advice to you on your "half baked" comments. (not that your comments are half baked but the comments where you mention half-baked, lol) They have kind of become my mantra with nushell.
As we've discussed prior, I'm not a fan of how nushell handles datetime. However, I tend to think about it this way. When a user is expressing a date/datetime in the repl, they're usually subconsciously thinking about their own locale/time zone. People don't usually think about time in anyone else's time zone. However, when they do, they can provide the offset. Once they provide an offset, it's much clearer what they're saying. So, we either assume local, or assume their provided offset.
Oh, sorry, I meant locale as in internationalization. In Temporal's case, there is nothing in its API that lets you print or parse strings like July or Saturday. That's where Jiff diverges from Temporal, although only a very small corner of Jiff does this. The vast majority of Jiff is "independent" of internationalization problems and this was very much intentional. But the strtime APIs are an example of where we start heading into areas that internationalization might have a role to play. And this "friendly" format doubles down and extends it more.
And oh yeah, I've been uttering "don't let perfection be the enemy of the good" for a long time now. I'm confident I won't fall into that trap. But I also want to be sensitive to bias and implicit power structures. If I'm building a datetime library and that datetime library gets popular, but it specifically encourages English in places and eschews all other languages and calendar systems, then how big of a problem is that? Anyway, I don't mean to say that this must prevent me from moving forward with formats like this, but that it's something that's on my mind that I weigh as a trade-off.
I'd love advice from an internationalization expert to be honest. In particular about the trade-offs involved. Like, Jiff won't be the first library to offer a "human friendly" duration format like this. Others have done it before in different ways. How big of an issue has it been if it doesn't support internationalization? And if we did want to support internationalization, is it sufficient to "just" provide a way to override the unit designators with different strings? (I think it isn't, but I don't really know for sure.) Or is that something that helps a lot and doesn't hurt? Or is it a half-baked thing that actually ends up hurting more than it helps?
Oh, sorry, I meant locale as in internationalization.
Oops. My fault. I have a fixation on datetime atm. With nushell, I think the only thing we do with localization is determine what type of separator to use for thousands and decimals. Everything else is just English, iirc. If crates we use support strftime things with localization we don't interrupt it.
How hard is it? I'm no expert, but just figuring out the locale and changing the minimal things we did for decimals and thousands separators was a pain. Edge cases just drive me crazy sometimes. You think you have it right, but some funky thing breaks your software. Ugh!
How hard is it?
That's part of the problem. My understanding is that full and correct support (in accordance with relevant Unicode specs) is an enormous undertaking. And so it's best left to crates like icu to do it. But there are in theory things I could do that aren't as complicated, like allow the caller to substitute in their own unit designator strings. But I don't know if that's helpful or actively harmful.
I was just going to ask about humantime support but I see you are already on it :) Thanks.
PS: love the round(Unit) BTW.
Andrew, you may wish to take inspiration from hifitime here because our parser of durations is no-std (using the lexical crate), and supports rounding based on the unit: https://github.com/nyx-space/hifitime/blob/master/src/duration/parse.rs#L15 .
Note that I'm currently working to support Durations down to the zeptosecond (1e-21) for hifitime, so this functionality is bound to be expanded upon.
@ChristopherRabotin That appears to be just about a subset of the grammar I wrote above. I see only a few main differences making it not a proper subset:
:, your format interprets an integer, I think, as a number of seconds. I intentionally decided to not support that in the grammar above due to the ambiguity it can cause. The [-+]H:M:S(.[0-9]+)? is supported in the grammar above though.
1.5 hours 30.5 minutes isn't allowed by the grammar above.
Otherwise, my grammar above also supports units of years, months and weeks.
The closest analog to hifitime's Duration type is probably Jiff's SignedDuration, and in that context, Jiff specifically forbids the concept of "days" because days are non-uniform units in the context of time zones. To express "days," Jiff requires you to use a Span, which is more like a bag of units instead of one single absolute duration.
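To make the "bag of units" versus "single absolute duration" distinction concrete, here is a minimal sketch. The types are hypothetical stand-ins and not Jiff's actual `Span` or `SignedDuration`. The point it tries to show is that hours convert to an absolute length directly, while days cannot be converted without first deciding how long each day is (for example 23, 24 or 25 hours around a DST transition).

```rust
const NANOS_PER_HOUR: i128 = 3_600 * 1_000_000_000;

/// Hypothetical absolute duration: one signed count of nanoseconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AbsoluteSketch {
    nanos: i128,
}

/// Hypothetical "bag of units": each unit keeps its own separate count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BagSketch {
    days: i64,
    hours: i64,
}

/// Collapse the bag into an absolute duration. The caller has to supply the
/// length of a day, because the bag alone does not determine it.
fn to_absolute(bag: BagSketch, hours_per_day: i64) -> AbsoluteSketch {
    let total_hours = bag.days as i128 * hours_per_day as i128 + bag.hours as i128;
    AbsoluteSketch {
        nanos: total_hours * NANOS_PER_HOUR,
    }
}
```

The same bag of "1 day 2 hours" yields 26 hours with a 24-hour day and 25 hours with a 23-hour day, which is why a days field does not fit inside a type that promises one fixed length.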
This issue is partially motivated by #60, but the bigger picture here is that the ISO 8601 duration kinda sucks. We should obviously still support it, and it will continue to be the "default" used when serializing a Span/SignedDuration, but I think we can do better than that. The wide use of crates like humantime demonstrates this.
I hunted around for a specification or seeming standard that Jiff could use, but one just doesn't exist. However, there's definitely a "common sense" sort of format that has organically developed in at least the Go and Rust ecosystems (and probably others). For example, the following are all valid time.Duration values from Go's standard library time package:
300ms
-1.5h
2h45m
humantime almost supports these, but since it parses into a std::time::Duration (which is unsigned), negative durations aren't supported. And it doesn't support fractional units:
But humantime is a bit more flexible than what Go supports. For example, this program runs fine:
What are the advantages of the above kinds of durations? I think there are two:
1 second 1000ms should be perfectly valid. But you can't express that in ISO 8601 durations.
So here's my first stab at a grammar for this new duration format:
I also need help picking a name for this format. "human" is not a bad name, but the above is specifically English and doesn't take localization into account, so calling it something as broadly applicable as "human" seems a bit narrow minded. Right now, my pick is "friendly." The "friendly" duration format. I don't love it though.
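As a rough illustration of the Go-style values mentioned above (300ms, -1.5h, 2h45m), here is a minimal standard-library-only sketch of a parser that accepts a leading sign and fractional values. It is not the grammar proposed in this issue and not Jiff's or humantime's implementation: the function name is hypothetical, only the unit suffixes ns, us/µs, ms, s, m and h are recognized, and fractions go through `f64`, so very large or very precise values may lose accuracy.

```rust
/// Parse strings such as "300ms", "-1.5h" or "2h45m" into signed nanoseconds.
/// Hypothetical helper for illustration only.
fn parse_duration_nanos(input: &str) -> Result<i128, String> {
    // An optional leading sign applies to the whole duration.
    let (negative, mut rest) = match input.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, input.strip_prefix('+').unwrap_or(input)),
    };
    if rest.is_empty() {
        return Err("empty duration".to_string());
    }
    let mut total: i128 = 0;
    while !rest.is_empty() {
        // Number: ASCII digits with an optional fractional part.
        let num_len = rest
            .find(|c: char| !(c.is_ascii_digit() || c == '.'))
            .unwrap_or(rest.len());
        let (num, tail) = rest.split_at(num_len);
        let value: f64 = num
            .parse()
            .map_err(|_| format!("bad number: {:?}", num))?;
        // Unit: everything up to the start of the next number.
        let unit_len = tail
            .find(|c: char| c.is_ascii_digit() || c == '.')
            .unwrap_or(tail.len());
        let (unit, next) = tail.split_at(unit_len);
        let nanos_per_unit: f64 = match unit {
            "ns" => 1.0,
            "us" | "µs" => 1e3,
            "ms" => 1e6,
            "s" => 1e9,
            "m" => 60.0 * 1e9,
            "h" => 3_600.0 * 1e9,
            _ => return Err(format!("unknown unit: {:?}", unit)),
        };
        total += (value * nanos_per_unit).round() as i128;
        rest = next;
    }
    Ok(if negative { -total } else { total })
}
```

Returning a signed integer rather than `std::time::Duration` is what lets `-1.5h` round-trip here. A bare number with no unit is rejected, in line with the ambiguity concern raised earlier in the thread.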