HowardHinnant / HowardHinnant.github.io

50 stars 6 forks source link

Days algorithms #9

Open Roman-Koshelev opened 3 years ago

Roman-Koshelev commented 3 years ago

Found your algorithms days_from_civil and civil_from_days (http://howardhinnant.github.io/date_algorithms.html). l saw their use in the hffix (https://github.com/jamesdbrock/hffix) and Microsoft STL (https://github.com/microsoft/STL) libraries. Based on the description on the site, it seems that this is the fastest of the existing implementations, but this is not the case. The gmtime implementation in musl libc (https://github.com/ifduyue/musl) is 40% more efficient. I think it is worth writing that there are more efficient implementations, and that it is worth using them.

HowardHinnant commented 3 years ago

Can you point more specifically to the algorithm? I found https://github.com/ifduyue/musl/blob/cfdfd5ea3ce14c6abf7fb22a531f3d99518b5a1b/src/time/__secs_to_tm.c, but I have doubts that is what you're referring to.

Roman-Koshelev commented 3 years ago

it's him.

HowardHinnant commented 3 years ago

You sure? I haven't timed it, but I see iteration in __secs_to_tm.

The next thing I'm trying to find is a claim that the algorithms on http://howardhinnant.github.io/date_algorithms.html are the fastest of the existing implementations. At http://howardhinnant.github.io/date_algorithms.html I see "fast", but not "fastest".

Also check out my stars page: https://github.com/HowardHinnant?tab=stars Take a look at anything by cassioneri. I'm pretty sure I have __secs_to_tm beat. And I know the work at cassioneri has mine beat. He does sacrifice some range, but nothing that the C++20 chrono spec doesn't allow. And he's gotten his implementation into the gcc C++20 implementation: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/chrono#L2448-L2513 It's really nice work.

HowardHinnant commented 3 years ago

I got around to doing some comparative timing between https://github.com/ifduyue/musl/blob/cfdfd5ea3ce14c6abf7fb22a531f3d99518b5a1b/src/time/__secs_to_tm.c and civil_from_days.

Problem: These two algorithms do different things.

So to do a fair comparison, I did it two ways:

  1. I wrote the equivalent of __secs_to_tm using date.h which uses civil_from_days under the hood.
  2. I stripped out the compuation of weekday, day-of-year, hour, minute and second from both function in the first comparison and re-compared.

Here is the first comparison:

#include <time.h>
#include <limits.h>

/* 2000-03-01 (mod 400 year, immediately after feb29 */
#define LEAPOCH (946684800LL + 86400*(31+29))

#define DAYS_PER_400Y (365*400 + 97)
#define DAYS_PER_100Y (365*100 + 24)
#define DAYS_PER_4Y   (365*4   + 1)

int __secs_to_tm(long long t, volatile tm *tm)
{
    long long days, secs, years;
    int remdays, remsecs, remyears;
    int qc_cycles, c_cycles, q_cycles;
    int months;
    int wday, yday, leap;
    static const char days_in_month[] = {31,30,31,30,31,31,30,31,30,31,31,29};

    // Reject time_t values whose year would overflow int 
    if (t < INT_MIN * 31622400LL || t > INT_MAX * 31622400LL)
        return -1;

    secs = t - LEAPOCH;
    days = secs / 86400;
    remsecs = secs % 86400;
    if (remsecs < 0) {
        remsecs += 86400;
        days--;
    }

    wday = (3+days)%7;
    if (wday < 0) wday += 7;

    qc_cycles = days / DAYS_PER_400Y;
    remdays = days % DAYS_PER_400Y;
    if (remdays < 0) {
        remdays += DAYS_PER_400Y;
        qc_cycles--;
    }

    c_cycles = remdays / DAYS_PER_100Y;
    if (c_cycles == 4) c_cycles--;
    remdays -= c_cycles * DAYS_PER_100Y;

    q_cycles = remdays / DAYS_PER_4Y;
    if (q_cycles == 25) q_cycles--;
    remdays -= q_cycles * DAYS_PER_4Y;

    remyears = remdays / 365;
    if (remyears == 4) remyears--;
    remdays -= remyears * 365;

    leap = !remyears && (q_cycles || !c_cycles);
    yday = remdays + 31 + 28 + leap;
    if (yday >= 365+leap) yday -= 365+leap;

    years = remyears + 4*q_cycles + 100*c_cycles + 400LL*qc_cycles;

    for (months=0; days_in_month[months] <= remdays; months++)
        remdays -= days_in_month[months];

    if (months >= 10) {
        months -= 12;
        years++;
    }

    if (years+100 > INT_MAX || years+100 < INT_MIN)
        return -1;

    tm->tm_year = years + 100;
    tm->tm_mon = months + 2;
    tm->tm_mday = remdays + 1;
    tm->tm_wday = wday;
    tm->tm_yday = yday;

    tm->tm_hour = remsecs / 3600;
    tm->tm_min = remsecs / 60 % 60;
    tm->tm_sec = remsecs % 60;

    return 0;
}

#include "date/date.h"

int
secs_to_tm(long long t, volatile tm *tm)
{
    using namespace date;
    using namespace std::chrono;

    constexpr sys_seconds min = sys_days{year::min()/1/1};
    constexpr sys_seconds max = sys_days{year::max()/12/31};
    sys_seconds tp{seconds{t}};
    if (tp < min || tp > max)
        return -1;
    auto tp_days = floor<days>(tp);
    year_month_day ymd = tp_days;

    tm->tm_year = int{ymd.year()} - 1900;
    tm->tm_mon = unsigned{ymd.month()} - 1;
    tm->tm_mday = unsigned{ymd.day()};
    tm->tm_wday = weekday{tp_days}.c_encoding();
    tm->tm_yday = (tp_days - sys_days{ymd.year()/1/1})/days{1};

    hh_mm_ss tod{tp - tp_days};
    tm->tm_hour = tod.hours().count();
    tm->tm_min = tod.minutes().count();
    tm->tm_sec = tod.seconds().count();;

    return 0;
}

#include <cassert>
#include <iostream>

bool
operator==(struct tm& x, struct tm& y)
{
    return x.tm_year == y.tm_year &&
           x.tm_mon == y.tm_mon &&
           x.tm_mday == y.tm_mday &&
           x.tm_wday == y.tm_wday &&
           x.tm_yday == y.tm_yday &&
           x.tm_hour == y.tm_hour &&
           x.tm_min == y.tm_min &&
           x.tm_sec == y.tm_sec;
}

int
main()
{
    using namespace date;
    using namespace std::chrono;
    auto constexpr m = (sys_days{year::min()/1/1} - sys_seconds{})/1s;
    auto constexpr M = (sys_days{year::max()/12/31} - sys_seconds{})/1s;
    int i = 0;
    int count = 0;
    auto tp0 = steady_clock::now();
    for (auto t = m; t <= M; t += 10'000)
    {
        struct tm tm1{};
        struct tm tm2{};
        assert(__secs_to_tm(t, &tm1) == secs_to_tm(t, &tm2));
        assert(tm1 == tm2);
        ++count;
    }
    auto tp1 = steady_clock::now();
    volatile tm tm1{};
    for (auto t = m; t <= M; t += 10'000)
    {
        i += __secs_to_tm(t, &tm1);
    }
    auto tp2 = steady_clock::now();
    volatile tm tm2{};
    for (auto t = m; t <= M; t += 10'000)
    {
        i += secs_to_tm(t, &tm2);
    }
    auto tp3 = steady_clock::now();
    using ps = duration<int64_t, std::pico>;
    std::cout << (tp1-tp0)/count << '\n';
    std::cout << (tp2-tp1)/count << '\n';
    std::cout << (tp3-tp2)/count << '\n';
    return i;
}

compiled with clang++ -03 on a 2013, 2.4 GHz Intel Core i5, MacBook Pro.

The test first warms everything up and ensures that both functions are getting identical results. Results are spot checked every 10,000s between -32767-01-01 and 32767-12-31. Then the test repeats for timing purposes, looping over each function. Care is used to ensure that the optimizer doesn't eliminate the subject being tested. Results are printed out in terms of nanoseconds per iteration.

My results for this test are:

59ns
28ns
29ns

Meaning the correctness test took 59ns, __secs_to_tm took 28ns and the date.h version took 29ns. __secs_to_tm performed much better than I thought it would!

Here's what got commented out for part 2: Just compute year, month and day:

#include <time.h>
#include <limits.h>

/* 2000-03-01 (mod 400 year, immediately after feb29 */
#define LEAPOCH (946684800LL + 86400*(31+29))

#define DAYS_PER_400Y (365*400 + 97)
#define DAYS_PER_100Y (365*100 + 24)
#define DAYS_PER_4Y   (365*4   + 1)

int __secs_to_tm(long long t, volatile tm *tm)
{
    long long days, secs, years;
    int remdays, remsecs, remyears;
    int qc_cycles, c_cycles, q_cycles;
    int months;
    int wday, yday, leap;
    static const char days_in_month[] = {31,30,31,30,31,31,30,31,30,31,31,29};

    // Reject time_t values whose year would overflow int 
    if (t < INT_MIN * 31622400LL || t > INT_MAX * 31622400LL)
        return -1;

    secs = t - LEAPOCH;
    days = secs / 86400;
    remsecs = secs % 86400;
    if (remsecs < 0) {
//         remsecs += 86400;
        days--;
    }

//     wday = (3+days)%7;
//     if (wday < 0) wday += 7;

    qc_cycles = days / DAYS_PER_400Y;
    remdays = days % DAYS_PER_400Y;
    if (remdays < 0) {
        remdays += DAYS_PER_400Y;
        qc_cycles--;
    }

    c_cycles = remdays / DAYS_PER_100Y;
    if (c_cycles == 4) c_cycles--;
    remdays -= c_cycles * DAYS_PER_100Y;

    q_cycles = remdays / DAYS_PER_4Y;
    if (q_cycles == 25) q_cycles--;
    remdays -= q_cycles * DAYS_PER_4Y;

    remyears = remdays / 365;
    if (remyears == 4) remyears--;
    remdays -= remyears * 365;

//     leap = !remyears && (q_cycles || !c_cycles);
//     yday = remdays + 31 + 28 + leap;
//     if (yday >= 365+leap) yday -= 365+leap;

    years = remyears + 4*q_cycles + 100*c_cycles + 400LL*qc_cycles;

    for (months=0; days_in_month[months] <= remdays; months++)
        remdays -= days_in_month[months];

    if (months >= 10) {
        months -= 12;
        years++;
    }

    if (years+100 > INT_MAX || years+100 < INT_MIN)
        return -1;

    tm->tm_year = years + 100;
    tm->tm_mon = months + 2;
    tm->tm_mday = remdays + 1;
//     tm->tm_wday = wday;
//     tm->tm_yday = yday;
// 
//     tm->tm_hour = remsecs / 3600;
//     tm->tm_min = remsecs / 60 % 60;
//     tm->tm_sec = remsecs % 60;

    return 0;
}

#include "date/date.h"

int
secs_to_tm(long long t, volatile tm *tm)
{
    using namespace date;
    using namespace std::chrono;

    constexpr sys_seconds min = sys_days{year::min()/1/1};
    constexpr sys_seconds max = sys_days{year::max()/12/31};
    sys_seconds tp{seconds{t}};
    if (tp < min || tp > max)
        return -1;
    auto tp_days = floor<days>(tp);
    year_month_day ymd = tp_days;

    tm->tm_year = int{ymd.year()} - 1900;
    tm->tm_mon = unsigned{ymd.month()} - 1;
    tm->tm_mday = unsigned{ymd.day()};
//     tm->tm_wday = weekday{tp_days}.c_encoding();
//     tm->tm_yday = (tp_days - sys_days{ymd.year()/1/1})/days{1};
// 
//     hh_mm_ss tod{tp - tp_days};
//     tm->tm_hour = tod.hours().count();
//     tm->tm_min = tod.minutes().count();
//     tm->tm_sec = tod.seconds().count();;

    return 0;
}

And now the results are:

41ns
21ns
16ns
yangbowen commented 1 year ago

I suppose that the most major bottleneck could be division, since division is significantly slower than addition, subtraction, multiplication. The musl __secs_to_tm uses a lookup table instead of doing integer arithmetic to find out the month.

Here's some of my thoughts:

  1. Tuning the constants so that the divisor is power-of-2 may be beneficial, since the division can be optimized to a bitshift. I used an Excel spreadsheet to search for candidates and found the following:

    // days from month
    (1958 * (m > 2 ? m - 3 : m + 9) + 35) / 64;
    // month from days
    (535 * doy + 333) / 16384;

    image

  2. __secs_to_tm uses linear search in the lookup table, perhaps a binary search can be used instead?

    {
    static const int days_month_end[] = {31,61,92,122,153,184,214,245,275,306,337,366};
    int month_lbound = 0, month_rbound = 11;
    while (month_lbound < month_rbound) {
        month_middle = (month_lbound + month_rbound) / 2;
        int days_middle = days_month_accumulate[month_middle];
        if (remdays >= days_middle) {
            month_lbound = month_middle + 1;
        } else {
            month_rbound = month_middle;
        }
    }
    month = month_lbound;
    remdays -= days_in_month[month];
    }
jbrower2 commented 11 months ago

yeah I confirmed @yangbowen's formulae, and improved the second one slightly. here are what i believe are the most optimized versions:

(535 * doy + 332) >> 14
(979 * mp + 16) >> 5
HowardHinnant commented 11 months ago

See this repository for some impressive work in this area: https://github.com/cassioneri/eaf