HowardHinnant / date

A date and time library based on the C++11/14/17 <chrono> header

from_stream and to_stream performance: correct way to test? #703

Closed: oschonrock closed this issue 3 years ago

oschonrock commented 3 years ago

I am currently using fmt and date as separate libraries with clang 12, because C++20 support is not there yet. Initial testing shows that the parsing and formatting functions may be a performance bottleneck for us, particularly to_stream. Using %F rather than %Y-%m-%d gave a surprising 2x boost to our inner loop, but concerns remain.

I understand that C++20 will replace date::to_stream with std::format and a full set of <chrono> format specifiers (not just the std::tm support in the current fmt library).

Is this correct? Is there any combination of libraries available now to test this, or do we need to wait for compiler vendors' implementations that tie these two libraries together? Validating the performance is the main concern.

HowardHinnant commented 3 years ago

Correct. C++20 incorporates both fmt and date. You might try getting the chrono stuff from fmt. I know he's got some chrono support, but I'm not sure how much.

Also, if your needs are simple, you might try writing a custom streaming operator, still using date for the calendrical arithmetic, but just writing to a string or character buffer yourself. If your format string is fixed, and thus you aren't having to parse one, it is really not that much coding and can get very fast.

Here's an example:

#include "date/date.h"
#include <chrono>
#include <iostream>
#include <string>

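// Write the decimal digits of non-negative i right to left, ending at s.
// The buffer is pre-filled with '0', so leading zeros are already in place.
void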
stamp(char* s, int i)
{
    do
    {
        *s-- = char(i % 10) + '0';
        i /= 10;
    } while (i > 0);
}

void GenerateUTCTimestamp(std::string& out)
{
    using namespace date;
    using namespace std::chrono;

    auto now = floor<seconds>(system_clock::now());
    auto today = floor<days>(now);
    hh_mm_ss hms{now - today};
    year_month_day ymd = today;

    // format YYYY-MM-DD hh:mm:ss
    out = "0000-00-00 00:00:00";
    //     0123456789012345678
    stamp(&out[3], int{ymd.year()});
    stamp(&out[6], unsigned{ymd.month()});
    stamp(&out[9], unsigned{ymd.day()});
    stamp(&out[12], hms.hours().count());
    stamp(&out[15], hms.minutes().count());
    stamp(&out[18], hms.seconds().count());
}

int
main()
{
    std::string s;
    GenerateUTCTimestamp(s);
    std::cout << s << '\n';
}
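A variant of the same idea, sketched here rather than taken from the thread, writes into a caller-supplied fixed-size char buffer instead of a std::string. Whether that matters depends on the standard library: a 19-character std::string can exceed the small-string buffer and heap-allocate (libstdc++ stores only 15 characters inline). The names `write_timestamp` and `put_digits` are hypothetical, and the integer parameters stand in for the values the example above extracts from year_month_day and hh_mm_ss.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: write the decimal digits of v right to left, ending at
// buf[last]. Leading zeros come from the '0'-filled template. Indexing (rather
// than decrementing a pointer) avoids forming a pointer before the buffer start.
inline void put_digits(char* buf, std::size_t last, unsigned v) {
    do {
        buf[last--] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v > 0);
}

// Hypothetical helper: fills `buf` with "YYYY-MM-DD hh:mm:ss" (19 chars, no
// trailing '\0') and returns a view of it; nothing is heap-allocated.
// Assumes 0 <= y <= 9999 and that the other fields are already in range.
inline std::string_view write_timestamp(std::array<char, 19>& buf, int y,
                                        unsigned mo, unsigned d, unsigned h,
                                        unsigned mi, unsigned s) {
    assert(0 <= y && y <= 9999);
    static constexpr char tmpl[] = "0000-00-00 00:00:00";
    std::memcpy(buf.data(), tmpl, buf.size());  // copy 19 chars, skip the '\0'
    put_digits(buf.data(), 3, static_cast<unsigned>(y));
    put_digits(buf.data(), 6, mo);
    put_digits(buf.data(), 9, d);
    put_digits(buf.data(), 12, h);
    put_digits(buf.data(), 15, mi);
    put_digits(buf.data(), 18, s);
    return {buf.data(), buf.size()};
}
```

The returned std::string_view is only valid while `buf` is alive, so the caller owns the storage and can reuse one buffer across a whole loop.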
oschonrock commented 3 years ago

Yes. I suspect the little bit of maths in YMD is not the issue. It's the format parsing and dynamic fiddling.
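To show how small that "little bit of maths" is, here is a standard-library-only sketch of the days-to-calendar step using Howard Hinnant's public-domain civil_from_days algorithm (essentially the conversion date performs when turning sys_days into year_month_day), followed by the same fixed-position digit writing. `Ymd` and `format_days` are hypothetical names for illustration, and the formatter assumes years 0000 to 9999.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Ymd {          // hypothetical result type
    long long y;
    unsigned m;       // 1..12
    unsigned d;       // 1..31
};

// Howard Hinnant's civil_from_days: days since 1970-01-01 -> proleptic
// Gregorian y/m/d. Years are counted in 400-year eras starting on March 1,
// so the leap day falls at the end of each shifted year.
constexpr Ymd civil_from_days(long long z) {
    z += 719468;  // days from 0000-03-01 to 1970-01-01
    const long long era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);                // [0, 146096]
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // [0, 365]
    const unsigned mp = (5 * doy + 2) / 153;                                     // [0, 11], 0 = March
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;                             // [1, 31]
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;                                // [1, 12]
    return {static_cast<long long>(yoe) + era * 400 + (m <= 2), m, d};
}

// Hypothetical helper: days since 1970-01-01 -> "YYYY-MM-DD".
// Assumes the result falls in years 0000..9999.
inline std::string format_days(long long days_since_epoch) {
    const Ymd ymd = civil_from_days(days_since_epoch);
    assert(0 <= ymd.y && ymd.y <= 9999);
    std::string out = "0000-00-00";
    auto put = [&out](std::size_t last, unsigned long long v) {
        do {
            out[last--] = static_cast<char>('0' + v % 10);
            v /= 10;
        } while (v > 0);
    };
    put(3, static_cast<unsigned long long>(ymd.y));
    put(6, ymd.m);
    put(9, ymd.d);
    return out;
}
```

The whole conversion is a handful of integer divisions with no branches on month lengths, which supports the suspicion that the cost lies in format-string handling rather than in the calendar arithmetic.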

The scary part for me was that PHP's $dt->format("Y-m-d") (don't ask why that is my comparison) seemed to be as fast as, or faster than, date::to_stream(os, "%Y-%m-%d", tp).

Caveat: it wasn't a good benchmark, and I need to isolate it more. Hence the question about more tightly integrated C++20 format and date libraries before I spend more time benchmarking.

But good to know there are always options to "go lower".

Closing for now. Will reopen if I find better evidence.
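One way to isolate the formatting cost, as a sketch and not the benchmark used in this thread, is to time only the call under test over precomputed inputs with std::chrono::steady_clock, feeding every result into a checksum so the compiler cannot discard the work. The two candidates are hypothetical stand-ins: `fmt_snprintf` parses its format string on every call, much as a runtime-format API like to_stream must, while `fmt_fixed` writes digits into a fixed template. `Date`, `fmt_snprintf`, `fmt_fixed`, and `ns_per_call` are all illustrative names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Date { int y; unsigned m; unsigned d; };  // hypothetical input type

// Candidate 1: runtime format string, parsed on every call.
inline std::string fmt_snprintf(const Date& dt) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%04d-%02u-%02u", dt.y, dt.m, dt.d);
    return buf;
}

// Candidate 2: fixed template, digits written by position (assumes 0 <= y <= 9999).
inline std::string fmt_fixed(const Date& dt) {
    std::string out = "0000-00-00";
    auto put = [&out](std::size_t last, unsigned v) {
        do {
            out[last--] = static_cast<char>('0' + v % 10);
            v /= 10;
        } while (v > 0);
    };
    put(3, static_cast<unsigned>(dt.y));
    put(6, dt.m);
    put(9, dt.d);
    return out;
}

// Time f over all inputs `rounds` times and return mean nanoseconds per call.
// `checksum` accumulates one output byte per call so the calls stay observable.
template <class F>
double ns_per_call(F f, const std::vector<Date>& inputs, int rounds,
                   std::size_t& checksum) {
    assert(!inputs.empty() && rounds > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < rounds; ++r)
        for (const Date& dt : inputs)
            checksum += static_cast<unsigned char>(f(dt)[9]);
    const std::chrono::duration<double, std::nano> elapsed = clock::now() - start;
    return elapsed.count() / (static_cast<double>(inputs.size()) * rounds);
}
```

Typical use is to build a few thousand varied Date values, call ns_per_call(fmt_snprintf, v, 100, sink) and ns_per_call(fmt_fixed, v, 100, sink) in an optimized build, and print both numbers along with sink. The absolute figures depend heavily on compiler, flags, and input mix, so only the ratio between candidates measured in the same run is meaningful.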

oschonrock commented 3 years ago

@HowardHinnant Your code flies, unsurprisingly! I estimate a 10x or greater speedup over date::to_stream. I adapted it to produce the two date formats I need to output (the DATE and DATETIME formats of MySQL and MariaDB).

I also wrote a similar routine for the reverse process, parsing these formats. It also flies, again with an estimated 10x or greater speedup over date::from_stream.

Code, in case it's useful for anyone:

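#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>

#include "date/date.h"

namespace impl {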
// render integer value into buffer pre-filled with '0'
// doesn't work for negatives, but uses long for convenient interoperability
inline void stamp(char* s, long i) {
  do {
    *s-- = char(i % 10) + '0'; // NOLINT ptr arith
    i /= 10;
  } while (i > 0);
}
} // namespace impl

// much faster formatting to "YYYY-MM-DD" or "YYYY-MM-DD HH:MM:SS" (credit Howard Hinnant)
template <typename TimePointType>
std::string format_time_point(TimePointType tp) {
  static_assert(std::is_same_v<TimePointType, date::sys_days> ||
                    std::is_same_v<TimePointType, date::sys_seconds>,
                "do not know how to format this TimePointType");

  auto today = date::floor<date::days>(tp);

  using impl::stamp;
  //                 YYYY-MM-DD
  std::string out = "0000-00-00";
  //                 0123456789
  if constexpr (std::is_same_v<TimePointType, date::sys_seconds>) {
    //     YYYY-MM-DD hh:mm:ss
    out = "0000-00-00 00:00:00";
    //     0123456789012345678
    date::hh_mm_ss hms{tp - today};
    stamp(&out[12], hms.hours().count());
    stamp(&out[15], hms.minutes().count());
    stamp(&out[18], hms.seconds().count());
  }

  date::year_month_day ymd = today;
  stamp(&out[3], int{ymd.year()});
  stamp(&out[6], unsigned{ymd.month()});
  stamp(&out[9], unsigned{ymd.day()});

  return out;
}

// adapted from fmt::detail
template <typename ReturnType>
constexpr ReturnType parse_nonnegative_int(const char* begin, const char* end,
                                           ReturnType error_value) noexcept {

  assert(begin != end && '0' <= *begin && *begin <= '9'); // NOLINT decay, can't suppress??
  unsigned    value = 0;
  unsigned    prev  = 0;
  const char* p     = begin;
  do {
    prev  = value;
    value = value * 10 + unsigned(*p - '0');
    ++p;
  } while (p != end && '0' <= *p && *p <= '9');
  auto num_digits = p - begin;
  if (num_digits <= std::numeric_limits<ReturnType>::digits10)
    return static_cast<ReturnType>(value);
  // Check for overflow. Will never happen here
  const auto max = static_cast<std::uint64_t>(std::numeric_limits<ReturnType>::max());
  return num_digits == std::numeric_limits<ReturnType>::digits10 + 1 &&
                 prev * 10ULL + std::uint64_t(p[-1] - '0') <= max
             ? static_cast<ReturnType>(value)
             : error_value;
}

// high-performance MySQL date format parsing
template <typename TimePointType>
TimePointType parse_date_time(const char* s) {
  static_assert(std::is_same_v<TimePointType, date::sys_days> ||
                std::is_same_v<TimePointType, date::sys_seconds>,
                "don't know how to parse this timepoint");

  using date::year, date::month, date::day, std::chrono::hours, std::chrono::minutes,
      std::chrono::seconds;

  // fmt   YYYY-MM-DD HH:MM:SS
  //       0123456789012345678

  date::year_month_day ymd = {year(parse_nonnegative_int(&s[0], &s[4], -1)),
                              month(parse_nonnegative_int(&s[5], &s[7], ~0U)),
                              day(parse_nonnegative_int(&s[8], &s[10], ~0U))};

  if constexpr (std::is_same_v<TimePointType, date::sys_days>) {
    return date::sys_days{ymd}; // no validity check on date, because it's coming from db
  } else {
    auto hms = hours(parse_nonnegative_int(&s[11], &s[13], ~0U)) +
               minutes(parse_nonnegative_int(&s[14], &s[16], ~0U)) +
               seconds(parse_nonnegative_int(&s[17], &s[19], ~0U));

    return date::sys_days{ymd} + hms; // no validity check on date/time, because it's coming from db
  }
}
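For input that does not come from a trusted database, the same fixed-position approach can reject malformed strings cheaply before doing any arithmetic. This sketch is not from the thread: it checks the separators and digit positions of "YYYY-MM-DD HH:MM:SS", then converts to seconds since 1970-01-01 with Howard Hinnant's days_from_civil algorithm, so it needs only the standard library. The name `parse_mysql_datetime`, the helper `field`, and the use of std::optional for failure are assumptions for illustration. It checks shape and field ranges but not days-per-month; with date, year_month_day::ok() covers that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Howard Hinnant's days_from_civil: proleptic Gregorian y/m/d -> days since 1970-01-01.
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;  // treat Jan and Feb as months 10 and 11 of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Hypothetical helper: value of the n-digit field at s[pos]; digits already checked.
constexpr unsigned field(std::string_view s, std::size_t pos, std::size_t n) {
    unsigned v = 0;
    for (std::size_t i = pos; i < pos + n; ++i)
        v = v * 10 + static_cast<unsigned>(s[i] - '0');
    return v;
}

// Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" into seconds since
// 1970-01-01 00:00:00 UTC, or std::nullopt if the shape or a field range is wrong.
// Days-per-month is NOT checked (e.g. 2021-02-30 is accepted).
inline std::optional<std::int64_t> parse_mysql_datetime(std::string_view s) {
    constexpr std::string_view shape = "dddd-dd-dd dd:dd:dd";  // 'd' = any digit
    if (s.size() != shape.size()) return std::nullopt;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        const bool ok = shape[i] == 'd' ? (s[i] >= '0' && s[i] <= '9') : s[i] == shape[i];
        if (!ok) return std::nullopt;
    }
    const unsigned y = field(s, 0, 4), mo = field(s, 5, 2), d = field(s, 8, 2);
    const unsigned h = field(s, 11, 2), mi = field(s, 14, 2), sec = field(s, 17, 2);
    if (mo < 1 || mo > 12 || d < 1 || d > 31 || h > 23 || mi > 59 || sec > 59)
        return std::nullopt;
    return days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + sec;
}
```

The shape check is one pass over 19 characters, so it adds little to the cost of the unchecked parse while turning garbage input into an explicit failure instead of a silently wrong time point.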