OpenHistoricalMap / issues

File your issues here, regardless of repo until we get all our repos squared away; we don't want to miss anything.

Simplify decimal date conversion #562

Open 1ec5 opened 1 year ago

1ec5 commented 1 year ago

At least two OpenHistoricalMap projects implement functions to convert between ISO 8601-1 strings and decimal years. A PL/pgSQL function is used when generating tiles, and a JavaScript function is part of the timeslider Leaflet plugin. While I don’t doubt that these functions are correctly implemented, they’re quite involved.

Recently, I implemented a different approach in JavaScript. The algorithm is simple:

  1. Parse the ISO 8601-1 string as a date object.
  2. Use the date’s year to create a date object for the previous and next New Year’s Days.
  3. Convert all three date objects to a number (i.e., milliseconds since the epoch).
  4. Get the difference between the given date and the previous New Year’s Day and the difference between the two New Year’s Days.
  5. Get the proportion between these two differences and add it to the given year.

It ends up being four lines of code in JavaScript, without any need for special cases around leap years and such. It should be similar in PL/pgSQL with the to_number() and to_date() functions.
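For illustration, the steps above can be sketched roughly as follows. This is a minimal sketch rather than the exact code referred to: the name `naiveDecimalYear` is hypothetical, and it assumes a four-digit year that the `Date` constructor can parse.

```javascript
// Hypothetical name; a sketch of the five steps, assuming a four-digit year.
function naiveDecimalYear(isoDate) {
    // Step 1: date-only ISO strings are parsed as UTC midnight.
    var date = new Date(isoDate);
    var year = date.getUTCFullYear();
    // Steps 2-3: Date.UTC() already returns milliseconds since the epoch.
    var lastNewYear = Date.UTC(year, 0, 1);
    var nextNewYear = Date.UTC(year + 1, 0, 1);
    // Steps 4-5: fraction of the year elapsed, added to the year.
    return year + (date.getTime() - lastNewYear) / (nextNewYear - lastNewYear);
}

console.log(naiveDecimalYear("2000-07-02")); // 2000.5 (183 of 366 days elapsed)
```

The leap year is handled implicitly because the denominator is the real distance between the two New Year’s Days.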

I think it would be nice to streamline these implementations to rely more heavily on these built-in date conversion utilities. Inevitably, someone will need similar functionality in a language besides PL/pgSQL or outside of Leaflet; they would need to be careful when porting the code to avoid the many pitfalls when parsing dates manually. I also suspect that this simpler algorithm is slightly faster in most languages, but probably not by a significant amount.

1ec5 commented 1 year ago

@gregallensworth pointed out a major flaw in the proposed algorithm as implemented in JavaScript. The Date() constructor conforms to the requirement in ISO 8601-1 that the year be exactly four digits, so years before 1 CE and after 9999 CE are unsupported, and years before 1000 CE need to be zero-padded (which I think we’re more lenient about).

The workaround is to set each date component individually, although it bloats the code beyond the four lines that I advertised above:

/**
 * Converts the given ISO 8601-1 date to a decimal year.
 * 
 * @param isoDate A date string in ISO 8601-1 format.
 * @returns A floating point number of years since year 0.
 */
function decimalYearFromISODate(isoDate) {
    // Require a valid YYYY, YYYY-MM, or YYYY-MM-DD date, but allow the year
    // to be a variable number of digits or negative, unlike ISO 8601-1.
    if (!isoDate || !/^-?\d{1,4}(?:-\d\d){0,2}$/.test(isoDate)) return;

    var ymd = isoDate.split("-");
    // A negative year results in an extra element at the beginning.
    if (ymd[0] === "") {
        ymd.shift();
        ymd[0] *= -1;
    }
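    var year = +ymd[0];
    // Default a missing month to January and a missing day to the 1st, so that
    // YYYY and YYYY-MM inputs never pass NaN components to Date.UTC().
    var month = ymd[1] ? +ymd[1] - 1 : 0;
    var day = ymd[2] ? +ymd[2] : 1;
    var date = dateFromUTC(year, month, day);
    if (isNaN(date)) return;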

    // Add the year and the fraction of the date between two New Year’s Days.
    var nextNewYear = dateFromUTC(year + 1, 0, 1).getTime();
    var lastNewYear = dateFromUTC(year, 0, 1).getTime();
    return year + (date.getTime() - lastNewYear) / (nextNewYear - lastNewYear);
}

/**
 * Returns a `Date` object representing the given UTC date components.
 * 
 * @param year A one-based year in the proleptic Gregorian calendar.
 * @param month A zero-based month.
 * @param day A one-based day.
 * @returns A date object.
 */
function dateFromUTC(year, month, day) {
    var date = new Date(Date.UTC(year, month, day));
    // Date.UTC() treats a two-digit year as an offset from 1900.
    date.setUTCFullYear(year);
    return date;
}
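
The two-digit year behavior that dateFromUTC() works around can be seen in isolation. A small sketch, with variable names chosen only for illustration:

```javascript
// Date.UTC() maps years 0 through 99 to 1900 through 1999.
var naive = new Date(Date.UTC(44, 2, 15));
console.log(naive.getUTCFullYear()); // 1944

// setUTCFullYear() takes the year literally, so it corrects the result.
var corrected = new Date(Date.UTC(44, 2, 15));
corrected.setUTCFullYear(44);
console.log(corrected.getUTCFullYear()); // 44

// Years outside 0 through 99, including negative years, are unaffected.
var negative = new Date(Date.UTC(-43, 2, 15));
console.log(negative.getUTCFullYear()); // -43
```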

This workaround should be unnecessary in PL/pgSQL, since to_date() lets you specify the input date format and claims to support negative years.