Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License
1.91k stars 437 forks source link

Some TimerTrigger functions seem to fire an hour late after DST change while using WEBSITE_TIME_ZONE #2546

Closed npiasecki closed 6 years ago

npiasecki commented 6 years ago

(originally posted at https://github.com/Azure/azure-functions-host/issues/1534 and opened as a separate issue by request)

I had a function today that didn't run at its scheduled time of 6 p.m. Eastern time. It's been running fine for many months. WEBSITE_TIME_ZONE is set to Eastern Standard Time.

Site: https://fn-sv-bills-prod.azurewebsites.net/ Function Name: IssueInvoices CRON expression: 0 0 18 5 Expected Invocation: 2018-03-16T18:00:00-04:00 (2018-03-16T22:00:00Z)

When I logged into the portal, I ran the function manually and it worked (that's the invocation at 2018-03-16T18:23:09.0252356-04:00). Curious, I restarted the site via the portal. A few minutes later, Functions seemed to have realized it had missed the previously scheduled invocation, and ran it again at 2018-03-16T18:28:06.8759704-04:00.

I added WEBSITE_USE_PLACEHOLDER in attempt to prevent it from happening again, though I'm not sure if the defect @davidebbo mentioned above with WEBSITE_TIME_ZONE is still at play here.

As an aside, in a separate functions app, I have a function that normally runs at 5 p.m. Eastern every weekday:

Site: https://fn-sv-labels-prod.azurewebsites.net/ Function Name: DispatchVolumePostageReport CRON expression: 0 0 17 1-5

On Monday, it executed at 6 p.m. instead of 5 p.m., an hour later than usual. On Tuesday, it was back on its 5 p.m. schedule, and I hadn't made any changes.

What has me curious is that the next invocation following the switch to Daylight Saving Time was an hour late, but after that next invocation got itself back on track. I almost wonder if my IssueInvoices job that didn't run as expected also would have been an hour late since it would have been the first invocation after the DST change. However, I'm not sure how Azure is scheduling and keeping track of the next invocation internally for that to be a possible source of a bug?

davidebbo commented 6 years ago

I think this is probably a bug related to Daylight Saving transition. Something to do with the fact that the next occurrence was scheduled before the transition, and things get confused. Normally, that would not affect apps because the default time zone is UTC, but here the local time zone adds a level of complexity.

@mathewc this is the one we chatted about.

npiasecki commented 6 years ago

I played around some more and I think we've hit the nail on the head.

Looking at the StartTimer() function in TimerListener, I see that the host basically sets a timer to wait for the next occurrence:

    private void StartTimer(DateTime now)
    {
        // We need to calculate the next interval based on the current
        // time as we don't know how long the previous function invocation took.
        // Example: if you have an hourly timer invoked at 12:00 and the invocation takes 1 minute,
        // we want to calculate the interval for the next timer using 12:01 rather than at 12:00.
        // Otherwise, you'd start a 1-hour timer at 12:01 when we really want it to be a 59-minute timer.

        // If the interval happens to be negative (due to slow storage, for example), adjust the
        // interval back up 1 Tick (Zero is invalid for a timer) for an immediate invocation.
        TimeSpan nextInterval = ScheduleStatus.Next - now;
        if (nextInterval <= TimeSpan.Zero)
        {
            nextInterval = TimeSpan.FromTicks(1);
        }
        StartTimer(nextInterval);
    }

The code looks harmless enough, so I spun a simple test program to span the DST change, like my function that only runs on Fridays:

    static void Main(string[] args)
    {
        var now = new DateTime(2018, 3, 9, 18, 0, 0, DateTimeKind.Local);
        var schedule = CrontabSchedule.Parse("0 0 18 * * 5", new CrontabSchedule.ParseOptions() { IncludingSeconds = true });
        var next = schedule.GetNextOccurrence(now);
        var sleep = next - now;
        Trace.Assert(sleep.TotalHours == 167, "One week is normally 168 hours, but it's 167 hours across DST");
    }

Sure enough, subtracting the two DateTimes yields a sleep time of 168 hours (the number of hours in a normal week), instead of the 167 (the number of hours last week actually had), explaining why the functions are getting invoked an hour late.

That DateTime behaves this way is somewhat astonishing, but others on StackOverflow have apparently been bitten before. Certainly if I were writing this code, I would have been bitten by it, too.

Because I have other functions that execute each minute, my host was never spun down, so I actually was sitting on that same timer all week, even on a consumption plan. This explains why I saw the behavior from both function apps, even though one is hosted in an App Service Plan and one is hosted in a consumption plan.

What confused me is that those functions that have hourly executions weren't delayed by an hour during the DST change. What's up with that? So I changed my test program to execute hourly on the 59th minute. At 1:59 a.m. on March 11th, NCrontab's GetNextOccurrence() actually yields {3/11/2018 2:59:00 AM}. Subtracting the two times gives a sleep time of an hour, so the function actually still runs on time. But {3/11/2018 2:59:00 AM} isn't actually a valid DateTime, it's in the hour that doesn't exist!

    static void Main(string[] args)
    {
        var now = new DateTime(2018, 3, 11, 1, 59, 0, DateTimeKind.Local);
        var schedule = CrontabSchedule.Parse("0 59 * * * *", new CrontabSchedule.ParseOptions() { IncludingSeconds = true });
        var next = schedule.GetNextOccurrence(now);
        var isInvalid = TimeZoneInfo.Local.IsInvalidTime(next);
        Trace.Assert(!isInvalid, "I didn't expect an invalid time!");
    }

On one hand, it is easy to wave this problem away and say "just use UTC for everything, dude". And yes, I even have a 24-hour Zulutime clock hanging in my office (having become tired of, over the years, reading an alert message from a database and been like "um, is 21:40 UTC like right now and are we 4 or 5 hours off because of DST, I can never remember which way it swings", so now I can just look up at the wall instead of doing math in my head). But in our warehouse business, there are certain things that just happen "at the end of the business day": close reports get cut at 5 p.m. so that the EDI files containing all that customs data gets transmitted before the packages actually make it to the terminal (and not at 4 p.m., too soon, and not at 6 p.m., too late), and invoices get cut at 6 p.m. on Fridays (this is less critical, except people don't like it when their routine moves around).

So it would be nice if this could be fixed. One way would be to use a library that seems to understand local time zones, like Noda Time. I used to use Quartz.Net for years in a Windows service (I've been slowly porting these to Azure Functions and moving to PaaS), and it seems to use a different strategy: it just records the next UTC time, and its scheduler just runs in a loop, and if nothing is scheduled to fire, it just appears to sleep for about 30 seconds (it randomizes the exact value slightly, not sure why) instead of trying to actually do the math. I'm not sure of the performance implications of doing it that way on Azure scale, but it is much simpler than having to care about the math.

Hope this helps, and the obligatory xkcd:

Supervillain Plan

Clockwork-Muse commented 6 years ago

One way would be to use a library that seems to understand local time zones, like Noda Time.

The real problem is because DateTimeKind exists. The values it holds makes everybody think Local actually has knowledge of a timezone, but the behavior it usually evidences (ignoring the to/from UTC operations) is really more equivalent to Noda's LocalDateTime, rather than ZonedDateTime as most people expect (although you can't add year/month/day to the latter, which would be useful). It doesn't help that the majority of programmers seem to act as if UTC was somehow "not a timezone" - it is.

If you're staying core C#, DateTimeOffset will not help you either (since the timezone would be the same). In fact, from a quick look, even using TimeZoneInfo.Convert(...) is going to give you problems. C# does not, natively, have a way to say "get me the offset in effect for a given time in the zone with this id" - you either start with a potentially invalid value, or an unknown one. There's also no way to say "add 1 tick and given me the new timestamp, with the (potentially new) offset in effect at that time", which is what the other problem is.

On one hand, it is easy to wave this problem away and say "just use UTC for everything, dude".

Logged items are, pedantically, not really in any timezone (ie, Noda's Instant class), which is probably where most of the "use UTC" stuff originated from. For scheduling, UTC shouldn't be chosen just because it's UTC; you're going to want the timezone of whatever business unit you're dealing with - which, if your business has chosen to adopt it, might be UTC - and is fraught with danger the farther in advance something is scheduled (<24h is probably okay, but countries have been known to change timezones/DST rules with less than 2 week notice - and accounting for this in serialized scheduled times is its own can of worms).

xkcd

For those of us who've been dealing with date/time issues more in depth, and the realities of relative/civil time.... I would shrug at all that. Black Hat's plan wouldn't cause any additional (immediate) chaos by pushing California out to +9. If there was any chaos, you'd have to blame the California legislature for mishandling a transition to a new official timezone.


My recommendation would be to adopt a sane date/time library for this. To my somewhat limited knowledge, NodaTime is the only/major player in C#. There's been some rumblings in corefx, but nothing anytime soon, and how far it goes remains to be seen.

davidebbo commented 6 years ago

@npiasecki & @Clockwork-Muse thanks for the investigation. /cc @mathewc who owns this logic in the Functions Runtime.

@npiasecki Your sample code does a great job of making the issue clear. It's unfortunate that such 'obvious' code ends up being incorrect. I guess we should start by finding a way to solve the 4 line Console app, which would then presumably apply easily to the full runtime.

As an alternative, you can move to Florida :)

Clockwork-Muse commented 6 years ago

@npiasecki - For your specific use-case, I'd probably set up a timer function that procs every hour, reads a table, then posts to (the equivalent of) the current function. (For that matter, there may be an existing Azure package for that)

Consider: what happens if you get a second warehouse, in a different timezone? You'd probably need a differently scheduled job - but setting up individual functions for each individual warehouse strikes me as strange. If you read a table with the requisite data (scheduled time and zone - assuming you were using a library able to handle it) - you could just have it submit the relevant jobs as required.

npiasecki commented 6 years ago

@Clockwork-Muse You are right, but having moved warehouses once, I will be retired before we get a second location. If we had to support multiple locations your option makes total sense, but like any small business I am making trade-offs between "do I really need this perfect" or "just run this at 5 p.m. and call me in ten years if we need to change it".

I recognize my use case is somewhat specific as I would imagine most timer Functions (as are my own) are of the "every X minutes" or "every Y hours" variety, but I still have a few that run at a specific time each day. Some of these (generate a report of the current Azure spend to make sure something hasn't gone haywire, backup SQL Azure databases daily out of Azure as that extra insurance measure) I left on UTC and they actually flip flop an hour twice a year and I don't care. But then there are a handful of people-visible ones, like when an invoice e-mail gets dumped in their inbox or when a file gets transmitted to a dude who sounds a klaxon when he doesn't get it. At Azure scale, there are sure to be other people with a similar range of use cases, which is why I feel properly supporting WEBSITE_TIME_ZONE is valid.

@davidebbo and @mathewc I studied the StackOverflow article, and tweaking the 4-line console app to do some footwork with the TimeZoneInfo class yields the correct result of 167 hours instead of 168:

    static void Main(string[] args)
    {
        // Running on the Friday before DST switch at 6 p.m. (Eastern Standard Time)
        var now = new DateTime(2018, 3, 9, 18, 0, 0, DateTimeKind.Local);

        // I want to run again on the next Friday at 6 p.m. (Eastern Daylight Time)
        var schedule = CrontabSchedule.Parse("0 0 18 * * 5", new CrontabSchedule.ParseOptions() { IncludingSeconds = true });

        // NCrontab coughs up a date
        var next = schedule.GetNextOccurrence(now);

        // Instead of simply subtracting the DateTimes, fiddle with DateTimeOffset
        var tz = TimeZoneInfo.Local; // Id == "Eastern Standard Time", in my case
        var nowOffset = new DateTimeOffset(now, tz.GetUtcOffset(now));
        var nextOffset = new DateTimeOffset(next, tz.GetUtcOffset(next));

        // Unlike (next - now), (nextOffset - nowOffset) will factor the offset change into account
        var sleep = nextOffset - nowOffset;

        // It works!
        Trace.Assert(sleep.TotalHours == 167, "One week is normally 168 hours, but it's 167 hours across DST");
    }

So that definitely looks like it may be a simpler solution than bringing in another library / dependency, you're just relying on the framework. (If the local TimeZone is Utc, then it still prints out 168 hours, which I believe is what we want.)

There is still some weirdness about schedules that would occur actually within the DST spring forward / back hour. Left as is, this code will still compute the right sleep time on the timer, just albeit with next and nextOffset being invalid times. Basically the same way it works now. Though it ends up working with either an invalid or ambiguous time in the local time zone, it ends up spitting out the right result:

    static void Main(string[] args)
    {
        // Running at 1:59 a.m. Eastern Standard Time, i.e. one minute before the DST switchover
        var now = new DateTime(2018, 3, 11, 1, 59, 0, DateTimeKind.Local);

        // I normally want to run on the 59th minute of every hour
        var schedule = CrontabSchedule.Parse("0 59 * * * *", new CrontabSchedule.ParseOptions() { IncludingSeconds = true });

        // NCrontab coughs up a date -- in this case, 2:59 p.m., which doesn't actually exists because we're
        // going to skip to 3
        var next = schedule.GetNextOccurrence(now);

        // Instead of simply subtracting the DateTimes, fiddle with DateTimeOffset
        var tz = TimeZoneInfo.Local; // Id == "Eastern Standard Time", in my case
        var nowOffset = new DateTimeOffset(now, tz.GetUtcOffset(now));
        var nextOffset = new DateTimeOffset(next, tz.GetUtcOffset(next));

        // Unlike (next - now), (nextOffset - nowOffset) will factor the offset change into account
        var sleep = nextOffset - nowOffset;

        // It works!
        Trace.Assert(sleep.TotalHours == 1, "I want to sleep an hour");
    }

I guess we also want to avoid the fall back situation of having functions unnecessarily invoked twice in the 2 o'clock hour. As written above, it still sleeps for an hour, which I reckon is what we want (flip the date to 2017, 11, 05 for the fall back).

I think you could still end up with weird scenarios if the time zone offset changed twice during the sleep period (like, a Function that runs once a year for some reason). But I think it's safe to say someone's app will have restarted in the meantime!

Hope this helps.

Clockwork-Muse commented 6 years ago

tz.GetUtcOffset(now)

.... well, it's embarrassing I missed that, stating so confidently it didn't exist...

As a reminder to anybody who might look into "truing up" the next scheduled time... remember that not all offsets/DST adjustments are in hour increments!

I think you could still end up with weird scenarios if the time zone offset changed twice during the sleep period (like, a Function that runs once a year for some reason). But I think it's safe to say someone's app will have restarted in the meantime!

At that scale, at least one country will have posted a DST change (some do every year). If there's an aim for a closer, more correct, value, then some form of active scheduling (with an ongoing timer that looks through events happening soon) is required.


The pedant in me wants to recommend that the "last run" time in the info object is always in UTC, since it's a logged instant. Make of that what you will.

mathewc commented 6 years ago

Guys - thanks for the deep analysis on this :) @npiasecki I'll do some of my own testing with your suggested fix - looks good though. If it works, I'll try to get the fix in in our next release. Thanks!

ghost commented 6 years ago

Hello is there an update to this?

We have some legacy code running on scheduled functions that requires BST (British Summer Time)

Will settings below in "application settings" work:

WEBSITE_TIME_ZONE=GMT Standard Time

Time zones changes will be kicking in this weekend. Just wanted to make sure the single change will do the job.

EDIT:

BST times came into effect today. Functions fired an hour later than expected.

Ran TZUtil and it shows the right timezone. tzutil /g It correctly shows: GMT Standard Time but functions firing 1hr later.

Will adjust timed functions by one hour until this is fixed.

EDIT 2

I adjusted the timed functions to fire the event 1 hour earlier. This time round it decided to fire it at the correct time! My system ended up sending SMS messages an hour earlier than expected.

The setting: WEBSITE_TIME_ZONE=GMT Standard Time was still present but I wasn't expecting it to actually work.

I will assume that it had issues just on the day of GMT/BST switch over. I will change settings of the Azure function to run at "GMT Standard Time" from now on.

mathewc commented 6 years ago

PR with the suggested fix is out. Please have a look and feel free to comment.

mathewc commented 6 years ago

Merged the underling fix to the TimerTrigger extension. We'll roll it out in the next Azure Functions release.