eclipse-archived / smarthome

Eclipse SmartHome™ project
https://www.eclipse.org/smarthome/
Eclipse Public License 2.0
862 stars 784 forks source link

Cron Scheduler Replacement? #4133

Open kaikreuzer opened 7 years ago

kaikreuzer commented 7 years ago

We have learned over the past months that implementing a cron scheduler is not a trivial task - our current implementation has caused quite some issues, which are hard to analyse, debug and fix.

Instead of fully inventing it on our own, we should check whether using existing implementations might be a viable option. Two potential candidates have already been mentioned in the past and I would like to at least check their pros & cons to come to an informed decision:

maggu2810 commented 7 years ago

FYI: The same github person that hosts the cron-utils project also hosts an cron-utils-scheduler project. Perhaps we could have a look at this one, too.

kgoderis commented 7 years ago

Cons : it only supports cron expressions

I initially implemented the scheduler as part of a larger exercise to have a built in caldav server, but that was never reviewed, and to replace the quartz scheduler. I would not like to throw that effort away

This aside, imho this kind of effort is energy badly spent. What kind of issues did we have, and that we are unable to solve? Does the 2-3 bugs we had to fix really make that we have throw all this away? I don it understand your reasoning here

kaikreuzer commented 7 years ago

Cons : it only supports cron expressions

Right, that's a good point - caldav support can indeed become an important feature.

What kind of issues did we have, and that we are unable to solve?

We had a couple of issue that haven't been solved: https://github.com/eclipse/smarthome/issues/3185, https://github.com/eclipse/smarthome/issues/3912 and https://github.com/eclipse/smarthome/issues/4145 and a few others that were solved.

The problem is that all of them are rather critical issues and quite some people actually already tried looking into them, but failed to come up with a fix. I have to admit that I personally struggle to understand the code and its inner working - so I am of no huge help here either.

triller-telekom commented 7 years ago

I just tried to fix #3185 with PR #4154. It took me a couple of days to understand what is going on in the scheduler and I am still not sure that I understood everything. I have to say that the code is pretty hard to read because:

@kgoderis Could you please have a look at my PR to see if I got it right? The thing I am trying to do there is to detect time jumps larger than 3 seconds by storing the time the monitorTask plans to wake up and then comparing it to the time it actually woke up. In case of the daylight saving time change there will be a 1 hour gap which will then lead to a rescheduling of all scheduled tasks.

In order do do that I had to change the behavior of the monitorTask to wake up regularly and perform this check. I also introduced the DateWrapper which offers me the the possibility to fake a time jump for testing purposes, see my 2 new test cases.

Is there someone else familiar with the scheduler code who could maintain it, except for you?

kgoderis commented 7 years ago

@triller-telekom @kaikreuzer

I do acknowledge that the code of the scheduler is quite hard, and it became quite hard when I tried to combine cron expressions with RFC5545 expressions. These are quite different animals, for example, in RFC5545 one can specify an end date, and I tried to come up with an algorithm that would be able to calculate both expressions. If you would only consider cron, then it would be far more simple. From that aspect I do agree with Kai's initial concern

In short, the algorithm does basically this: take one or more seeds (e.g. the start date), and expand those seeds by applying parts of the given expression so that the set of candidate dates grows. In order to keep the number of candidates reasonable - as it explodes very rapidly, e.g. expressions with * specified for the "second" expression part - the set of candidate dates is trimmed/pruning to remove the candidates that would not "make it", e.g. that are unlikely to be triggered next. The generation of candidate dates is further optimised by giving the expression parts a specific order (e.g. in case of cron, we first expand the "year" part, and the "second" part last)

Now, some early bugs in the code where jointly fixed by @maggu2810 and myself, so he is the only one that also up to speed with the code, but even for me I have to revisit it every time since I wrote that such a long time ago.

https://github.com/eclipse/smarthome/issues/3912 (After debugging a little bit I found out, that CronExpression#applyExpressionParts is not able to calculate dates, that would require a change to the next year, it fails e.g. for 0 0 0 ? ' and 2017 Dec 31, 23:01.) -> I think this one has to do with the pruning part of the algorithm that is too aggressive. It can be solved by keeping for the "year" part of the expression an additional candidate. If I remember well (long time ago!), I limited execution to the current year.

https://github.com/eclipse/smarthome/issues/4145 -> Could be linked to fact that none of the expressions types have a granularity at the millisecond level (e.g. smallest is a 1 second), and therefore the algorithm has to round some of the Date(...) outputs. If rounded downwards, this might lead to side effects for special cases like midnight

With respect to the specific comments/questions you make:

So, lengthy post, but that means @kaikreuzer that I am back in business/shape.

K

FrankR59 commented 7 years ago

take one or more seeds (e.g. the start date), and expand those seeds by applying parts of the given expression so that the set of candidate dates grow

Regarding this: could #4130 be caused by some kind of overflow or memory leak?

kaikreuzer commented 7 years ago

@FrankR59 Clearly not, see my https://github.com/eclipse/smarthome/issues/4130#issuecomment-325382966.

kgoderis commented 7 years ago

Regarding this: could #4130 be caused by some kind of overflow or memory leak?

That issue is not related to this one, see https://github.com/eclipse/smarthome/issues/4130#issuecomment-325382966

That being said, AFAIK there never have been memory leak problems with Quartz.

FrankR59 commented 5 years ago

Anything new about this issue? I am using rpi3/stretch with ssd instead of sd-card and oh2.3.0 meanwhile, but time based cron rules still are stopping every few days!

jewesta commented 5 years ago

I wanted to reference this issue: https://community.openhab.org/t/cron-didnt-fire-summer-time-issue/42477 because I think it might be related. Contrary to the title of the discussion I now believe that the cron fires. It probably just fires at the wrong (old) time. Since the calculations in the rule body probably use the correct (new) time, the value of the item didn't change.