I ran 600 alarms with cron */1 * * * * on the provider and a thundering herd arose:
All triggers fired every minute on the minute, so the provider's CPU usage peaked once a minute. I think this could make the provider harder to operate. To deal with the problem, the firings need to be spread out within each minute.
I suggest adding a delay of a few seconds before firing.
My suggestion needs two changes: one to the parameters and one to the scheduler.
For the parameters, add a new parameter, strict, that takes a boolean value.
If strict is true, the alarm fires exactly at the scheduled time (**:**:00), which is the current behavior.
If strict is false, the alarm is scheduled with a per-alarm offset in seconds (**:**:00-59).
For the scheduler, we need a hash function that takes the alarm name as its key and returns an integer in the range 0 to 59. When the alarm trigger is created, the scheduler injects the value returned by the hash function into newTrigger.cron. For example, if the hash function returns 30 and the cron of the trigger is */10 * * * *, the cron is converted to 30 */10 * * * *. This works because node-cron, the library we use, supports a sixth field for seconds.
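The scheduler change could be sketched roughly as follows. This is only an illustration under assumptions: the helper names secondOffset and distributeCron are hypothetical, and FNV-1a is just one possible hash, since the proposal does not fix a particular function. Any deterministic hash reduced modulo 60 would serve the same purpose.

```javascript
// Hypothetical helper: map an alarm name to a stable offset in seconds (0-59).
// FNV-1a (32-bit, applied to UTF-16 code units) is an illustrative choice only.
function secondOffset(name) {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < name.length; i++) {
    hash ^= name.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, keep 32 bits
  }
  return (hash >>> 0) % 60; // treat as unsigned, then reduce to 0-59
}

// Hypothetical helper: prepend a seconds field to a five-field cron expression.
// Strict alarms and expressions that are not five fields are returned unchanged.
function distributeCron(cron, name, strict) {
  const fields = cron.trim().split(/\s+/);
  if (strict || fields.length !== 5) {
    return cron;
  }
  return `${secondOffset(name)} ${fields.join(' ')}`;
}

// Example: the same name always yields the same offset, on any host.
console.log(distributeCron('*/10 * * * *', 'myAlarm', false));
console.log(distributeCron('*/10 * * * *', 'myAlarm', true));
```

Because the offset depends only on the name, no state has to be stored or shared to reproduce it. Leaving six-field expressions untouched is a design assumption here, so that a user who already specifies seconds keeps full control.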
Using a hash function has some benefits. Because a hash is deterministic, the same alarm gets the same delay on every host, and before and after a provider redeployment, without needing a DB or a consensus protocol.
I think the peaks would be lower if this suggestion were implemented, which would keep the machines running the providers in better condition. Google uses a similar approach to distribute cron jobs in its systems.