cloudfoundry / app-autoscaler-release

Automated scaling for apps running on Cloud Foundry
Apache License 2.0

Scheduled autoscaling scales down to default instance_min_count around midnight #3356

Open martinvisser opened 1 week ago

martinvisser commented 1 week ago

Using the following recurring schedule, our application scales up at 00:00 on the 19th day of the month. We want to keep running with that same number of instances until the end of the 27th day of the month, and we want to do that every month.

But what happens is that every night, at around 00:06, we see autoscaling events. We think this problem is due to the 00:00-23:59 start_time/end_time configuration, which is rounded down to the minute. So, basically, there is one minute between 23:59 and 00:00 where the autoscaler "thinks" there is no recurring schedule. If we are not mistaken, 00:06 makes sense given the cool down/breach settings.

Adding seconds, like 23:59:59, isn't allowed according to the API. There also doesn't seem to be an option to configure it like "from the 19th day 00:00 until the 27th day 23:59".

Any suggestions?

{
    "instance_min_count": 28,
    "instance_max_count": 36,
    "scaling_rules": [
        {
            "metric_type": "throughput",
            "breach_duration_secs": 300,
            "threshold": 60,
            "operator": "<",
            "cool_down_secs": 300,
            "adjustment": "-1"
        },
        {
            "metric_type": "throughput",
            "breach_duration_secs": 60,
            "threshold": 90,
            "operator": ">=",
            "cool_down_secs": 60,
            "adjustment": "+1"
        }
    ],
    "schedules": {
        "timezone": "Europe/Amsterdam",
        "recurring_schedule": [
            {
                "start_time": "00:00",
                "end_time": "23:59",
                "days_of_month": [
                    19,
                    20,
                    21,
                    22,
                    23,
                    24,
                    25,
                    26,
                    27
                ],
                "instance_min_count": 36,
                "instance_max_count": 36
            }
        ]
    }
}
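The suspected gap can be sketched in a few lines of Python. This is only a simplified model of the hypothesis described above, not the autoscaler's actual implementation: the function `schedule_active` is hypothetical, and treating `end_time` as an exclusive bound at 23:59:00 is an assumption. The times are taken to be local time in the policy's timezone.

```python
from datetime import datetime, time

# Values copied from the policy above.
START = time(0, 0)                  # "start_time": "00:00"
END = time(23, 59)                  # "end_time": "23:59"
DAYS_OF_MONTH = set(range(19, 28))  # days 19..27


def schedule_active(now: datetime) -> bool:
    """Hypothetical model: the window is [start_time, end_time), end exclusive."""
    return now.day in DAYS_OF_MONTH and START <= now.time() < END


# Naive datetimes, read as local time (Europe/Amsterdam in the policy).
print(schedule_active(datetime(2024, 11, 21, 23, 58, 59)))  # True
print(schedule_active(datetime(2024, 11, 21, 23, 59, 30)))  # False
print(schedule_active(datetime(2024, 11, 22, 0, 0, 0)))     # True
```

Under this assumption, the schedule is inactive for the whole last minute of each day, so the default instance_min_count of 28 would apply during that minute and the scale-down rule would be free to act.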
salzmannsusan commented 6 days ago

Hi martinvisser, you've identified an intriguing corner case. To aid the investigation, could you please send us the complete scaling histories for the 19th and 20th? Kind regards, Susanne

martinvisser commented 6 days ago

These are the events from the last days:

2024-11-21T00:00:44.00+0100   audit.app.process.ready   web                                                                       index: 35, cell_id: 2d776afa-ce2e-4f06-bafd-e79a4a7c5a45, instance: 2ef3ab10-ba1f-4d29-6c36-ae7f
2024-11-21T00:00:01.00+0100   audit.app.process.scale                                                                             instances: 36
2024-11-20T23:59:53.00+0100   audit.app.process.scale                                                                             instances: 35
2024-11-20T00:00:45.00+0100   audit.app.process.ready   web                                                                       index: 35, cell_id: 48e574ba-10a7-4d35-8236-95aec1673ac1, instance: ffff8f4d-c933-405c-4cbf-3869
2024-11-20T00:00:01.00+0100   audit.app.process.scale                                                                             instances: 36
2024-11-19T23:59:53.00+0100   audit.app.process.scale                                                                             instances: 35
geigerj0 commented 5 days ago

@martinvisser could you please share the relevant output of cf autoscaling-history? That'd be awesome :)

https://github.com/cloudfoundry/app-autoscaler-cli-plugin?tab=readme-ov-file#examples-5

martinvisser commented 5 days ago

This is from this night:

Scaling Type        Status          Instance Changes        Time                            Action                                                          Error
scheduled           succeeded       35->36                  2024-11-22T00:00:00+01:00       +1 instance(s) because limited by min instances 36
dynamic             succeeded       36->35                  2024-11-21T23:59:52+01:00       -1 instance(s) because throughput < 60rps for 300 seconds
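As a quick check of these two rows, the following Python sketch computes how far apart the events are and whether the dynamic scale-down falls in the last minute of the day. The variable names are illustrative, and taking 23:59:00 as the end of the schedule window is an assumption based on the policy's `end_time`.

```python
from datetime import datetime

# Timestamps copied from the two history rows above.
scale_down = datetime.fromisoformat("2024-11-21T23:59:52+01:00")  # dynamic, 36->35
scale_up = datetime.fromisoformat("2024-11-22T00:00:00+01:00")    # scheduled, 35->36

# Assumed end of the schedule window on that day: 23:59:00 local time.
window_end = scale_down.replace(hour=23, minute=59, second=0)

gap = scale_up - scale_down
print(gap.total_seconds())                  # 8.0
print(window_end <= scale_down < scale_up)  # True
```

The scale-down happens after 23:59:00 and 8 seconds before the scheduled scale-up, which fits the idea that no schedule is in effect during that final minute.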
geigerj0 commented 5 days ago

Thanks, that is very helpful to ensure that our assumptions are aligned with the observed behaviour on your side.

As said, you identified a very nice corner case, congratulations 👼.

@silvestre any insights that you'd like to share on the priority of improving this?

martinvisser commented 5 days ago

Haha, thanks, I guess 🤣

silvestre commented 1 day ago

Thank you for taking the time to bring this to our attention.

We truly appreciate your detailed observations and feedback. You are absolutely correct in identifying the missing functionality.

At present, our direct users do not heavily rely on the schedules feature, which means we are unable to prioritize this issue immediately. However, we genuinely value community contributions, and if you or anyone else is interested in addressing this, we warmly welcome any pull requests. Rest assured, we have noted this issue and hope to revisit it when circumstances allow.

Thank you again for your understanding and support!