flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.2k stars 551 forks source link

[BUG] Flytescheduler cron expression validation process #4953

Open BroderPeters opened 4 months ago

BroderPeters commented 4 months ago

Describe the bug

Cron expressions like 0 6 * * MON#1 are currently able to being registered, display and executed, but flytescheduler produces the following error logs:

unable to register the schedule {BaseModel:{ID:1259 CreatedAt:2024-02-05 09:51:51.390164 +0000 UTC UpdatedAt:2024-02-05 09:51:51.390164 +0000 UTC DeletedAt:<nil>} SchedulableEntityKey:{Project:<project> Domain:development Name:<my>-launchplan Version:3622cca2ce009e394c1d193d43b6fe46c062afdc} CronExpression:0 6 * * MON#1 FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg: Active:0xc000a042b4} due to failed to parse int from MON#1: strconv.Atoi: parsing "MON#1": invalid syntax

Which according to the flyte docs should also be the expected behavior, as the # symbol is not listed here. Generally, # is used for the nth occurrence of a day of a week in a month.

From a discussion in the community slack

Expected behavior

Either cron expression that don't fit the allowed symbols from the docs are properly validated while registering the launch plan or expressions with symbols like # are also supported in the flytescheduler validation.

Additional context to reproduce

Used Launchplan:

LaunchPlan.get_or_create(
    name="my_launchplan",
    workflow=wf,
    schedule=CronSchedule(
        schedule="0 6 * * MON#1",  # UTC, https://docs.flyte.org/en/latest/concepts/schedules.html#cron-expression-table
    )
)

Screenshots

image image

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

welcome[bot] commented 4 months ago

Thank you for opening your first issue here! ๐Ÿ› 

dosubot[bot] commented 4 months ago

๐Ÿค–

Hey @BroderPeters, back again to keep us on our toes, huh? ๐Ÿ˜„ Always a pleasure seeing your name pop up here!

The error you're encountering is likely due to the fact that the cron expression 0 6 * * MON#1 is not supported by the cron parser used in Flyte. The # symbol is not a standard cron expression and is not mentioned in the cron implementation described in the comments of the CronSchedule message in the schedule.proto file.

The cron implementation used by Flyte is based on the standard cron as described by Wikipedia and also supports nonstandard predefined scheduling definitions as described by Amazon CloudWatch, except @reboot.

To resolve this issue, you have two options:

  1. Modify your cron expression to fit the supported format. For example, if you want to schedule a job for the first Monday of every month at 6:00, you could use a cron expression like 0 6 1-7 * MON. This will schedule the job to run at 6:00 on Monday, but only if the date is between the 1st and the 7th, effectively ensuring that it runs on the first Monday of the month.

  2. If you need to use the # symbol or other non-standard cron expressions, you would need to modify the cron parser used by Flyte to support these expressions. This would involve changing the source code of Flyte and may not be feasible depending on your situation.

Please note that any changes to the cron parser would need to be thoroughly tested to ensure they don't introduce any new issues. You can find more information about the cron parser used by Flyte in the schedule.proto file.

Sources

#### About Dosu This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a ๐Ÿ‘ if this is helpful and ๐Ÿ‘Ž if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.