cenkalti / dalga

⏰ MySQL backed Job Scheduler with a HTTP interface
https://pkg.go.dev/github.com/cenkalti/dalga/v3
MIT License
247 stars 21 forks source link

ISO8601 Interval Support, Retry Customization #2

Closed tooolbox closed 4 years ago

tooolbox commented 4 years ago

Hey there, was looking over Dalga and I like that it's straightforward, backed by MySQL, and supports a multi-instance deployment. However, it lacks two things for my use case:

  1. Support for ISO8601 intervals rather than pure seconds.
  2. Customizing retries such as the backoff/interval, max limit, etc.

As I said this is a very straightforward project and the source is quite readable. I more or less see what I would need to do to use an ISO8601 interval, and I can make a PR for it. I'm sure retry-customization is not much beyond that.

Before I make any PRs, I wanted to ask: are you open to contributions? Are you interested in having more features in this project?

As a comparison I did quite a bit of work over at Kala which is a similar project, but the way it is structured I don't see how to make it highly-available, which is why I am considering Dalga.

cenkalti commented 4 years ago

Hello @tooolbox,

I am open to contributions however I would like to keep Dalga as simple as possible, not to bloat with features. It's primarily used in put.io and I don't want to add features that I won't use.

Your suggestions seems reasonable. I have couple of questions regarding your feature requests:

  1. Support for ISO8601 intervals rather than pure seconds.

This one seems doable, ISO8601 is a widespread standard and we can add parsing of ISO8601 intervals. Do you need the same feature as in Kala (https://github.com/ajvb/kala#breakdown-of-schedule-string-iso-8601-notation) or do you only need the duration format (https://en.wikipedia.org/wiki/ISO_8601#Durations) to be understandable in interval param when scheduling a job?

  1. Customizing retries such as the backoff/interval, max limit, etc.

This is also doable. Do you need a separate retry policy per job or a single policy defined in config is just enough?

tooolbox commented 4 years ago

Thanks for the reply!

I am open to contributions however I would like to keep Dalga as simple as possible, not to bloat with features.

Great, and that makes sense.

This one seems doable, ISO8601 is a widespread standard and we can add parsing of ISO8601 intervals. Do you need the same feature as in Kala or do you only need the duration format to be understandable in interval param when scheduling a job?

I personally only need the duration format. The unified schedule specification format would be neat, but Dalga already has a way of specifying the first run date so it's not vital.

This is also doable. Do you need a separate retry policy per job or a single policy defined in config is just enough?

Good question.

For my use case, it's sufficient to define a global policy for retry cadence in the config. As a comparison, Google Cloud Scheduler supports per-job retry configuration, but I think for now it's best to have a global setting rather than require a client to include that information every time he creates a job.

cenkalti commented 4 years ago

Sounds good then. Can you a send a separate PRs for those please?

tooolbox commented 4 years ago

Yes, I shall!

I did have something else come up so I am not launching onto this quite yet, but you can expect the PRs to start arriving in a few days, maybe a week or two, depending.

tooolbox commented 4 years ago

Hey @cenkalti I'm glad you liked #9 and #10 and it's great they're merged now!

I'm starting in on the ISO8601 support which hopefully I will finish today, but I wanted to ask about the database aspect.

Specifically, you have interval INT UNSIGNED NOT NULL but for ISO8601 support I will need a text field. (I can't simply translate one of those durations into a seconds interval, since it needs to support months.) Therefore we have a couple of options:

  1. Replace the existing interval field.
  2. Add a second, NULLable field and default to interval if it's missing.

More generally, it raises the question of how you want to approach schema migrations in this project. Since you're creating and dropping tables in Go code, I can simply add/replace the columns in the SQL and it solves my use case, but it doesn't provide an upgrade path for existing users of Dalga. Also, what about the next time the schema needs to change?

Options:

  1. I could move the schema handling out of the Go code and use something like https://github.com/golang-migrate/migrate, or one of the 6 alternatives 😄
  2. I could implement schema version tracking within Dalga itself, either from scratch or using a migration tool as a library.
  3. Every time the schema changes, we could drop an ALTER TABLE sql file into the project and do a major version bump, with a notice that you have to update your schema if you want to upgrade from v10 to v11 and so on.
cenkalti commented 4 years ago

Hello @tooolbox, I am okay with changing interval field in order to add ISO8601 duration support. I don't expect frequent schema changes in future though. We may skip adding a separate migration library or version tracking code for now. To keep things simple, we can just add a SQL file for migrating interval field, bump the major version number and inform users in README file.

tooolbox commented 4 years ago

Great! I made that change, and also added location so that you can control the timezone of the intervals. So if you schedule a job for 3pm in America/Los_Angeles with an interval of P1D, it will go off at 3pm every day even if there is a daylight-savings wall clock shift.

Next I am going to work on retry behavior. Aside from controlling the amount/duration of retries, I need the option to keep retries independent of the overall schedule. For example, if you have a P1W job scheduled for Saturday, and it fails and retries 3 times over the next couple of days, it should still fire on the next Saturday.

I will open a new PR for that, although it will depend on the ISO8601 support.

tooolbox commented 4 years ago

Alright, you have a PR now for idempotency, and another one coming tomorrow for disable/enable. If you don't get to exponential backoff tonight (your day) I will likely do it tomorrow, in which case any notes or preferences from you would be appreciated.

tooolbox commented 4 years ago

@cenkalti looks like I may not get to the exponential backoff for a bit, possibly days or a week or two. If you are interested, I defer to you to implement it.

cenkalti commented 4 years ago

@tooolbox I think I can work on this this weekend.

tooolbox commented 4 years ago

Well it's been a long road (over a month now) but this is all landed. Really awesome @cenkalti, thanks for your help and reviews, and pushing it over the edge.