distribworks / dkron

Dkron - Distributed, fault tolerant job scheduling system https://dkron.io
GNU Lesser General Public License v3.0
4.3k stars 379 forks

Taking dkron backup for large setups #1320

Open nikunj-badjatya opened 1 year ago

nikunj-badjatya commented 1 year ago

Is your feature request related to a problem? Please describe. https://gist.github.com/pjz/94f4bd81a0897fd64db44593078e2156 shows how to take a backup. Our dkron setup has >100K schedules in it. When we execute curl like this, the request takes minutes to complete and the output file is also 10s sometimes 100s of MBs.

Describe the solution you'd like What are more efficient ways to take a backup? Please advise.

Describe alternatives you've considered Disk snapshot

Additional context None.
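For a job list this large, one way to keep the export cheap is to never buffer the whole response in memory: stream the body of the `/v1/jobs` listing (the same endpoint the gist's curl command hits) straight into a gzip file. A minimal sketch; the base URL and output path are examples, not part of the Dkron API:

```python
import gzip
import shutil
import urllib.request


def stream_to_gzip(src, dest_path, chunk_size=1 << 16):
    """Copy a binary stream into a gzip file chunk by chunk,
    so the full job list is never held in memory at once."""
    with gzip.open(dest_path, "wb") as out:
        shutil.copyfileobj(src, out, chunk_size)


def backup_jobs(base_url, dest_path):
    """Dump all jobs from Dkron's /v1/jobs endpoint into a gzipped file.
    base_url (e.g. "http://localhost:8080") is deployment-specific."""
    with urllib.request.urlopen(f"{base_url}/v1/jobs") as resp:
        stream_to_gzip(resp, dest_path)
```

Since job JSON is highly repetitive, gzip typically shrinks it by an order of magnitude, which also helps with the "100s of MBs" output files mentioned above.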

nikunj-badjatya commented 1 year ago

cc: @vcastellm , @yvanoers

nikunj-badjatya commented 1 year ago

Any suggestions on this, anyone?

vcastellm commented 1 year ago

Hey @nikunj-badjatya, that's a high-volume use case; your script looks good:

10s sometimes 100s of MBs

What does "s" mean here? Are you talking about time or space?

Taking a disk snapshot can be a good alternative in this case. Currently there's no other way of taking a backup from Dkron.

nikunj-badjatya commented 1 year ago

What does "s" mean here? Are you talking about time or space?

Space. Hundreds of MBs.

Taking a disk snapshot can be a good alternative in this case. Currently there's no other way of taking a backup from Dkron.

Okay.

  • Have you thought of splitting the jobs across several clusters?

We are running a single-pod StatefulSet, backed by a PVC, deployed in a K8S cluster. There are some 150K schedules in it. We haven't considered splitting into several clusters as of now.
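Given that the data lives on a PVC, the disk-snapshot alternative suggested above can be done at the storage layer with a CSI VolumeSnapshot instead of an API export. A minimal sketch, assuming the cluster has a CSI driver with snapshot support installed; the snapshot class and PVC names below are placeholders, not values from this setup:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: dkron-data-snapshot              # example name
spec:
  volumeSnapshotClassName: csi-snapclass # placeholder: your CSI snapshot class
  source:
    persistentVolumeClaimName: dkron-data # placeholder: the PVC behind the StatefulSet
```

A snapshot taken while the pod is running is crash-consistent at best; scaling the StatefulSet to zero first gives a clean copy, at the cost of a short scheduling outage.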

  • What's the source of truth for the jobs? How are they being created?

Jobs are created via the API. The source of truth is the data stored in MongoDB.
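With MongoDB as the source of truth, one option is to skip backing up Dkron itself and instead treat a restore as a replay: read the job documents from the MongoDB dump and re-create each one via `POST /v1/jobs` (the Dkron endpoint that creates or updates a job). A hedged sketch; the base URL is an example, and the `poster` parameter is a hypothetical hook for dependency injection, not part of any Dkron client:

```python
import json
import urllib.request

DKRON_URL = "http://localhost:8080"  # example address, deployment-specific


def post_job(job, base_url=DKRON_URL):
    """POST one job document to Dkron's /v1/jobs endpoint
    (creates the job, or updates it if the name already exists)."""
    req = urllib.request.Request(
        f"{base_url}/v1/jobs",
        data=json.dumps(job).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def restore_jobs(jobs, poster=post_job):
    """Replay a list of job documents (e.g. exported from MongoDB)
    against the Dkron API; returns the names of jobs that failed."""
    failures = []
    for job in jobs:
        try:
            poster(job)
        except Exception:
            failures.append(job.get("name"))
    return failures
```

This keeps the backup problem entirely on the MongoDB side, where standard tooling (mongodump, snapshots) already handles large datasets.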