Crontap / crontap

Schedule recurring API calls without the hassle
https://crontap.com

Bug: Something weird with the schedules is going on #12

Open francistogram opened 8 months ago

francistogram commented 8 months ago

Not sure if this is related to the last issue or not, but something weird is going on this morning with schedule 8qhW7k67PONn0IQGzzeE, which is set to run every 5 minutes:

6:50am CST works fine

image

6:55am CST works fine (failure is something on my side)

image

Then it skips 7am and runs at 7:04:56am, so the job was likely delayed

image

Runs at 7:05am

image

Then something weird happens again

Runs at 7:10:01 which is correct

image

But also runs at 7:10:56 which doesn't make any sense

image

Any idea what's going on here @danmindru?

Not sure if related to the issue from last week https://github.com/Crontap/crontap/issues/9

francistogram commented 8 months ago

My guess is that it's not related to the last error. My suspicion is that it's actually related to the timeout, given that some other jobs scheduled to run every 5 minutes did not have any issues.

francistogram commented 8 months ago

Seems like this execution at 6:55am CST was running for 943s or 15m and 43s

image

I checked the Vercel logs for the endpoint and don't see anything on my side. The other interesting thing is that my timeout for these serverless endpoints is set to 4 minutes and the max is 5 minutes, so I'm not sure how it could've hung for 15m 43s.

image

danmindru commented 8 months ago

Thank you for the detailed report! Investigating the issue and will get back to you asap.

danmindru commented 8 months ago

Hi @francistogram. Looking closer at the logs, it seems like the timeout indeed caused the issues here.

What is important to note is that this was not a timeout in your function, but a network timeout. The destination URL could not be reached at all (hard to know the reason, can be DNS, CDN or maybe something simpler like a cold start or deployment). This would also explain why it's not visible in the Vercel logs.

Either way, Crontap waits for a response for up to 1h, which means a long-running request could potentially overlap with future schedules. At the moment

From our logs it seems like the job at 12:55:00 ran until 13:10:54. Without going much into detail, there is some investigation work attached below.
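
To illustrate the overlap, here is a minimal sketch (not Crontap's actual code) that lists which 5-minute ticks fall inside the window of that long-running request. The function name `overlapped_ticks` and the calendar date are placeholders; only the two times come from the logs mentioned above.

```python
from datetime import datetime, timedelta


def overlapped_ticks(start, end, interval):
    """Return the scheduled ticks after `start` that come due no later than `end`."""
    ticks = []
    tick = start + interval
    while tick <= end:
        ticks.append(tick)
        tick += interval
    return ticks


# The date is an arbitrary placeholder; only the times are taken from the thread.
start = datetime(2024, 1, 1, 12, 55, 0)
end = datetime(2024, 1, 1, 13, 10, 54)

ticks = overlapped_ticks(start, end, timedelta(minutes=5))
print([t.strftime("%H:%M:%S") for t in ticks])
# prints ['13:00:00', '13:05:00', '13:10:00']
```

Under these assumptions, three scheduled runs (13:00, 13:05 and 13:10) came due while the 12:55 request was still pending.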

A potential solution here is to optionally allow customizing the maximum wait for each schedule. Here, a ~5min wait before the request is abandoned should have prevented the issues. This could also be set automatically based on the schedule interval, but I wonder if setting that automatically would cause other problems of its own.

Either way, allowing this to be set manually should be a sensible option, albeit advanced.
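
A per-schedule maximum wait could look roughly like the sketch below. This is an illustration only, not Crontap's implementation: `default_max_wait`, `run_with_max_wait` and the 10-second margin are hypothetical, and the request is simulated with `asyncio.sleep` rather than a real HTTP call.

```python
import asyncio


def default_max_wait(interval_s, margin_s=10.0):
    """Hypothetical policy: give up shortly before the next run is due."""
    return max(interval_s - margin_s, 1.0)


async def run_with_max_wait(request, max_wait_s):
    """Await `request()`, but abandon it once `max_wait_s` seconds have passed."""
    try:
        return await asyncio.wait_for(request(), timeout=max_wait_s)
    except asyncio.TimeoutError:
        # The pending request is cancelled, so it cannot overlap later runs.
        return None


async def slow_request():
    # Stand-in for a destination that does not respond in time.
    await asyncio.sleep(0.5)
    return "ok"


async def fast_request():
    # Stand-in for a destination that responds immediately.
    return "ok"


print(default_max_wait(300))                                 # 290.0
print(asyncio.run(run_with_max_wait(slow_request, 0.05)))    # None
print(asyncio.run(run_with_max_wait(fast_request, 0.05)))    # ok
```

Deriving the wait from the interval keeps each run inside its own slot, while an explicit per-schedule value would still let longer jobs opt out of that default.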

Screenshot

francistogram commented 8 months ago

The maximum wait time sounds like a reasonable solution to me!

"The destination URL could not be reached at all (hard to know the reason, can be DNS, CDN or maybe something simpler like a cold start or deployment)"

I'll reach out to Vercel and see if they have any more context on this

Thanks for digging into this issue 🙏