Open showwin opened 2 years ago
I have started running the periodic jobs without the result_ttl parameter in my Redash, so I will be able to report back if the same issue happens again after a month or so.
In [31]: for job in rq_scheduler.get_jobs():
    ...:     print(job, job.result_ttl)
    ...:
<Job f62bec30c67e00a7fc03337072b74227ac70c24b: redash.tasks.queries.maintenance.refresh_queries()> -1
<Job 27ccf7679d55fd368fa2a1a6262864c92ef411cf: redash.tasks.queries.maintenance.remove_ghost_locks()> -1
<Job b782f61249584a20ce2ce4e4c7fe13f09ca541ab: redash.tasks.general.sync_user_details()> -1
<Job 7f43bfe21b320bd6b3708d360909ebcfc2cd11c2: redash.tasks.queries.maintenance.cleanup_query_results()> -1
<Job 10d88ccbd46893f33fe792c7feb41534244a0b09: redash.tasks.queries.maintenance.refresh_schemas()> -1
<Job 5ae7d296b02dd520401aa5983db5b36b62828c6f: redash.tasks.failure_report.send_aggregated_errors()> -1
<Job 6d9d0f6047c92bb47d1a9895003f7c82f96533f0: redash.tasks.queries.maintenance.empty_schedules()> -1
<Job 75692428f53afeb43a549ad66fc3610c25f4d467: redash.tasks.general.version_check()> -1
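The same check can be wrapped in a small helper. This is only a minimal sketch and not part of Redash: `jobs_with_finite_ttl` is a hypothetical name, and it assumes nothing more than a `result_ttl` attribute on each job, as on the job objects printed above. In RQ, a `result_ttl` of -1 means the result is kept forever, so anything else is flagged.

```python
from types import SimpleNamespace


def jobs_with_finite_ttl(jobs):
    """Return the jobs whose result_ttl is anything other than -1 (keep forever)."""
    return [job for job in jobs if job.result_ttl != -1]


if __name__ == "__main__":
    # Stand-in objects for illustration; in a Redash shell you would pass
    # rq_scheduler.get_jobs() instead.
    sample = [
        SimpleNamespace(id="job-a", result_ttl=-1),
        SimpleNamespace(id="job-b", result_ttl=600),
    ]
    for job in jobs_with_finite_ttl(sample):
        print(job.id, job.result_ttl)
```

An empty result would match the output above, where every job reports -1.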
How about removing the result_ttl parameter or setting result_ttl=-1 explicitly?
Almost a month has passed since I made the above change to my Redash, and I have had no problems 👍
I was having a similar issue, and removing the result_ttl parameter fixed it. That is, I modified this part of the source code:
https://github.com/getredash/redash/blob/49277d27f8a8b17f541948b741539a612bfacc00/redash/tasks/schedule.py#L42-L50
Issue Summary
From the user's perspective, scheduled jobs that had been running correctly suddenly stop working.
From the technical perspective, some periodic jobs stored in Redis are removed for some reason, which is why the scheduled jobs stop working. I describe this in detail below.
My Redash is running on a Kubernetes cluster.
Steps to Reproduce
This problem occurs irregularly. In my case, it happens every 10-30 days. Therefore it's difficult to reproduce intentionally. This happened to me nine times in total.
Technical details:
Redash Version: Using the Docker image redash/redash:10.1.0.b50633
How did you install Redash: Using Kubernetes to run containers
The expected periodic jobs which should be stored in Redis are like this:
At least these six jobs should be stored, but when this issue occurs, the result of rq_scheduler.get_jobs() was:
Some jobs were removed from rq_scheduler.
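To spot this state without reading the listing by eye, the expected function names can be compared against what the scheduler currently holds. The following is a rough sketch under assumptions: `missing_jobs` is a hypothetical helper, and it relies on each job exposing the dotted function path as `func_name`, which RQ job objects do. The stand-in objects and the two names in the example are for illustration only.

```python
from types import SimpleNamespace


def missing_jobs(expected_func_names, jobs):
    """Return the expected function names that have no scheduled job."""
    scheduled = {job.func_name for job in jobs}
    return sorted(set(expected_func_names) - scheduled)


if __name__ == "__main__":
    expected = [
        "redash.tasks.queries.maintenance.refresh_queries",
        "redash.tasks.general.version_check",
    ]
    # Stand-in for rq_scheduler.get_jobs(); one expected job has disappeared.
    current = [SimpleNamespace(func_name="redash.tasks.general.version_check")]
    print(missing_jobs(expected, current))
```

Running a check like this periodically would show which job disappeared, and roughly when.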
As far as I investigated, the root cause seems to be the result_ttl parameter that is defined around here. As another person reported here, the rq_scheduler README advises against using the result_ttl parameter for a repeated job.
The result_ttl parameter has been in the Redash codebase from the very beginning, when Redash replaced Celery with RQ (ref), and the longer result_ttl=600 was introduced by this PR to extend the default value. So I couldn't find a strong reason why Redash uses the result_ttl parameter.
How about removing the result_ttl parameter or setting result_ttl=-1 explicitly? If it sounds good, I'll create a PR with that fix.
P.S. I also searched for code that deletes a job from rq_scheduler, but such code exists only in the initialization process.
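The two options proposed above could look roughly like this. It is a sketch under assumptions, not Redash's actual code: `schedule_kwargs` is a hypothetical helper that prepares keyword arguments for rq_scheduler's `Scheduler.schedule`, and the job dict layout (`func`, `interval`, optional `result_ttl`) is assumed for illustration.

```python
from datetime import timedelta


def schedule_kwargs(job, explicit=False):
    """Build schedule() kwargs for a periodic job without a finite result_ttl.

    With explicit=False the result_ttl key is dropped entirely;
    with explicit=True it is set to -1 (keep forever).
    """
    kwargs = dict(job)  # copy so the caller's dict is not modified
    interval = kwargs["interval"]
    if isinstance(interval, timedelta):
        interval = int(interval.total_seconds())
    kwargs["interval"] = interval
    kwargs.pop("result_ttl", None)
    if explicit:
        kwargs["result_ttl"] = -1
    return kwargs


if __name__ == "__main__":
    job = {"func": print, "interval": timedelta(minutes=5), "result_ttl": 600}
    print(schedule_kwargs(job))
    print(schedule_kwargs(job, explicit=True))
```

The returned dict would then be passed on as `rq_scheduler.schedule(scheduled_time=..., **kwargs)`. Setting -1 explicitly has the advantage that the intent stays visible in the code.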