FinalsClub / karmaworld

KarmaNotes.org v3.0
GNU Affero General Public License v3.0

skip redundant celery queue tasks when appropriate #438

Open btbonval opened 9 years ago

btbonval commented 9 years ago

There are times when the worker does not run for long periods, so queued tasks sit unprocessed as a matter of course. This is typical in a dev environment, and is a constant feature of our staging system.

Certain tasks, like fix_note_counts, are set to run every 24 hours to update the cache. However, running it 32 times because it has been 32 days since the worker last ran is not beneficial. Whether it has been 32 days or just 1, running fix_note_counts once will bring the data up to date.

Certain other tasks, like tweets about a new note, are distinct from one another, and each one should still be run.

Is there any way to create classes of tasks that queue in different ways? If so, this should be implemented. An update task only needs to be queued once; anything more is wasteful.
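To make the idea of task classes concrete, here is a minimal in-memory sketch. It is not Celery's API, and nothing here says Celery offers this behaviour; the CoalescingQueue class, the singleton flag, and the tweet_note task names are all hypothetical. It only shows the desired outcome: repeated update tasks collapse to one queued entry, while distinct tasks are all kept.

```python
from collections import deque


class CoalescingQueue:
    """Toy queue (hypothetical, not Celery): singleton tasks are queued at most once."""

    def __init__(self):
        self._pending = deque()
        self._queued_singletons = set()

    def put(self, name, singleton=False):
        """Enqueue a task name; return False if a duplicate singleton was skipped."""
        if singleton:
            if name in self._queued_singletons:
                return False
            self._queued_singletons.add(name)
        self._pending.append((name, singleton))
        return True

    def get(self):
        """Pop the oldest task; a singleton may be queued again afterwards."""
        name, singleton = self._pending.popleft()
        if singleton:
            self._queued_singletons.discard(name)
        return name

    def __len__(self):
        return len(self._pending)


queue = CoalescingQueue()

# 32 days of the daily update task while no worker is consuming.
for _ in range(32):
    queue.put("fix_note_counts", singleton=True)

# Distinct tasks (hypothetical names) are never coalesced.
queue.put("tweet_note:101")
queue.put("tweet_note:102")

print(len(queue))  # 3
```

The update task occupies a single slot no matter how long the worker is away, and the two tweet tasks both survive.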

btbonval commented 9 years ago

This sort of feature would need to be supported in celery beat somehow. Maybe there's a singleton schedule method or task quota or something.

These two tasks serve no purpose being queued multiple times: https://github.com/FinalsClub/karmaworld/blob/321c9fd8be8eb3776ee56f628d2918d689ac4e2c/karmaworld/settings/prod.py#L92-L99

btbonval commented 9 years ago

From the Celery docs on periodic task fields: options can be anything supported by apply_async().

apply_async() supports an expiry time. We could set the expiration to 1 day (24 hours), which would only allow between 1 and 2 instances of any particular daily update task to remain in the queue.

expires must be given "as seconds after task publish" or as a timestamp. A timestamp is not feasible because expires is set only once, at server load, so it has to be the relative number of seconds.

Something like this for 24 hour expiration after the task is published:

'update-scoreboard': {
  'task': 'fix_note_counts',
  'schedule': timedelta(days=1),
  # expire each queued run 86400 seconds (24 hours) after it is published
  'options': {'expires': 86400},
},

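As a rough check of the "between 1 and 2 instances" estimate, here is a toy model of the backlog. It does not use Celery. It assumes one message is published every period seconds starting at t=0, and that a worker coming back online skips any message whose age has reached expires (the strict comparison at the exact boundary is an assumption). The replay_backlog helper is hypothetical.

```python
DAY = 86400  # seconds in 24 hours


def replay_backlog(now, period=DAY, expires=DAY):
    """Return (published, executed) publish times for a worker returning at `now`.

    Toy model only: a message published at time t is executed if its age
    (now - t) is still below `expires`; otherwise it is skipped.
    """
    published = list(range(0, now + 1, period))
    executed = [t for t in published if now - t < expires]
    return published, executed


# Worker comes back one hour after the 32-day mark.
published, executed = replay_backlog(now=32 * DAY + 3600)
print(len(published))  # 33
print(len(executed))   # 1

# Without an expiry, every queued run would execute.
_, executed_all = replay_backlog(now=32 * DAY + 3600, expires=float("inf"))
print(len(executed_all))  # 33
```

With expires equal to the schedule period, only the most recent run is still fresh when the worker returns. A second run can only be fresh right at a publish boundary, or if expires is set longer than the period.
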
btbonval commented 9 years ago

This ticket is an example of how much can be done while waiting for the staging system queue to complete.

still waiting...

btbonval commented 9 years ago

... and since this is a second Heroku worker off the side of the main web worker, we're being charged for every minute or hour it runs. So we're wasting money recalculating these update statistics repeatedly and without any merit, since the first run and last run and every run in between will yield the same basic results for update tasks.

btbonval commented 9 years ago

Applied expiration to all 3 periodic tasks, since none of them has any reason to build up a backlog. The code was put in a branch and pushed to beta for testing at the time of this issue comment.
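One way to apply the same expiry to every entry, rather than editing each by hand, is a small helper over the schedule dictionary. This is only a sketch and not the code that was pushed: the with_expiry helper is hypothetical, and the two example-task entries are placeholders, since only fix_note_counts is named in this issue.

```python
from copy import deepcopy
from datetime import timedelta


def with_expiry(schedule, seconds=86400):
    """Return a copy of a beat-style schedule dict with 'expires' set on every entry."""
    updated = deepcopy(schedule)
    for entry in updated.values():
        # Keep any options already present and only add or override 'expires'.
        entry.setdefault('options', {})['expires'] = seconds
    return updated


SCHEDULE = {
    'update-scoreboard': {
        'task': 'fix_note_counts',
        'schedule': timedelta(days=1),
    },
    # Placeholder entries; the real task names are not given in this issue.
    'example-task-b': {
        'task': 'example_task_b',
        'schedule': timedelta(days=1),
    },
    'example-task-c': {
        'task': 'example_task_c',
        'schedule': timedelta(days=1),
        'options': {'queue': 'low'},
    },
}

patched = with_expiry(SCHEDULE)
for name, entry in patched.items():
    print(name, entry['options'])
```

The helper returns a copy, so the original dictionary stays unchanged and the expiry lives in one place.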

Check back in a few days or a week and see how much has accumulated in the queue backlog. It should just be 3 tasks: one of each.