MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

background tasks: explore celery or gearman #181

Closed ghukill closed 6 years ago

ghukill commented 6 years ago

Django-Background-Task has been working great until now, but we may have hit its limits.

1) Hashes for running / completed tasks are not unique; they are hashes of task.name and task.params. This means that an identical function / params call will have the same hash in the DB. The ID changes as well, so nothing links a running task to its completed counterpart.

2) Tasks run sequentially. When background tasks were exclusively for deleting Jobs, this was not necessarily a bad thing, as it kept the pressure off MySQL. But if background tasks will be used for report generation, perhaps it's not ideal.

I have used Celery in the past for other work, and while there is some setup overhead, it's powerful and well documented. I'm new to Gearman, but see that Archivematica uses it, so it might be worth exploring.

ghukill commented 6 years ago

Alternatively, a quick hack would allow the continued use of background tasks for the time being by ensuring all task hashes are unique: pass a unique string to each background task function call (in this case, the unique_hash argument).

In [8]: bgt = test_bg_task(duration=5, unique_hash=uuid.uuid4().urn)

In [9]: bgt.task_hash
Out[9]: '452dee1c7ffc475b0e80a64b6471eb7fe13338a3'

In [10]: bgt = test_bg_task(duration=5, unique_hash=uuid.uuid4().urn)

In [11]: bgt.task_hash
Out[11]: 'd3089069c75454a0154c844c5420f457ddbbb3b7'

While this works, it's not ideal.
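To illustrate why the extra argument changes the hash, here is a minimal sketch that assumes the hash is a SHA-1 over the task name plus its serialized params. The `task_hash` helper below is hypothetical and only approximates the library's behavior; it is not its actual code.

```python
import hashlib
import json
import uuid


def task_hash(task_name, params):
    """Hash a task from its name and serialized params (assumed scheme)."""
    serialized = json.dumps(params, sort_keys=True)
    return hashlib.sha1((task_name + serialized).encode("utf-8")).hexdigest()


# Two identical calls collide on the same hash.
first = task_hash("test_bg_task", {"duration": 5})
second = task_hash("test_bg_task", {"duration": 5})

# A throwaway UUID makes each call's params, and so its hash, distinct.
third = task_hash("test_bg_task", {"duration": 5, "unique_hash": uuid.uuid4().urn})
fourth = task_hash("test_bg_task", {"duration": 5, "unique_hash": uuid.uuid4().urn})

print(first == second)   # identical name and params
print(third == fourth)   # params differ by the random UUID
```

The downside is that the unique value has to be generated and stored by the caller if it is to be looked up later, which is part of why this feels like a hack.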

ghukill commented 6 years ago

After perusing docs a bit more, other options abound.

1) Can ascribe a verbose_name to a task, which can become a handy unique identifier across running/queued/completed tasks

2) Also, can affix another model instance to a Django background task. This is enticing, as it would remove the need to link in the other direction (Combine task --> Django BG task), but there is a cost that way as well:

FieldError: Field 'creator' does not generate an automatic reverse relation and therefore cannot be used for reverse querying. If it is a GenericForeignKey, consider adding a GenericRelation.

The creator field does not have an automatic reverse relation, so we cannot query tasks where creator == CombineBackgroundTask instance. Thinking more, it would be most helpful to render a table of Combine tasks, each tethered by a hash to its Django task. This would fail more gracefully in instances where a Combine task was attempted but no background task was created; it would indicate to the user that the task was tried but, for some reason, never initiated. Either way, the Combine task should be the primary record.
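A rough sketch of that table idea, in plain Python rather than Django models. The class names, the `task_key` field, and the use of `verbose_name` to carry the shared identifier are all assumptions made for illustration, not Combine's actual schema.

```python
from dataclasses import dataclass


@dataclass
class CombineTask:
    """Hypothetical stand-in for a Combine-side task record."""
    name: str
    task_key: str  # identifier shared with the Django background task


@dataclass
class DjangoTask:
    """Hypothetical stand-in for a Django background task row."""
    verbose_name: str  # assumed to hold the shared identifier
    status: str


def build_task_table(combine_tasks, django_tasks):
    """List every Combine task, flagging those with no matching Django task."""
    by_key = {t.verbose_name: t for t in django_tasks}
    rows = []
    for ct in combine_tasks:
        dt = by_key.get(ct.task_key)
        rows.append((ct.name, dt.status if dt else "never initiated"))
    return rows


combine_tasks = [
    CombineTask("delete job", "key-1"),
    CombineTask("generate report", "key-2"),
]
django_tasks = [DjangoTask("key-1", "running")]

table = build_task_table(combine_tasks, django_tasks)
for name, status in table:
    print(name, "->", status)
```

Driving the table from the Combine side means a task that never reached the background queue still shows up, rather than silently disappearing.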

ghukill commented 6 years ago

Made sufficient progress with Django Background Tasks to continue using them. I like that they are natively tied to the Django ORM. And they can run concurrently, so there are options there. Closing this for now.