Koed00 / django-q

A multiprocessing distributed task queue for Django
https://django-q.readthedocs.org
MIT License
1.83k stars 289 forks

Why not Celery? #8

Open owais opened 9 years ago

owais commented 9 years ago

It would be nice if the docs contained a brief comparison with similar libs, or the motives behind django-q. Right now, Celery is the go-to system for most Django developers out there. It would be nice to know what django-q intends to do differently.

Cheers.

Koed00 commented 9 years ago

I agree. At the moment however the features are changing almost daily, so I'll wait a little before I commit to the docs. It's barely been a month since I started this. We can use this issue to gather bullet points and discuss/improve/describe them until then.

(I could have probably made this a standalone python project, but I wanted it to be fully integrated with Django. This is key)

Although the worker itself is probably marginally faster, Django Q creates and manages a cluster of them. Since the cluster already contains a copy of your Django environment, there is no need to create a new env for each task execution.

outime commented 9 years ago

I'm a long-time Celery user and now seriously considering giving Django Q a try, just waiting for the right moment but some comparison documentation would be great indeed! :+1:

Koed00 commented 9 years ago

I'll see what I can come up with this weekend. Meanwhile, are there any comparison questions that spring to mind?

outime commented 9 years ago

I really think your project has a lot of potential and that part of the documentation is essential, so it's great to see that you're willing to write about it!

The points you showed above are the most important IMHO, I'd just give more insight about them. It may be interesting to also get some real benchmarks. Broker support comparison could be useful as well (AFAIK Django Q only supports Redis). A security section would be also great (you already mention that it uses encryption so it could be worth mentioning it in this section).

Those are some quick ideas, but one can probably gather more just by looking at the Celery docs.

Koed00 commented 9 years ago

The problem I have is that Celery has amassed such an immense number of features over the years that I'm not sure which ones I should be comparing with. This project focuses on integration with Django, so it would be a subset anyway. What I really need from you, the potential users, is a list of the things you love and hate about using Celery with Django. We can then make a better case for its use and add some features. Just keep in mind that I never wrote this as a Celery competitor. I wrote this to make async tasks easier in Django projects.

Django Q's pypi source is 21 Kb, Celery is 1 Mb. I'm sure I'm missing some features.

Redis is used solely for two things:

Nothing more.

Memcached is not very good at the first one. RabbitMQ is pretty slow at this and would add a lot of overhead, plus it's not something most people have in their stack anyway.

This leads me to the security part. Tasks (and statistics) are first serialized with Pickle into a bytestring and then signed with Django's signing module. This basically creates a checksum signature of the pickled task, which is then hashed together with the task using your project's secret key. When a worker pulls a task from the queue, it first decrypts the package and compares the checksum with the serialized content. So not only is it quite hard for a potential hacker to read the data in a package, but any tampering with the string would also be detected.
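As a rough illustration of the sign-then-verify idea described above, here is a minimal sketch using only the standard library. It is not Django Q's actual code: `SECRET_KEY`, `pack` and `unpack` are hypothetical names, and HMAC-SHA256 stands in for Django's signing module. In this sketch the payload is signed, not encrypted.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"not-a-real-secret"  # hypothetical; stands in for the project's secret key


def pack(task: dict) -> bytes:
    """Pickle a task and prepend an HMAC-SHA256 signature of the pickled bytes."""
    payload = pickle.dumps(task)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return signature + payload


def unpack(package: bytes) -> dict:
    """Verify the signature before unpickling; raise ValueError on tampering."""
    signature, payload = package[:32], package[32:]  # a SHA-256 digest is 32 bytes
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: package was modified")
    return pickle.loads(payload)
```

Checking the signature before unpickling matters here, because unpickling bytes from an untrusted source can execute arbitrary code.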

The disadvantage (or advantage) of this is that the task data on the Redis server doesn't make any sense to any other software. It is a closed loop. So my question would be: what advantages would multiple broker support give you, other than the convenience of existing infrastructure?

Koed00 commented 9 years ago

I started a thread on the Django subreddit which will hopefully lead to a more comprehensive comparison.

Koed00 commented 9 years ago

Is there someone who wants to do a benchmark comparison? I feel I'm not knowledgeable enough about Celery to do it justice in a benchmark. Also I might not be perceived as impartial.

I've been doing my own performance tests with the Parzen Async example code, but I have no idea how to replicate this in Celery.

Another test I often run is:

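# Assumes the default Django user model.
from django.contrib.auth.models import User
# Called `async` in early releases; `async` is now a reserved word in Python.
from django_q.tasks import async_task

def countdown(n):
    # Burn CPU by counting down to zero.
    while n > 0:
        n -= 1

def get_username(user):
    return user.username

def qtest():
    # Queue 500 CPU-bound tasks and 500 ORM-backed tasks without saving results.
    u = User.objects.first()
    for i in range(500):
        async_task(countdown, 10000 * i, save=False)
        async_task(get_username, u, save=False)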

This one is simple enough and puts a nice bit of strain on the workers, broker and Django backend.

Koed00 commented 9 years ago

So now we have 5 dedicated broker types, plus support for several database brokers via Django ORM. Not via AMQP simulation, but direct dedicated support.

Another difference I spotted is Django Q's ability to execute any Python or third-party library function directly, without decorators or pre-loading. This makes it very easy to execute shell commands, for example.
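One way a queue can run arbitrary callables without decorators is to pass the function as a dotted path and import it on the worker side. The sketch below shows that general idea with the standard library only; `resolve` and `run` are hypothetical helpers, not Django Q's API.

```python
import importlib


def resolve(dotted_path: str):
    """Import 'package.module.func' and return the callable."""
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def run(dotted_path: str, *args, **kwargs):
    """Look up the callable by name and call it with the given arguments."""
    return resolve(dotted_path)(*args, **kwargs)


result = run("math.copysign", 2, -2)  # same as calling math.copysign(2, -2)
```

Because the task only carries a name and its arguments, the target function needs no registration step; it just has to be importable where the worker runs.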

outime commented 9 years ago

IMHO this issue can be closed now that we have a good comparison that can be included somewhere more visible, which will definitely help a lot to decide what to use for newcomers (myself included). Great work!

DataGreed commented 8 years ago

@Koed00 hi! Thanks for the great lib!

Is django-q production-ready?

Eagllus commented 8 years ago

@DataGreed currently I'm using django-q in production for 2 projects.

I haven't found any problems so far, except that I have had to add the job manually in the backend. This happens because the add option from django-q currently only adds and doesn't check whether the task is already in the database.

Koed00 commented 8 years ago

@DataGreed

I've personally been using it in several commercial projects over the last 6 months or so. One of them has users in the tens of thousands and uses it to send emails, run live Haystack indexing, invalidate caches and handle cascading model signals. So far I've encountered very few problems. I recently added Rollbar support, which reports any problems with tasks from any of our servers directly to my Rollbar account; this has helped a lot to quickly track down and fix the problems we've had. Is it production ready? I don't know. It's stable enough, but I'd love to add and expand the features before I take it out of beta status.

Koed00 commented 8 years ago

Another big difference with Celery that's become more obvious lately is AMQP's need for worker acks to be in process. By that I mean that pulling a job and acking it has to happen on the same connection for AMQP; otherwise the job is considered not acked and becomes available to the next worker. This stems from AMQP's legacy as a banking protocol. Django Q's design takes a very different approach: the pulling, executing and acking are done asynchronously by individual processes, separated by multiprocessing memory queues. This gives you much more flexibility when dealing with long-running processes or processes that rely on outside services to complete, without tying up your broker.
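To make the staged design a little more concrete, here is a small sketch of pulling, executing and acking as three independent stages joined by in-memory queues. It is only an illustration: it uses threads and `queue.Queue` for brevity where Django Q uses separate processes and multiprocessing queues, the broker is just a list, and the names `pusher`, `worker`, `monitor` and `run_pipeline` are assumptions made for this example.

```python
import queue
import threading

STOP = object()  # sentinel that tells the next stage to shut down


def pusher(broker, task_queue):
    """Pull tasks from the broker and hand them to the worker stage."""
    for task in broker:
        task_queue.put(task)
    task_queue.put(STOP)


def worker(task_queue, result_queue):
    """Execute tasks; this stage never talks to the broker."""
    while (task := task_queue.get()) is not STOP:
        func, arg = task
        result_queue.put((task, func(arg)))
    result_queue.put(STOP)


def monitor(result_queue, acked):
    """Acknowledge finished tasks, independently of pulling and executing."""
    while (item := result_queue.get()) is not STOP:
        acked.append(item)


def run_pipeline(broker):
    """Run the three stages concurrently and return the acknowledged results."""
    task_queue, result_queue, acked = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=pusher, args=(broker, task_queue)),
        threading.Thread(target=worker, args=(task_queue, result_queue)),
        threading.Thread(target=monitor, args=(result_queue, acked)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return acked
```

Since the acking stage only sees the result queue, a slow task holds up the worker stage but not the stage that talks to the broker.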

Eagllus commented 8 years ago

Something I like very much about Django Q is that I can use the Django database as a broker. This is 'available' in Celery but is outdated and has many known bugs.

A reason why I like this is security. I have an application where I use certificates to encrypt data. Because this process is slow, I use a background task, but Celery either requires me to run an extra process (Redis) on the server to handle the queue or I have to work with a buggy broker.

grayb commented 8 years ago

We are using django-q in production at the Indianapolis Museum of Art to process daily changes to our online collection imagery and metadata from several outside systems. We have been very satisfied with it.

We could have used Celery, but as a small dev team we see much appeal in small stacks, easy setup, and few dependencies. We process about 500k tasks per day, and it's nothing that python and a common database (postgres) can't handle.

Koed00 commented 8 years ago

So cool to hear people actually using it besides myself :)

frnhr commented 8 years ago

Main selling point for me was django-admin integration. Plus the ORM broker, but this is secondary (for my current project, anyway). Celery has deprecated its admin integration in favor of standalone "flower" interface, which is all nice and good, but isn't what I need at the moment.

spapas commented 8 years ago

Hello, one more interesting question (at least for me it's clear why celery is bloated for small projects) is Why not django-rq?

From what I see in the docs, django-q by default uses the redis broker, so, if I don't want to use a different broker why should I choose django-q instead of the combination of rq and django-rq?

Thanks !

Eagllus commented 6 years ago

@spapas after almost 2 years (yeah, very slow response): from what I can see, django-rq is designed to use Redis as a queue and then processes everything in the order it was queued. If you only need that feature, I think django-rq can serve your needs.

spapas commented 6 years ago

@Eagllus better late than never :)

I am using these features, yes, but I'd really like to know what extra features django-q offers. They may be useful for some projects!

dufferzafar commented 4 years ago

I'm a first time user of task-queues and am trying to consider whether I should be using Django Q or Celery or something else. Any links to discussion will be appreciated.

shawnngtq commented 4 years ago

👍 bump.

As of today, the available discussions on the web seem to be 3-4 years old. It would be great to have a comparison between:

raratiru commented 4 years ago

There is also a lightweight newcomer: django-simple-task.

mohmyo commented 4 years ago

As @shawnngtq pointed out, an updated discussion would be great for newcomers like me. I know what I want to do but can't decide what is best for me to go for it without a decent comparison, it will be helpful for sure.

kishorpawar commented 4 years ago

Ditching Celery for the following reasons:

  1. Built-in worker management
  2. ORM as broker
  3. Admin interface

Using django-q for sending scheduled emails.

srggrs commented 3 years ago

tbh both Django Q and Celery are great. I decided to go with Django Q because of its simplicity and easy integration with Django. Celery requires spinning up RabbitMQ for message handling, and I do not want to add extra complex things to manage as part of my stack...

Stephane-Ag commented 1 year ago

I am really interested in this package since I don't want to have to use Redis with "django-rq" but this repo hasn't received any updates in over 2 years and is still missing a basic list comparing the benefits of it vs. the alternatives.

Just found there's a maintained fork of this repo!