FirebaseExtended / firebase-queue


Scaling horizontally with more servers => Slow down to pull jobs off the queue #47

Closed dylanjha closed 8 years ago

dylanjha commented 8 years ago

When increasing the number of servers processing jobs, it feels like I'm seeing a dramatic slowdown in the workers pulling jobs off the queue when there are a lot of jobs (~700) queued at the same time.

Result: a dramatic slowdown between the time jobs get added to the queue and the time jobs get pulled off.

My best guess is that an individual server can't pull down a job until the other servers (which are all trying to read and write the same location at the same time) have resolved which worker claims it.

cbraynor commented 8 years ago

Unfortunately you're reaching the limitation of transactions on a single location in a Firebase Database. The more workers there are, the more of them that are attempting to perform a transaction on the database. The Firebase client SDK uses optimistic concurrency to write, so if the location is highly contested things slow down and may even start failing.

One suggestion I have for you is to shard your queue across multiple locations in your database, and then have the clients choose one at random (or with some other mapping). You could even keep the list of available queues as a dynamic property in your database; as you add more, you just update that list so your clients know which ones they can use.
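
To make that concrete, here is a minimal sketch of the idea (the database URL, shard count, and paths are placeholders, not part of firebase-queue):

```js
var Firebase = require('firebase');
var Queue = require('firebase-queue');

var NUM_SHARDS = 4; // placeholder shard count
var rootRef = new Firebase('https://<your-app>.firebaseio.com/queue_shards');

// Producer side: push each task to a randomly chosen shard so no single
// location sees all of the transaction traffic.
function addTask(task) {
  var shard = Math.floor(Math.random() * NUM_SHARDS);
  return rootRef.child('shard-' + shard).child('tasks').push(task);
}

// Worker side: each server binds its queue to one shard location.
var shardId = Math.floor(Math.random() * NUM_SHARDS);
var queue = new Queue(rootRef.child('shard-' + shardId), function(data, progress, resolve, reject) {
  // process the task...
  resolve();
});
```

The dynamic-list variant would just replace NUM_SHARDS with a value read from a location in your database, so producers and workers pick up new shards without a redeploy.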

dylanjha commented 8 years ago

Thanks for your reply @drtriumph, this makes sense. I wonder if a sharded queue pattern could be built into this library. Any interest?

dylanjha commented 8 years ago

I just ran some tests, and it looks like I was able to optimize this and get it working at a pretty good scale by increasing numWorkers as high as possible without blowing out the memory on the server, and decreasing the number of servers.

Can you please confirm that this theory makes sense?
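
For reference, the shape of that configuration is roughly the following sketch (the URL and worker count are placeholders; numWorkers is the standard firebase-queue option):

```js
var Firebase = require('firebase');
var Queue = require('firebase-queue');

var ref = new Firebase('https://<your-app>.firebaseio.com/queue');

// One server, one client SDK connection, many concurrent workers.
// 50 is only an example value -- tune it against the server's memory.
var options = { numWorkers: 50 };

var queue = new Queue(ref, options, function(data, progress, resolve, reject) {
  // process the task, then resolve or reject
  resolve();
});
```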

cbraynor commented 8 years ago

Ah, that makes sense - the reason numWorkers is working in this case is that all those workers are sharing one instance of the client SDK. That means both that the requests are being pipelined so there's less contention, and that they're sharing the same cache so the optimistic writes have a better chance of succeeding.

As for building it into the library - it's not something we've seen a huge demand for so far, and I'm not very keen on complicating the logic on the task-adding side of the queue in the general case. If it's something that comes up more often, I'd be happy to reconsider that stance.

dylanjha commented 8 years ago

Thanks for confirming. :+1:

meticoeus commented 8 years ago

Rather than building the sharding logic into the basic queue, what about adding a separate Queue class at some point?

This would probably warrant a companion library (or libraries) that could be used on the client side to automatically negotiate which queue to add to. If the interfaces can be kept identical, users could switch implementations as their needs grow with minimal hassle.
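
For illustration, the surface of such a wrapper might look something like this (the ShardedQueue name, shard layout, and helper are hypothetical, not part of firebase-queue):

```js
var Queue = require('firebase-queue');

// Hypothetical wrapper keeping the same constructor shape as Queue, but
// fanning workers out across several child locations.
function ShardedQueue(rootRef, numShards, processingFunction) {
  this.queues = [];
  for (var i = 0; i < numShards; i++) {
    // Each shard is its own queue location, so transactions don't contend.
    this.queues.push(new Queue(rootRef.child('shard-' + i), processingFunction));
  }
}

// Hypothetical companion helper for the producer side: pick a shard at
// random when adding a task.
ShardedQueue.addTask = function(rootRef, numShards, task) {
  var shard = Math.floor(Math.random() * numShards);
  return rootRef.child('shard-' + shard).child('tasks').push(task);
};
```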

correasebastian commented 8 years ago

Hi @dylanjha, @drtriumph, I can't get the Firebase example to work when resolving asynchronously (resolving synchronously works):

```js
var Queue = require('firebase-queue'),
    Firebase = require('firebase');

var ref = new Firebase('https://.firebaseio.com/queue');
var queue = new Queue(ref, function(data, progress, resolve, reject) {
  // Read and process task data
  console.log(data);

  // Do some work
  progress(50);

  // Finish the task asynchronously
  setTimeout(function() {
    resolve();
  }, 1000);
});
```

Could you help me, maybe with your environment configuration? What Node version, Firebase version and firebase-queue version are you using?

Are you resolving with async functions? Any advice would help me a lot.

thanks

meticoeus commented 8 years ago

I am primarily using ES6 Promises to wrap handler code.

I've tested with Node v5.8-5.10 and have had no issues resolving in promise callbacks. Some tasks are long-running (>20 minutes) with no issues.

package.json

"firebase-queue": "^1.3.x"

A queue handler snippet. service.perform is a method that returns a promise:

queueHandler(task, progress, resolve, reject) {
  if (task) {
    service.perform(task).then(resolve, reject);
  } else {
    reject('Invalid Task');
  }
}
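
For context, the pieces around that handler would look roughly like this sketch (the service object and URL are placeholders):

```js
var Firebase = require('firebase');
var Queue = require('firebase-queue');

// Placeholder for a service whose perform() returns a promise, as described above.
var service = {
  perform: function(task) {
    return Promise.resolve(task);
  }
};

function queueHandler(task, progress, resolve, reject) {
  if (task) {
    service.perform(task).then(resolve, reject);
  } else {
    reject('Invalid Task');
  }
}

var ref = new Firebase('https://<your-app>.firebaseio.com/queue');
var queue = new Queue(ref, queueHandler);
```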

Firebase is being allowed to resolve to the latest version automatically; currently that's 2.4.2.

Can you give more information about your configuration, like which Node version you are using?

dylanjha commented 8 years ago

@correasebastian I am resolving tasks asynchronously in almost every case.

"firebase": "2.4.1",
"firebase-queue": "1.2.1",

I'm not using the progress function, but by the looks of your example it should work just fine. I would double-check and dig deeper.

One thing to check: be aware that if there is a synchronous error, it will fail silently and get set as the "error" property on your task in the queue. So you might have an error in your code before you call resolve, and the task may be failing into an error state.
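
One way to make those synchronous errors visible is to wrap the handler, something like the sketch below (safeHandler is just an illustrative helper, not part of firebase-queue):

```js
var Firebase = require('firebase');
var Queue = require('firebase-queue');

var ref = new Firebase('https://<your-app>.firebaseio.com/queue');

// Illustrative wrapper: log synchronous throws and reject the task explicitly
// instead of letting it silently land in an error state.
function safeHandler(handler) {
  return function(data, progress, resolve, reject) {
    try {
      handler(data, progress, resolve, reject);
    } catch (err) {
      console.error('Synchronous error in task handler:', err);
      reject(err.message || String(err));
    }
  };
}

var queue = new Queue(ref, safeHandler(function(data, progress, resolve, reject) {
  // work that might throw before resolve() is ever called
  resolve();
}));
```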

correasebastian commented 8 years ago

Thanks @dylanjha and @meticoeus for taking the time to answer my question, I appreciate it.

I created a GitHub repo; maybe if you have time you can take a look and see what's going on: https://github.com/correasebastian/testingqueue/

I have noticed there are many people getting this same error: the queue keeps looping and looping and the task never gets resolved.

thanks