FirebaseExtended / firebase-queue

MIT License

Inability of worker to lock/claim task --- infinite loop #48

Open ekaram opened 8 years ago

ekaram commented 8 years ago

We've been using firebase-queue for a while now. We saw some odd behavior in production last night and haven't been able to reproduce locally in development.

Our queue workers appeared to be in an infinite loop, processing the same queue task over and over again. I watched on the Firebase dashboard as the same task turned yellow (claimed), and then green again (as if recreated from scratch), repeatedly.

I was able to resolve the issue by clearing the tasks from the queue and doing multiple server restarts of the code running node with the queue workers.

I watched the problem occur on two separate queues. The worker code for those two queues is different and separate, and both have been stable.

If you have any ideas for what to look into or how to reproduce this, please let me know.

peranderson commented 8 years ago

I just started using Firebase queue yesterday with the latest code. The example posted where they use a setTimeout then call resolve() does not work. If you call resolve outside of the timeout, it removes the task from the queue. If you call resolve() inside of the timeout or any other async callback, it does not remove the task from the queue as it should and your worker will pick up that task again over and over. Essentially looping over the same task. This issue needs some attention or a workaround ASAP.

peranderson commented 8 years ago

This guy is running into the same issue. Nobody has responded to his question. I've searched all over for a solution. If he moves the call to resolve outside of the setTimeout the example will work. http://stackoverflow.com/questions/35750115/firebase-queue-triggering-several-times-i-dont-know-why

cbraynor commented 8 years ago

Do you have a reproducible case? If so, that would help debugging immensely - I wasn't able to reproduce the error with the linked SO example on node 4.3.1

correasebastian commented 8 years ago

Same problem here. I tried changing my Node.js version and nothing works. Any async operation inside the queue's main function fails to resolve, not even your example:

```
var Queue = require('firebase-queue'),
    Firebase = require('firebase');

var ref = new Firebase('https://testingqueue.firebaseio.com/queue');
var queue = new Queue(ref, function(data, progress, resolve, reject) {
  // Read and process task data
  console.log(data);

  // Do some work
  progress(50);

  // Finishing the task asynchronously is not working
  setTimeout(function() {
    resolve('ok');
  }, 5000);

  // Using sync operations works fine, but not async
  // resolve();
});
```

This problem has been around for a while and we haven't received any answer from Firebase. I'm quite disappointed, because Firebase is such an amazing tool.

My environment:

Windows Server 2012 R2, Node.js versions 4.2.1 and 5.10.1

package.json

    {
      "name": "testingqueue",
      "version": "1.0.0",
      "description": "",
      "main": "index.js",
      "scripts": {
        "start": "nodemon index.js",
        "test": "echo \"Error: no test specified\" && exit 1"
      },
      "keywords": [],
      "author": "",
      "license": "ISC",
      "dependencies": {
        "firebase": "2.4.2",
        "firebase-queue": "1.3.0"
      }
    }

tsemerad commented 8 years ago

@correasebastian Do you have ".indexOn": "_state" specified in your security rules? If so, try removing it. This sounds like a sort of regression of #43. My queues were behaving erratically when the state was indexed, and removing the index fixed it. Granted, my queues weren't repeatedly processing the same task, but were just hung up. Even though #43 was apparently fixed, I've left the indexes off my queues still for now.

CookieCookson commented 8 years ago

I'm having the same issue here, I am sometimes getting the same queue task processed multiple times over and then on occasions it finishes and passes along to the next queue.

ekaram commented 8 years ago

Unfortunately, I do not have timeouts set in my code, so that issue is not the root cause of what I have seen. I have only seen this issue occur one additional time in production, but when it does it has extremely severe consequences for us.

I do index on _state, and have had no hanging issues, so I have not had a reason to remove that.

jclalala commented 8 years ago

The issue happens to me as well. I have two machines running the same node version (v4.4.7) where one machine does NOT have this problem while the other can reproduce with ease.

The problem is that a single worker (I've set up my queue to run with just 1 worker) repeatedly triggers the same single task, even though the job does nothing but the following:

    var jobInstance = 1;
    ...
    ...
    function (data, progress, resolve, reject) {
      console.log("started;" + jobInstance);
      resolve();
      console.log("ended;" + jobInstance);
      jobInstance++;
    };

Output for one single job:

    started;1
    ended;1
    started;2
    ended;2

I can provide full server / code access to anyone who's interested to tackle this.

jclalala commented 8 years ago

Ok checked firebase-queue src. So the problem appears to be Firebase (I'm on v2.4.2)... For reference, I'm using firebase-queue v1.3.1.

The problem happens in QueueWorker.prototype._tryToProcess() (line 398), where the function attempts to run a Firebase transaction on the task. In the updateFunction, the code tries to update the task's _state to inProgress. The onComplete callback of the transaction is then invoked (line 463), BUT at this point the snapshot is NOT updated.

The Firebase documentation (https://www.firebase.com/docs/web/api/firebase/transaction.html) indicates that when onComplete reports committed, its snapshot parameter reflects what was updated in the updateFunction. In this case, however, committed is occasionally true yet the snapshot is NOT updated.
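For anyone who wants to guard against this in their own fork, here is a minimal sketch of the kind of check described above. It is not firebase-queue code: `claimTookEffect` is a hypothetical helper, the `'in_progress'` state value and the `'worker-1:0'` owner id are made-up sample values, and only the `_state` and `_owner` field names come from this thread. It takes the arguments a transaction's onComplete callback receives (with the snapshot already converted to a plain value) and trusts the claim only if the snapshot actually shows the fields that were written.

```javascript
// Hypothetical helper, not part of firebase-queue: decides whether a claim
// transaction really took effect, given the onComplete arguments and the
// plain value read from the snapshot.
function claimTookEffect(error, committed, taskValue, expected) {
  if (error || !committed || taskValue === null || typeof taskValue !== 'object') {
    return false;
  }
  // Trust the claim only if the snapshot shows both fields we tried to write.
  return taskValue._state === expected.state && taskValue._owner === expected.owner;
}

// Assumed sample values for illustration only.
var expected = { state: 'in_progress', owner: 'worker-1:0' };

// Snapshot reflects the update: safe to start processing.
console.log(claimTookEffect(null, true, { _state: 'in_progress', _owner: 'worker-1:0' }, expected));

// committed is true but the snapshot is stale: do not start processing.
console.log(claimTookEffect(null, true, { _state: null }, expected));
```

A worker could skip or retry the claim when this returns false instead of processing a task it may not actually own. Whether that is enough to avoid the loop is untested.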

I'll check to see if Firebase v3 with firebase-queue 1.4.1 will have this problem solved.

CookieCookson commented 8 years ago

@jclalala Thanks for taking the time to look into this issue, it's been bothering me for ages! I have migrated to Firebase v3 and seem to be still having the problem, can't remember if I upgraded firebase queue from 1.3.1 to 1.4.1 though.

gvkhna commented 8 years ago

+1

jclalala commented 8 years ago

I forked the project (off firebase-queue v1.4.1 and Firebase v3.0.1) and implemented a small workaround. You may want to try it out here:

https://github.com/jclalala/firebase-queue

The workaround is based off a fact that I observed where only _state_changed seems to update (and not _owner, etc...). So when such condition happens we'd delete the item from the queue anyways whenever the processor's resolve() is called.

When the above happens, my forked mod will still remove the task item but if you enabled winston logs you will see an extra 'reset' debug log.

It seems that the code base relies heavily on cross-"transaction" dependencies. Meaning, if transaction A and transaction B happen chronologically, transaction B should reflect the updates made in transaction A... This assumption is unreliable in my tests. I suggest the authors review this in more depth. For now the workaround works for me. I'm not sure yet whether there are other side effects, but I'll keep this thread posted if I see any.

gvkhna commented 8 years ago

Firebase v3.2.0 released just yesterday seems to have fixed the issue, it hasn't occurred since but more testing is probably needed to confirm.

jclalala commented 8 years ago

Problem still happens to me on latest version of firebase (3.2.0).

I just tried on the following versions: "firebase": "^3.2.0", "firebase-queue": "^1.4.0",

I'll still revert to my workaround :(

maxtechera commented 7 years ago

I'm having the same issue with versions:

It's working fine if I run it locally, but on the server the worker keeps running over and over.

Anyone had some luck with a fix?

--Edit

I checked firebase console and this is happening:

[screenshot of the Firebase console; image not preserved]

donbarthel commented 7 years ago

+1 This started happening to me today on my server. If I run the code (with node, firebase, firebase-queue) off my laptop instead, pointing to the same database, I'm not able to reproduce the problem.

I'm using: "firebase": "^3.0.1", "firebase-queue": "^1.4.0",

donbarthel commented 7 years ago

More on this: jclalala fixed the issue (reported above) for himself by changing this line in queue_worker.js from:

var expires = Math.max(0, startTime - now + self.taskTimeout);

to:

var expires = Math.max(10000, startTime - now + self.taskTimeout);

I have previously reported an issue (#45) which I solved by changing that same line to:

var expires = self.taskTimeout;

My change was implemented in a prior project but not this one and until today issue #45 didn't appear in this project. I just now reimplemented my change into my new project and, lo, this issue on my server has now disappeared! Not sure if mine (and jclalala's) changes actually fix the issue or if they just shuffle the code around sufficiently to avoid the issue.
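To make the difference between the three versions of that line concrete, here is a minimal sketch that pulls each one out into a standalone function and runs them on the same inputs. The function names and the sample numbers are made up for illustration. The sketch also assumes that `startTime` comes from the timestamp stored on the task and `now` from the worker's local clock, which has not been verified against the source here.

```javascript
// The three variants of the expiry computation quoted above, as standalone
// functions (names are hypothetical).
function expiresOriginal(startTime, now, taskTimeout) {
  return Math.max(0, startTime - now + taskTimeout);
}

function expiresWithFloor(startTime, now, taskTimeout) {
  return Math.max(10000, startTime - now + taskTimeout);
}

function expiresFixed(startTime, now, taskTimeout) {
  return taskTimeout;
}

// Assumed sample values: a 60 s timeout, and a local clock that reads
// 120 s later than the timestamp stored on the task.
var taskTimeout = 60000;
var startTime = 1000000;
var now = startTime + 120000;

console.log(expiresOriginal(startTime, now, taskTimeout));  // 0
console.log(expiresWithFloor(startTime, now, taskTimeout)); // 10000
console.log(expiresFixed(startTime, now, taskTimeout));     // 60000
```

With these numbers the original line yields an expiry of 0 ms, so the task would be eligible for a reset immediately, while both modified lines leave the worker some time to finish.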

Hope this info helps!

Tyris commented 7 years ago

Same issue as @crazymunky and @donbarthel.

Works fine on my dev machine (mac) but breaks on my Windows Server 2012 machine. That 10000 workaround seems to solve it.

We're using this for our mailer... so when it breaks we suddenly start spamming emails which isn't great.

Edit: It would appear the issues I was having were caused by our server time being about 2 minutes ahead of actual time (e.g. Firebase server time). If the server time doesn't match Firebase's (the closer the better), then this can reasonably happen.
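A rough way to check whether clock skew could bite a given queue is sketched below. `timeoutLeftAfterClaim` is a hypothetical helper, not part of firebase-queue, and it assumes the claim is stamped with server time but compared against the local clock. The offset (server time minus local time, in milliseconds) is the kind of value the Realtime Database exposes at `.info/serverTimeOffset`; here it is passed in as a plain number so the sketch runs on its own.

```javascript
// Hypothetical helper: estimates how much of a task's timeout is left right
// after a claim, when the claim is stamped with server time but compared
// against the local clock. offsetMs = server time minus local time.
function timeoutLeftAfterClaim(taskTimeoutMs, offsetMs) {
  return Math.max(0, taskTimeoutMs + offsetMs);
}

// A local clock two minutes ahead of the server means an offset of -120000 ms.
console.log(timeoutLeftAfterClaim(300000, -120000)); // 180000
console.log(timeoutLeftAfterClaim(60000, -120000));  // 0
```

Under these assumptions, any timeout shorter than the skew leaves no time at all, which would match the repeated resets described in this thread.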

cbraynor commented 7 years ago

Release 1.6.1 contains a bugfix that could be related to this issue. Can you let me know if you're still having issues after updating?

calebhailey commented 7 years ago

Not sure if this will help anyone else here, but I was seeing this exact behavior and determined the cause of the issue to be that my test code was taking longer (via a 10-second setTimeout()) than the timeout I defined for my spec, so the task was just falling into a retry loop.

peepo3663 commented 7 years ago

I still have the issue: tasks are not resolved and remain in my tasks list. Versions: "firebase": "^3.6.3", "firebase-queue": "^1.6.1"

snowadamc commented 7 years ago

Hey guys, I came across this conversation as I've recently been having this issue. I have been working with firebase-queue for a while and was working on some big changes to our backend queue on a test database. When trying to roll this out to our production database today, I saw behavior much like this. After some experimenting, I found that it had to do with my database rules.

In one of my queue functions, I was attempting to read a portion of the database for which I had forgotten to change the permissions so that my backend worker could access it. Instead of giving an error (or counting against the number of retries until the queue just moved on), it kept resetting the owner, as others seem to have identified. Though I was eventually able to fix it, it seems like a failed database read shouldn't reset the task, but should cause it to fail so the queue moves on.
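Until the library handles this itself, one possible guard is to make sure every failure inside the processing function ends in an explicit reject() call. The sketch below is an assumption about how that could look, not firebase-queue behavior: `rejectOnFailure` and `readProtectedData` are hypothetical names, and the only thing taken from the library is the `(data, progress, resolve, reject)` handler signature shown earlier in this thread.

```javascript
// Hypothetical wrapper, not part of firebase-queue: turns any thrown error or
// rejected promise inside the handler into an explicit reject() call.
function rejectOnFailure(work) {
  return function (data, progress, resolve, reject) {
    var settled = false;
    // Make sure resolve/reject is called at most once per task.
    function once(fn) {
      return function (value) {
        if (!settled) {
          settled = true;
          fn(value);
        }
      };
    }
    var ok = once(resolve);
    var fail = once(reject);
    try {
      Promise.resolve(work(data, progress)).then(ok, fail);
    } catch (err) {
      fail(err);
    }
  };
}

// Stand-in for a database read that the security rules deny.
function readProtectedData() {
  return Promise.reject(new Error('permission_denied'));
}

var handler = rejectOnFailure(function (data) {
  return readProtectedData();
});

// Call the handler directly with fake callbacks to see which one fires.
handler(
  { id: 1 },
  function () {},
  function () { console.log('resolved'); },
  function (err) { console.log('rejected: ' + err.message); }
);
```

The wrapped function could then be passed as the processing function when constructing the queue, so a denied read would count as a task failure rather than leaving the task to be reset.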

WikipediaBrown commented 7 years ago

Are you using Firebase-queue in an AppEngine instance or a managed VM/container situation? -Perris
