OptimalBits / bull

Premium Queue package for handling distributed jobs and messages in NodeJS.
Other
15.52k stars 1.43k forks source link

Missing lock for job #789

Closed p3x-robot closed 6 years ago

p3x-robot commented 6 years ago

Ciao!

It works awesome! How are you? But the problem is, that we have 4 deployments and every deployments has 8 processes which is 32 processes and if it always creates a new connect to Redis, it runs out of connections. I tried doing this like this (re-use Redis connections):

var opts = {
  prefix: `${global.ngivr.config.redis.scope}queue`,
//  redis: global.ngivr.config.redisUrl,
//  /*
  createClient: function (type) {
    switch (type) {
      case 'client':
        return global.ngivr.redis;
      case 'subscriber':
        return global.ngivr.redisSubscriber;
      default:
        return new Redis(global.ngivr.config.redis);
    }
  }
//  */
}

 kues[named] = Queue(named, opts);

But I always get this error:

failedReason: 'Missing lock for job 1 finished' } Error: job stalled more than allowable limit

I am using ioredis.

Do you know what it could be the problem?

Thanks, Patrik

p3x-robot commented 6 years ago

If i remove the createClient, it works , it's awesome, but we need to keep the connections low. :-1:

p3x-robot commented 6 years ago

I think it is the problem is with the prefixes:

 1) "ngivr-dev:ngivr-dev:queue:test:active"
 2) "ngivr-dev:queue:test:3:lock"
 3) "ngivr-dev:queue:test:4"
 4) "ngivr-dev:ngivr-dev:queue:test:4"
 5) "ngivr-dev:queue-test"
 6) "ngivr-dev:queue:test:3"
 7) "ngivr-dev:ngivr-dev:queue:test:stalled"
 8) "ngivr-dev:queue:test:2"
 9) "ngivr-dev:ngivr-dev:queue:test:stalled-check"
10) "ngivr-dev:ngivr-dev:queue:test:wait"
11) "ngivr-dev:ngivr-dev:queue:test:id"
12) "ngivr-dev:ngivr-dev:queue:test:2"
13) "ngivr-dev:ngivr-dev:queue:test:3"
14) "ngivr-dev:ngivr-dev:queue:test:1"
15) "ngivr-dev:ngivr-dev:queue:ticket:stalled-check"
16) "ngivr-dev:queue:test:1"
17) "ngivr-dev:queue:test:4:lock"
p3x-robot commented 6 years ago

It looks like it double prefixes.

p3x-robot commented 6 years ago

When I remove the createClient, it looks good:

 1) "ngivr-dev:queue:test:7"
 2) "ngivr-dev:queue:test:4"
 3) "ngivr-dev:queue-test"
 4) "ngivr-dev:queue:test:8"
 5) "ngivr-dev:queue:test:3"
 6) "ngivr-dev:queue:test:2"
 7) "ngivr-dev:queue:test:id"
 8) "ngivr-dev:queue:test:6"
 9) "ngivr-dev:queue:test:5"
10) "ngivr-dev:queue:test:completed"
11) "ngivr-dev:queue:test:stalled-check"
12) "ngivr-dev:queue:ticket:stalled-check"
13) "ngivr-dev:queue:test:1"
14) "ngivr-dev:queue:test:9"
manast commented 6 years ago

interesting. Definitely something to do with the prefixes and I think what is the reason for it. On the other hand, did you try using a sandboxed processor? that would also keep the redis connections low and job will run in separate processes.

p3x-robot commented 6 years ago

Ciao!

How do you do?

Well, we use 4 x 10 processes, every deployment is a cluster with a master, a singleton and 8 workers. Each worker is using BULL, i had to completely remove the prefixes, then it works. It is some prefix bug for sure, but for now it's more important to re-use redis for us, becuase it was opening too many connections and crashed the programs.

If I remove the prefixes it works. low priority.

Thanks so much !!!

p3x-robot commented 6 years ago

Yes it is a real server, not virtual.

manast commented 6 years ago

you don't need to use cluster if you use the new sandbox functionality, and it will only use one connection per queue: https://github.com/OptimalBits/bull#separate-processes

p3x-robot commented 6 years ago

well, given i have already done it and i wrote the cluster management, it will never block. given there are 8 bull / deployment. if it crashes it get's up, unless it forks more than 5 times within an X seconds. then it stops totally.

why do you think a singleton is better then many workers? given it is uses the same reddis clients?

plus for now we only have 1 scenario, plus we get e-mail for everything about the events on BULL.

We use the singleton process, only for cron, everything is in express, bull, mongoose etc ...

manast commented 6 years ago

its not a singleton, you define the amount of parallel workers in the concurrency argument. If a worker crashes a new one will be replaced. Everything is taken care of. And you get just one redis connection for all the workers.

p3x-robot commented 6 years ago

awesome, thanks so much!

p3x-robot commented 6 years ago

but if i set concurrency, the queue is still be serial right?

manast commented 6 years ago

It will run as many concurrent works as you specify in the concurrency argument.

p3x-robot commented 6 years ago

Well, I still have this error. I cannot continue. I keep this error:

Dec 13 12:47:07 server boot[1823]: [2017/12/13 12:47:07.163] [ERROR] [WORKER 002] [PID: 1855] Error: Missing lock for job 2 failed
Dec 13 12:47:07 server boot[1823]:     at Object.finishedErrors (/var/p3x/server.patrikx3.com/node_modules/bull/lib/scripts.js:149:23)
Dec 13 12:47:07 server boot[1823]:     at /var/p3x/server.patrikx3.com/node_modules/bull/lib/job.js:213:31
Dec 13 12:47:07 server boot[1823]:     at tryCatcher (/var/p3x/server.patrikx3.com/node_modules/bluebird/js/release/util.js:16:23)
Dec 13 12:47:07 server boot[1823]:     at Promise._settlePromiseFromHandler (/var/p3x/server.patrikx3.com/node_modules/bluebird/js/release/promise.js:512:31)
Dec 13 12:47:07 server boot[1823]:     at Promise._settlePromise (/var/p3x/server.patrikx3.com/node_modules/bluebird/js/release/promise.js:569:18)
Dec 13 12:47:07 server boot[1823]:     at Promise._settlePromise0 (/var/p3x/server.patrikx3.com/node_modules/bluebird/js/release/promise.js:614:10)
Dec 13 12:47:07 server boot[1823]:     at Promise._settlePromises (/var/p3x/server.patrikx3.com/node_modules/bluebird/js/release/promise.js:693:18)
Dec 13 12:47:07 server boot[1823]:     at Async._drainQueue (/var/p3x/server.patrikx3.com/node_modules/bluebird/js/release/async.js:133:16)
Dec 13 12:47:07 server boot[1823]:     at Async._drainQueues (/var/p3x/server.patrikx3.com/node_modules/bluebird/js/release/async.js:143:10)
Dec 13 12:47:07 server boot[1823]:     at Immediate.Async.drainQueues [as _onImmediate] (/var/p3x/server.patrikx3.com/node_modules/bluebird/js/release/async.js:17:14)
Dec 13 12:47:07 server boot[1823]:     at runCallback (timers.js:773:18)
Dec 13 12:47:07 server boot[1823]:     at tryOnImmediate (timers.js:734:5)
Dec 13 12:47:07 server boot[1823]:     at processImmediate [as _immediateCallback] (timers.js:711:5)

If I use 1 core it works, but when I use a cluster, it is not working at all.

p3x-robot commented 6 years ago

Right now I even if I use 1 core it gives me the missing lock. What can I do?

p3x-robot commented 6 years ago

It happens, when I try to wait for the job to be completed.

queue.on('completed', done)

If I remove the completed it works, but then how can I get the done output???

manast commented 6 years ago

I would need some kind of test code that reproduces the issue, it is very difficult to find the reason to the problem otherwise. I feel that it is very strange that listening to the event completed cn trigger that behaviour. The error is reported because some worker already finalized a job and another worker is trying to finalize it again. Or that a worker took too much time to do the job and he lost the lock (which should not happen if you used external processors as recommended).

p3x-robot commented 6 years ago

Well, I found it, that it cannot be in a cluster, it can be only be in one process. Now it works. I am refactoring. I was bull in every process the workers, now I am moving to the singleton process (just one) and now it works.

p3x-robot commented 6 years ago

I moved to the singleton. In 1 process, the same error. No idea why it says :+1:

[2017/12/13 14:11:19.497] [ERROR] [SINGLETON] [PID: 16549] Error: Missing lock for job 5 failed
    at Object.finishedErrors (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bull/lib/scripts.js:149:23)
    at /home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bull/lib/job.js:213:31
    at tryCatcher (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bluebird/js/release/promise.js:693:18)
    at Async._drainQueue (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues [as _onImmediate] (/home/patrikx3/Projects/patrikx3/corifeus/corifeus-server/node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:773:18)
    at tryOnImmediate (timers.js:734:5)
    at processImmediate [as _immediateCallback] (timers.js:711:5)
p3x-robot commented 6 years ago

As you can see, not it is in 1 process. It worked for a few seconds, then the same error.

manast commented 6 years ago

can you provide example code that reproduces the issue?

p3x-robot commented 6 years ago

i fixed it with redis, for some reason, when use the an actual queue and queue.on and queue.off sometimes it does it, but i dont have time to deal with, so i used with redis and it works. so thanks so much! happy xmas! for me though it is a bit fishy, but given i dont have time to give code right now, i am happy. CIAO!

cristianfd commented 6 years ago

@p3x-robot how did you solve the problem? The problem was in Bull or in YOUR code?

p3x-robot commented 6 years ago

I think it happens when I am in a cluster using on complete, so i removed that code. The other error is is I think when I restart the bull server and it does this. Even though I count all, that are done with this code:

const count = async () => {
    const promises = [];
    for(let name of Object.keys(kues)) {
        promises.push(kues[name].getJobCounts());
    }
    const results = await Promise.all(promises);
    const total = {
        kue: {}
    };

    let serverScriptsIndex = 0;
    Object.keys(kues).forEach((kue, index) => {
        if (kue === 'server-scripts') {
            serverScriptsIndex = index;
        }
        total.kue[kue] = results[index];
    })

    results.forEach((kue, index) => {
//      console.log(job);
        /*
      if (index === serverScriptsIndex) {
          return;
      }
      */
      Object.keys(kue).forEach(type => {
          if (!total.hasOwnProperty(type)) {
              total[type] = 0;
          }
          total[type] += kue[type];
      })
    })
    total.waitTotal = total.waiting + total.active // + total.delayed;
    return total;
}

The server updates itself with bull:

const exec = require('../../web/exec');

const lib = require('../../../lib');
const settings = lib.settings;
const git = require('../../git');

const nodeExec = require('child_process').exec;

const utils = require('corifeus-utils');

let alreadyRestart = false;

module.exports = async (options) => {
    const { data, tmpobj} = options;

    await exec(data, `
        set -e    
        sudo -u root sh -c '
        cd /root/server-scripts
        ${git.command.reset}
        sudo yarn install --production --non-interactive
        '  
    `)

    if (alreadyRestart === false) {
        alreadyRestart = true;
        const expiry = Date.now()+ (1000 * 60 * 60);
        if (Date.now() > expiry) {
            const Mail = require('./mail');
            const mail = Mail();
            mail.send('[server-scripts] Waiting until last executing runs for over 1 hour.');
        }

        const count = await lib.bull.count();
        console.log(`Bull is still building: ${count.waitTotal}`);

        let waitAgain = false
        const restartInterval = setInterval(async() => {
            const count = await lib.bull.count();
            console.log(`Bull is still building: ${count.waitTotal}`);
            if (count.waitTotal === 0) {
                if (waitAgain) {
                    clearInterval(restartInterval);
                    nodeExec(`sudo service ${settings.systemd.service.notifier} restart && sudo service ${settings.webhook.service} restart`)
                } else {
                    waitAgain = true
                }
            }
        }, 10000)
    }
}

For some reason it is done, it is still saying it is missing a lock. I have no idea. But only when I restart.

p3x-robot commented 6 years ago

Right now, I wait a bit longer, and I hope that will work now finally. :)

cristianfd commented 6 years ago

When it happened to me, Arena UI also said me more stack on the error. You can see that error stack?

p3x-robot commented 6 years ago

I still get this, long output, but I just want to use Bull and I get and error, but it looks like it does works though:

Dec 23 22:47:39 server webhook[24798]: Time: 19457ms
Dec 23 22:47:39 server webhook[24798]:                             Asset     Size  Chunks                    Chunk Names
Dec 23 22:47:39 server webhook[24798]:    bundle.baa0118f8adc19c11fb0.js  1.83 MB       0  [emitted]  [big]  bundle
Dec 23 22:47:39 server webhook[24798]: polyfills.baa0118f8adc19c11fb0.js   113 kB       1  [emitted]         polyfills
Dec 23 22:47:39 server webhook[24798]:   bundle.baa0118f8adc19c11fb0.css   419 kB       0  [emitted]  [big]  bundle
Dec 23 22:47:39 server webhook[24798]:                        index.html  7.48 kB          [emitted]
Dec 23 22:47:39 server webhook[24798]:    [0] ./node_modules/rxjs/Observable.js 13.2 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:    [2] ./node_modules/@angular/core/esm5/core.js 542 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:    [8] ./node_modules/rxjs/Subject.js 5.61 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:   [12] ./node_modules/@angular/cdk/esm5/a11y.es5.js 87 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:   [13] ./node_modules/@angular/cdk/esm5/platform.es5.js 5.64 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:   [14] ./node_modules/@angular/cdk/esm5/bidi.es5.js 5.56 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:   [33] ./node_modules/@angular/platform-browser/esm5/platform-browser.js 159 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:  [539] ./test/angular-webpack/angular/polyfills.ts 116 bytes {1} [built]
Dec 23 22:47:39 server webhook[24798]:  [540] ./node_modules/corifeus-web/src/polyfills.ts 349 bytes {1} [built]
Dec 23 22:47:39 server webhook[24798]:  [698] ./test/angular-webpack/angular/bundle.aot.ts 383 bytes {0} [built]
Dec 23 22:47:39 server webhook[24798]:  [701] ./test/angular-webpack/angular/boot.common.ts 143 bytes {0} [built]
Dec 23 22:47:39 server webhook[24798]:  [708] ./test/angular-webpack/angular/module.ngfactory.js 18.5 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]: [1056] ./test/angular-webpack/angular/layout.ngfactory.js 14.1 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]: [1064] ./test/angular-webpack/angular/page/full-page.ngfactory.js 2.99 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]: [1065] ./test/angular-webpack/angular/page/component.ngfactory.js 22.7 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:     + 1057 hidden modules
Dec 23 22:47:39 server webhook[24798]: Child test/angular-webpack/angular/page/raw.html:
Dec 23 22:47:39 server webhook[24798]:      1 asset
Dec 23 22:47:39 server webhook[24798]:        [0] ./test/angular-webpack/angular/page/raw.html 20 bytes {0} [built]
Dec 23 22:47:39 server webhook[24798]: Child test/angular-webpack/angular/layout.html:
Dec 23 22:47:39 server webhook[24798]:      1 asset
Dec 23 22:47:39 server webhook[24798]:        [0] ./test/angular-webpack/angular/layout.html 1.4 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]: Child html-webpack-plugin for "index.html":
Dec 23 22:47:39 server webhook[24798]:      1 asset
Dec 23 22:47:39 server webhook[24798]:        [0] ./node_modules/html-webpack-plugin/lib/loader.js!./test/angular-webpack/index.html 279 bytes {0} [built]
Dec 23 22:47:39 server webhook[24798]: Child src/component/cory-mat-login/cory-mat-login.html:
Dec 23 22:47:39 server webhook[24798]:      1 asset
Dec 23 22:47:39 server webhook[24798]:        [0] ./src/component/cory-mat-login/cory-mat-login.html 1.27 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]: Child extract-text-webpack-plugin node_modules/extract-text-webpack-plugin/dist node_modules/css-loader/index.js!node_modules/sass-loader/lib/loader.js??ref--10-3!src/scss/material/_all.scss:
Dec 23 22:47:39 server webhook[24798]:        [0] ./node_modules/css-loader!./node_modules/sass-loader/lib/loader.js?{"includePaths":["node_modules","bower_components","src","."]}!./src/scss/material/_all.scss 419 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:        [1] ./node_modules/css-loader/lib/css-base.js 2.26 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]: Child extract-text-webpack-plugin node_modules/extract-text-webpack-plugin/dist node_modules/css-loader/index.js!node_modules/sass-loader/lib/loader.js??ref--10-3!test/angular-webpack/assets/styles.scss:
Dec 23 22:47:39 server webhook[24798]:        [0] ./node_modules/css-loader!./node_modules/sass-loader/lib/loader.js?{"includePaths":["node_modules","bower_components","src","."]}!./test/angular-webpack/assets/styles.scss 419 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]:        [1] ./node_modules/css-loader/lib/css-base.js 2.26 kB {0} [built]
Dec 23 22:47:39 server webhook[24798]: Done.
Dec 23 22:47:39 server webhook[24798]: Execution Time (2017-12-23 22:47:19 UTC+1)
Dec 23 22:47:39 server webhook[24798]: loading grunt-webpack   210ms  ▇ 1%
Dec 23 22:47:39 server webhook[24798]: webpack:cory-build-aot  19.5s  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99%
Dec 23 22:47:39 server webhook[24798]: Total 19.7s
Dec 23 22:47:39 server webhook[24798]:         set -e
Dec 23 22:47:39 server webhook[24798]:         cd /tmp/tmp-24798kK4xD5JdYCJl
Dec 23 22:47:39 server webhook[24798]:         yarn install --non-interactive
Dec 23 22:47:39 server webhook[24798]:         grunt clean copy:cory-build  webpack:cory-build-aot
Dec 23 22:47:39 server webhook[24798]:         sudo sh -c '
Dec 23 22:47:39 server webhook[24798]:         rm -rf /var/www/material.corifeus.com/public
Dec 23 22:47:39 server webhook[24798]:         mv build/browser /var/www/material.corifeus.com/public
Dec 23 22:47:39 server webhook[24798]:         chmod -R -w /var/www/material.corifeus.com/public
Dec 23 22:47:39 server webhook[24798]:         chmod -R og-rwx /var/www/material.corifeus.com/public
Dec 23 22:47:39 server webhook[24798]:         chown -R www-data:www-data /var/www/material.corifeus.com/public
Dec 23 22:47:39 server webhook[24798]: '
Dec 23 22:47:39 server webhook[24798]: { message: 'Missing lock for job 5 failed',
Dec 23 22:47:39 server webhook[24798]:   stack: 'Error: Missing lock for job 5 failed\n    at Object.finishedErrors (/root/server-scripts/node_modules/bull/lib/scripts.js:149:23)\n    at /root/server-scripts/node_modules/bull/lib/job.js:213:31\n    at tryCatcher (/root/server-scripts/node_modules/bluebird/js/release/util.js:16:23)\n    at Promise._settlePromiseFromHandler (/root/server-scripts/node_modules/bluebird/js/release/promise.js:512:31)\n    at Promise._settlePromise (/root/server-scripts/node_modules/bluebird/js/release/promise.js:569:18)\n    at Promise._settlePromise0 (/root/server-scripts/node_modules/bluebird/js/release/promise.js:614:10)\n    at Promise._settlePromises (/root/server-scripts/node_modules/bluebird/js/release/promise.js:693:18)\n    at Async._drainQueue (/root/server-scripts/node_modules/bluebird/js/release/async.js:133:16)\n    at Async._drainQueues (/root/server-scripts/node_modules/bluebird/js/release/async.js:143:10)\n    at Immediate.Async.drainQueues [as _onImmediate] (/root/server-scripts/node_modules/bluebird/js/release/async.js:17:14)\n    at runCallback (timers.js:773:18)\n    at tryOnImmediate (timers.js:734:5)\n    at processImmediate [as _immediateCallback] (timers.js:711:5)' }
Dec 23 22:47:39 server webhook[24798]: send new mail subject: P3X-SERVER-SCRIPTS: server - webhook bull error corifeus-web-material
Dec 23 22:47:39 server webhook[24798]: 250 2.0.0 Ok: queued as BEE23342FE7
Dec 23 22:47:40 server webhook[24798]: Hash: e5ffa116831889cfb7bb
Dec 23 22:47:40 server webhook[24798]: Version: webpack 3.10.0 / grunt-webpack 3.0.2
Dec 23 22:47:40 server webhook[24798]: Time: 20916ms
Dec 23 22:47:40 server webhook[24798]:                             Asset     Size  Chunks                    Chunk Names
Dec 23 22:47:40 server webhook[24798]:    bundle.e5ffa116831889cfb7bb.js  1.85 MB       0  [emitted]  [big]  bundle
Dec 23 22:47:40 server webhook[24798]: polyfills.e5ffa116831889cfb7bb.js   113 kB       1  [emitted]         polyfills
Dec 23 22:47:40 server webhook[24798]:   bundle.e5ffa116831889cfb7bb.css   424 kB       0  [emitted]  [big]  bundle
Dec 23 22:47:40 server webhook[24798]:                        index.html  7.45 kB          [emitted]
Dec 23 22:47:40 server webhook[24798]:    [0] ./node_modules/rxjs/Observable.js 13.2 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:    [2] ./node_modules/@angular/core/esm5/core.js 542 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:    [8] ./node_modules/@angular/material/esm5/core.es5.js 90.8 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:   [12] ./node_modules/@angular/cdk/esm5/bidi.es5.js 5.56 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:   [13] ./node_modules/@angular/cdk/esm5/a11y.es5.js 87 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:   [25] ./node_modules/@angular/platform-browser/esm5/platform-browser.js 159 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:  [545] ./src/angular/polyfills.ts 116 bytes {1} [built]
Dec 23 22:47:40 server webhook[24798]:  [546] ./node_modules/corifeus-web/src/polyfills.ts 349 bytes {1} [built]
Dec 23 22:47:40 server webhook[24798]:  [704] ./src/angular/bundle.aot.ts 385 bytes {0} [built]
Dec 23 22:47:40 server webhook[24798]:  [707] ./src/angular/bundle.common.ts 155 bytes {0} [built]
Dec 23 22:47:40 server webhook[24798]:  [714] ./src/angular/module.ngfactory.js 18.8 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: [1081] ./node_modules/@angular/material/tooltip/typings/index.ngfactory.js 6.55 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: [1082] ./node_modules/@angular/material/snack-bar/typings/index.ngfactory.js 8.91 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: [1083] ./node_modules/corifeus-web-material/src/services/notify/notify-component.ngfactory.js 4.81 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: [1084] ./src/angular/layout/cory-layout.ngfactory.js 16.8 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:     + 1084 hidden modules
Dec 23 22:47:40 server webhook[24798]: Child src/angular/layout/cory-layout.html:
Dec 23 22:47:40 server webhook[24798]:      1 asset
Dec 23 22:47:40 server webhook[24798]:        [0] ./src/angular/layout/cory-layout.html 1.93 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: Child src/angular/layout/footer/cory-layout-footer.html:
Dec 23 22:47:40 server webhook[24798]:      1 asset
Dec 23 22:47:40 server webhook[24798]:        [0] ./src/angular/layout/footer/cory-layout-footer.html 1.29 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: Child src/angular/layout/header/cory-layout-header.html:
Dec 23 22:47:40 server webhook[24798]:      1 asset
Dec 23 22:47:40 server webhook[24798]:        [0] ./src/angular/layout/header/cory-layout-header.html 2.64 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: Child node_modules/corifeus-web-material/src/component/cory-mat-login/cory-mat-login.html:
Dec 23 22:47:40 server webhook[24798]:      1 asset
Dec 23 22:47:40 server webhook[24798]:        [0] ./node_modules/corifeus-web-material/src/component/cory-mat-login/cory-mat-login.html 1.27 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: Child html-webpack-plugin for "index.html":
Dec 23 22:47:40 server webhook[24798]:      1 asset
Dec 23 22:47:40 server webhook[24798]:        [0] ./node_modules/html-webpack-plugin/lib/loader.js!./src/index.html 247 bytes {0} [built]
Dec 23 22:47:40 server webhook[24798]: Child extract-text-webpack-plugin node_modules/extract-text-webpack-plugin/dist node_modules/css-loader/index.js!node_modules/sass-loader/lib/loader.js??ref--10-3!node_modules/corifeus-web-material/src/scss/material/_all.scss:
Dec 23 22:47:40 server webhook[24798]:        [0] ./node_modules/css-loader!./node_modules/sass-loader/lib/loader.js?{"includePaths":["node_modules","bower_components","src","."]}!./node_modules/corifeus-web-material/src/scss/material/_all.scss 419 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:        [1] ./node_modules/css-loader/lib/css-base.js 2.26 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: Child extract-text-webpack-plugin node_modules/extract-text-webpack-plugin/dist node_modules/css-loader/index.js!node_modules/sass-loader/lib/loader.js??ref--10-3!src/assets/style.scss:
Dec 23 22:47:40 server webhook[24798]:        [0] ./node_modules/css-loader!./node_modules/sass-loader/lib/loader.js?{"includePaths":["node_modules","bower_components","src","."]}!./src/assets/style.scss 424 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]:        [1] ./node_modules/css-loader/lib/css-base.js 2.26 kB {0} [built]
Dec 23 22:47:40 server webhook[24798]: Done.
Dec 23 22:47:40 server webhook[24798]: Execution Time (2017-12-23 22:47:19 UTC+1)
Dec 23 22:47:40 server webhook[24798]: loading grunt-webpack   213ms  ▇ 1%
Dec 23 22:47:40 server webhook[24798]: webpack:cory-build-aot    21s  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99%
Dec 23 22:47:40 server webhook[24798]: Total 21.2s
Dec 23 22:47:40 server webhook[24798]:         set -e
Dec 23 22:47:40 server webhook[24798]:         yarn config set registry https://npm.patrikx3.com
Dec 23 22:47:40 server webhook[24798]:         git clone file:///var/git/corifeus-app-web-pages.git /tmp/tmp-24798q0rHX9Z605H5
Dec 23 22:47:40 server webhook[24798]:         cd /tmp/tmp-24798q0rHX9Z605H5
Dec 23 22:47:40 server webhook[24798]:                 
Dec 23 22:47:40 server webhook[24798]:         yarn install --non-interactive
Dec 23 22:47:40 server webhook[24798]:         #npm install
Dec 23 22:47:40 server webhook[24798]:         grunt clean copy:cory-build webpack:cory-build-aot
Dec 23 22:47:40 server webhook[24798]:         sudo sh -c '
Dec 23 22:47:40 server webhook[24798]:         rm -rf /var/www/pages.corifeus.com/public
Dec 23 22:47:40 server webhook[24798]:         mv build/browser /var/www/pages.corifeus.com/public
Dec 23 22:47:40 server webhook[24798]:         chmod -R -w /var/www/pages.corifeus.com/public
Dec 23 22:47:40 server webhook[24798]:         chmod -R og-rwx /var/www/pages.corifeus.com/public
Dec 23 22:47:40 server webhook[24798]:         chown -R www-data:www-data /var/www/pages.corifeus.com/public
Dec 23 22:47:40 server webhook[24798]: '
Dec 23 22:47:40 server webhook[24798]:     
Dec 23 22:47:41 server webhook[24798]: { message: 'Missing lock for job 5 failed',
Dec 23 22:47:41 server webhook[24798]:   stack: 'Error: Missing lock for job 5 failed\n    at Object.finishedErrors (/root/server-scripts/node_modules/bull/lib/scripts.js:149:23)\n    at /root/server-scripts/node_modules/bull/lib/job.js:213:31\n    at tryCatcher (/root/server-scripts/node_modules/bluebird/js/release/util.js:16:23)\n    at Promise._settlePromiseFromHandler (/root/server-scripts/node_modules/bluebird/js/release/promise.js:512:31)\n    at Promise._settlePromise (/root/server-scripts/node_modules/bluebird/js/release/promise.js:569:18)\n    at Promise._settlePromise0 (/root/server-scripts/node_modules/bluebird/js/release/promise.js:614:10)\n    at Promise._settlePromises (/root/server-scripts/node_modules/bluebird/js/release/promise.js:693:18)\n    at Async._drainQueue (/root/server-scripts/node_modules/bluebird/js/release/async.js:133:16)\n    at Async._drainQueues (/root/server-scripts/node_modules/bluebird/js/release/async.js:143:10)\n    at Immediate.Async.drainQueues [as _onImmediate] (/root/server-scripts/node_modules/bluebird/js/release/async.js:17:14)\n    at runCallback (timers.js:773:18)\n    at tryOnImmediate (timers.js:734:5)\n    at processImmediate [as _immediateCallback] (timers.js:711:5)' }
Dec 23 22:47:41 server webhook[24798]: send new mail subject: P3X-SERVER-SCRIPTS: server - webhook bull error corifeus-app-web-pages
Dec 23 22:47:41 server webhook[24798]: 250 2.0.0 Ok: queued as 45DE2342FE7
p3x-robot commented 6 years ago

The stack:

Error: Missing lock for job 2 failed
    at Object.finishedErrors (/root/server-scripts/node_modules/bull/lib/scripts.js:149:23)
    at /root/server-scripts/node_modules/bull/lib/job.js:213:31
    at tryCatcher (/root/server-scripts/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/root/server-scripts/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/root/server-scripts/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/root/server-scripts/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/root/server-scripts/node_modules/bluebird/js/release/promise.js:693:18)
    at Async._drainQueue (/root/server-scripts/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/root/server-scripts/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues [as _onImmediate] (/root/server-scripts/node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:773:18)
    at tryOnImmediate (timers.js:734:5)
    at processImmediate [as _immediateCallback] (timers.js:711:5)
cristianfd commented 6 years ago

How many jobs are you executing at the same time? A few time ago its happened to me when I've error in one job and its produces the other jobs to be cancelled. Check all your queue jobs if you hace many at the same time. On the other hand, I recommend you to delete all the await/async and use the promise itself (with then and catch), maybe the catch stament helps you.

p3x-robot commented 6 years ago

I think it happens when i restart the bull server. How can I restart the bull server? Should I do something? Because when I count I only restart, when all queues all zero.

ianstormtaylor commented 6 years ago

FWIW, this was happening to me when I had upgraded to 3.x but forgot to clear the existing queues from Redis, so they were still in the 2.x format.

WarTech9 commented 4 years ago

This error seems to be related to running a long running operation in the on('completed') event handler. I had this before:

queue.on('completed', (job, result) => {
    logger.debug('job completed with result: ',  result)
    socketController.notify(job.id, {status: 'completed'})
})

The socketController.notify() line was sending a message over websockets to clients. I was getting the Missing lock for job error every single time. I tried many different things to no avail.

Then I moved the notify() line from this callback into the queue.process() function at the end of that, when the job was done processing, leaving just the debug() line in the completed handler. The error went away.

Another user reported this error going away after commenting out lines using queue events: here

For anyone experiencing this issue it might be worth removing any substantial code from the on('complete') handler.

seanharr11 commented 4 years ago

I had an exception in the on('complete', (job, result) => {...}) callback, and the exception was masquerading as Missing lock for job ... unsure what the reason is, but fixing the handler fixed the problem.

theo-lubert commented 4 years ago

We also have this issue, but without any specific configuration (just host and port), no custom redis client, and no prefix. We have a simple winston log on workers completed event, no exception and no long running task in the event handler.

Error: Missing lock for job repeat:myjob:00f4c548cdc615774bd5dce8f54394ec:1591048800000 failed
    at Function.finishedErrors (/app/node_modules/bullmq/src/classes/scripts.ts:203:16)
    at Job.moveToFailed (/app/node_modules/bullmq/src/classes/job.ts:312:23)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at handleFailed (/app/node_modules/bullmq/src/classes/worker.ts:286:9)
    at Worker.run (/app/node_modules/bullmq/src/classes/worker.ts:137:27)
BogdanStanciu commented 4 years ago

I have the same problem.

Error: Missing lock for job 404rj0ikeif7ka4 failed
at Object.finishedErrors (/app/node_modules/bull/lib/scripts.js:184:16)
at Job.moveToFailed (/app/node_modules/bull/lib/job.js:303:19)
at processTicksAndRejections (internal/process/task_queues.js:97:5)

with failedReason: job stalled more than allowable limit

All the code is run inside a docker.

marceloudi commented 4 years ago

This problem was happening here. @inn0vative1 gives the solution (2020-04-27)

queueMysql.on("completed", (job, result) => {
  console.log(`Completed: Job ${job.data.id} completed with result ${JSON.stringify(result)}`);
  redis.del(job.data.k); //-->This has moved to queueMysql.process()
})

Just thinking about a definitive solution.

adamreisnz commented 3 years ago

This appears to be still an issue in the new bullmq repo.

Error: Missing lock for job repeat:5efbf1e42d9fa73887b98be39ccc31d5:1620445800000 failed
    at Function.finishedErrors (node_modules/bullmq/dist/classes/scripts.js:143:24)
    at Job.moveToFailed (node_modules/bullmq/dist/classes/job.js:225:41)
    at async handleFailed (node_modules/bullmq/dist/classes/worker.js:239:17)
manast commented 3 years ago

It is better you open a new issue with the steps to reproduce it. "Missing lock for job" is a legit error that can happen, so without knowing the context it is impossible to decide if it is an issue or not...