Closed Arech closed 13 years ago
the worker's call() method is augmented to supply itself
basically it just sends its own id as well, which is mapped in the master process to a Worker instance
Mmmm... Well, yeah... But obviously it is an Error object that is passed as the "worker" parameter to master.on('worker exception', function(worker, err) in logger.js:80, not the Worker object the code expects. And the second parameter is undefined in that case.
Maybe the error is specific to the standalone use-case? I think you can replicate the exception by adding .use(logger()) and throwing an exception somewhere in the setInterval(function(){}) of the worker branch of standalone.js. Something like:
var cluster = require('cluster')
  , proc = cluster()
      .set('socket path', '/var/tmp')
      .use(cluster.logger('_logs', 7))
      .start();

if (proc.isWorker) {
  var id = process.env.CLUSTER_WORKER;
  console.log('  worker #%d started', id);
  setInterval(function(){
    console.log('  processing job for worker #%d', id);
    if (Math.random() > 0.8) throw new Error('Oups, hello exception!');
  }, 1000);
} else {
  setTimeout(function(){
    proc.close();
  }, 10000);
}
By the way, here is the relevant part of receiver.js:

var worker
  , obj = JSON.parse(this._buf[id]);

this._buf[id] = '';
if ('number' == typeof obj.id) worker = this.children[obj.id];
this.invoke(obj.method, obj.args, worker);
If for any reason obj.id is not a number, the worker variable is going to be undefined, and that is unexpected...
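That lookup can be sketched in isolation (children and dispatch here are simplified stand-ins for this.children and the receiver's dispatch, not the real cluster code):

```javascript
// Minimal sketch of the receiver's dispatch: a non-numeric (or missing)
// obj.id leaves `worker` undefined when the message is handed to invoke().

var children = [{ id: 0 }, { id: 1 }]; // stand-ins for Worker instances

function dispatch(json) {
  var worker;
  var obj = JSON.parse(json);
  // same guard as receiver.js: only a numeric id maps to a child
  if ('number' == typeof obj.id) worker = children[obj.id];
  return { method: obj.method, worker: worker };
}

// well-formed frame: the worker resolves
console.log(dispatch('{"id":1,"method":"workerException","args":[]}').worker); // { id: 1 }

// id never assigned in the spawned process: worker is undefined
console.log(dispatch('{"method":"workerException","args":[]}').worker); // undefined
```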
Furthermore, if you put something like console.log("Id is %d", self.id); into the this.master.call overload in worker.js:69, you will get an "Id is NaN" message.
Looks like the bug is really related to the standalone use-case...
Yes. Looks like someone completely forgot to set worker.id in the spawned process. Put

worker.id = parseInt( process.env.CLUSTER_WORKER );

into master.js:256, after var worker = new Worker(this);, and it will start to work like a charm, without exceptions, as expected.
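A minimal sketch of why the id comes out as NaN without that patch, using a stub Worker rather than the real cluster class:

```javascript
// Sketch of the proposed fix: assign the spawned process its id from the
// CLUSTER_WORKER environment variable. Without the assignment, the id stays
// undefined, and anything that coerces it to a number yields NaN.

function Worker(master) {
  this.master = master;
  // this.id is intentionally left unset, as in the unpatched code
}

var worker = new Worker({});
console.log(Number(worker.id)); // NaN -- what the "Id is NaN" log showed

// the one-line patch, using a stand-in env value:
process.env.CLUSTER_WORKER = '3';
worker.id = parseInt(process.env.CLUSTER_WORKER, 10);
console.log(worker.id); // 3
```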
Btw, the first if () in Master.spawnWorker would be better refactored into something like

if ('number' == typeof id) {
  worker = new Worker(this).spawn(id);
  this.children[id] = worker;
  // worker.id = id;
} else { // generate an id
  id = this.children.length;
  worker = new Worker(this).spawn(id);
  this.children.push(worker);
}
worker.id = id;

or you'll miss worker.id in the else {} branch. I'm not sure it is essential, because there is a this.id = id assignment in Worker.spawn(), but then why was there a worker.id = id; statement after this.children[id] = worker; inside the if () {}?
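A runnable sketch of the hoisted assignment, with stub Master/Worker classes standing in for the real ones (the spawn body is elided; only the id bookkeeping is shown):

```javascript
// Sketch of the suggested refactor: hoist worker.id = id out of the if/else
// so both branches set it. Worker and Master are stubs, not the cluster code.

function Worker(master) { this.master = master; }
Worker.prototype.spawn = function (id) {
  this.id = id; // mirrors the this.id = id in the real Worker.spawn()
  // the actual fork would happen here
  return this;
};

function Master() { this.children = []; }

Master.prototype.spawnWorker = function (id) {
  var worker;
  if ('number' == typeof id) {
    // respawn: reuse the slot so the worker keeps its old id
    worker = new Worker(this).spawn(id);
    this.children[id] = worker;
  } else { // generate an id
    id = this.children.length;
    worker = new Worker(this).spawn(id);
    this.children.push(worker);
  }
  worker.id = id; // now set on both paths
  return worker;
};

var master = new Master();
console.log(master.spawnWorker().id);  // 0 (generated)
console.log(master.spawnWorker(0).id); // 0 (explicit respawn, same slot)
```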
The id is already generated. When the id is explicitly set, it's so that a worker can die / be replaced but retain the same id, instead of just pushing to the array and creating a wide range of ids. The only time the id should not map properly like you are mentioning is when a worker fails immediately, before its id is known (via the socketpair).
Emmm, I'm sorry, I don't understand how to interpret your answer. Have you reproduced the exception in the standalone use-case?
Let me try to comment
The id is already generated. When the id is explicitly set, it's so that a worker can die / be replaced but retain the same id, instead of just pushing to the array and creating a wide range of ids. The only time the id should not map properly like you are mentioning is when a worker fails immediately, before its id is known (via the socketpair).
There are two "storage places" for IDs in the cluster: the ID of the worker object inside the master process (it really is explicitly set, and the worker can die and be replaced..., yes, true), and the ID of the worker inside the spawned worker process. It looks like the latter ID is never set anywhere (at least in the standalone use-case). This ID is passed to the spawned process via the process.env.CLUSTER_WORKER variable, but that variable is never assigned to the ID used by the Master.call() function anywhere in the spawned process. I've provided a patch for this.
var obj = {
  method: 'connect'
  , args: [worker.id, this.options]
};

worker.sock.write(utils.frame(obj), 'ascii', this.fd);
this is where it's passed to the worker, along with some options and the fd
so it should be fine unless the worker has been killed before even receiving that initial data, but yeah, we could use CLUSTER_WORKER as well (which I just added recently)
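That handshake can be sketched with a stand-in frame helper (the real utils.frame format is not shown in this thread; newline-delimited JSON here is just an assumption for illustration):

```javascript
// Sketch of the connect handshake: the master frames the worker's id into
// the first message, and the worker only learns its id after parsing it.

function frame(obj) { return JSON.stringify(obj) + '\n'; } // stand-in for utils.frame

var worker = { id: 2 };                      // a Worker stub in the master
var options = { 'socket path': '/var/tmp' };

var obj = {
  method: 'connect',
  args: [worker.id, options]
};

var wire = frame(obj);
// on the worker side, the id arrives only once this frame is received:
var parsed = JSON.parse(wire);
console.log(parsed.args[0]); // 2
```

If the worker throws before this frame arrives, it has no id yet, which is why reading it from CLUSTER_WORKER at startup closes the gap.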
Dear visionmedia. I haven't written this long thread because I know how it should work. I have written it because I know how it works under the given conditions (standalone, worker not killed before receiving the initial data). Did you run the test case?
took a look at your example, the worker definitely does not seem to be receiving the connect call. looking into it
oh nvm i think i know what it is haha should be dead easy to fix
Tried to add a random exception in the standalone.js worker use-case and got an exception inside the logger plugin at logger.js:81:
master.on('worker exception', function(worker, err){
  log.error('worker ' + worker.id + ' uncaught exception ' + err.message); // logger.js:81
});
while trying to reference err.message. A closer look into the code shows that Master.workerException() expects two parameters:

Master.prototype.workerException = function(worker, err){ // master.js:844

while the Worker passes only one, the exception object:

self.master.call('workerException', err); // worker.js:104
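The mismatch can be sketched with stubs (Master, invoke, and the call site are simplified stand-ins, not the real cluster dispatch):

```javascript
// Sketch of the arity mismatch: workerException expects (worker, err), but
// the worker-side call sends only the error, so the Error lands in the
// `worker` slot and the second parameter is undefined.

function Master() {}
Master.prototype.workerException = function (worker, err) {
  // with a one-argument call, `worker` is the Error and `err` is undefined
  return { worker: worker, err: err };
};

// stand-in for the master-side dispatch of a remote call
function invoke(master, method, args) {
  return master[method].apply(master, args);
}

var master = new Master();
var result = invoke(master, 'workerException', [new Error('boom')]);
console.log(result.worker instanceof Error); // true
console.log(result.err);                     // undefined
```

This is exactly the shape seen in logger.js:80, where the Error shows up as the "worker" argument.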