PrivateSky / swarmcore

Swarm 2.0 implementation

Parallel Swarms go missing in Action #8

Open jwulf opened 9 years ago

jwulf commented 9 years ago

I'm launching parallel swarms, and they don't behave as expected. Some go missing, and some stop after a while.

So I made a minimal test case: 4 parallel swarms that each visit four adapters in turn and call a debug-message function in each adapter.

What happened to Swarm 2??

Here's the output:

WebTestAdapter /startParallelTest +0ms
  ParallelTestSwarm Starting Parallel Test Swarm 1 +12ms
  ParallelTestSwarm Starting Parallel Test Swarm 2 +21ms
  ParallelTestSwarm Starting Parallel Test Swarm 3 +7ms
  ParallelTestSwarm Starting Parallel Test Swarm 4 +4ms
  ParallelTestSwarm Swarm 1 entered MindBodyAdapter  +0ms
  MindBodyAdapter Swarm 1 payload is: 229676e0-1d55-11e5-959b-03f820ecbeb2 +16ms
  ParallelTestSwarm Swarm 4 entered MindBodyAdapter  +12ms
  MindBodyAdapter Swarm 4 payload is: 229b0ac0-1d55-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Swarm 4 entered ParseAdapter  +0ms
  ParseAdapter Swarm 4 payload is: 229b0ac0-1d55-11e5-959b-03f820ecbeb2 +5ms
Clearing redis information about dead node  _AtmaMindBodyAPIAdapter(d67d3c89-5b7e-4cca-8f4a-0fe1ae3116a8)
Clearing redis information about dead node  _AtmaMindBodyAPIAdapter(947562e4-8c02-49db-800d-b65930a535b5)
Clearing redis information about dead node  _AtmaParseAdapter(d87c9bec-e81d-4dcd-950a-b58dfc37fb1d)
  ParallelTestSwarm Swarm 3 entered MindBodyAdapter  +136ms
  MindBodyAdapter Swarm 3 payload is: 229a9590-1d55-11e5-959b-03f820ecbeb2 +0ms
Clearing redis information about dead node  _AtmaCalendarAdapter(e3832ea6-c83d-4bf8-ae21-40c0f8b89edd)
  ParallelTestSwarm Swarm 1 entered ParseAdapter  +115ms
  ParseAdapter Swarm 1 payload is: 229676e0-1d55-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Swarm 3 entered ParseAdapter  +6ms
  ParseAdapter Swarm 3 payload is: 229a9590-1d55-11e5-959b-03f820ecbeb2 +1ms
  ParallelTestSwarm Swarm 1 entered CalendarAdapter  +6s
  CalendarAdapter Swarm 1 payload is: 229676e0-1d55-11e5-959b-03f820ecbeb2 +1ms
  ParallelTestSwarm Swarm 3 entered CalendarAdapter  +21ms
  CalendarAdapter Swarm 3 payload is: 229a9590-1d55-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Swarm 4 entered CalendarAdapter  +11ms
  CalendarAdapter Swarm 4 payload is: 229b0ac0-1d55-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Swarm 3 entered ConductorAdapter  +0ms
  Conductor Swarm 3 payload is: 229a9590-1d55-11e5-959b-03f820ecbeb2 +7ms
Clearing redis information about dead node  _AtmaConductorAdapter(48f9959b-09c0-4187-8f60-c299bb5c2636)
Clearing redis information about dead node  _AtmaConductorAdapter(48f9959b-09c0-4187-8f60-c299bb5c2636)
  ParallelTestSwarm Swarm 1 entered ConductorAdapter  +89ms
  Conductor Swarm 1 payload is: 229676e0-1d55-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Swarm 4 entered ConductorAdapter  +14ms
  Conductor Swarm 4 payload is: 229b0ac0-1d55-11e5-959b-03f820ecbeb2 +0ms
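
To see at a glance which swarms dropped out, output like the above can be tallied with a small script. This is only a sketch: `tallySwarms`, `PHASES`, and the regular expression are hypothetical helpers that assume the `debug` lines keep the "Swarm N entered XAdapter" shape shown in this log. None of it is part of swarmcore.

```javascript
// Hypothetical helper: report which expected swarms never reached which adapter.
// Assumes log lines of the form "ParallelTestSwarm Swarm 3 entered ParseAdapter  +6ms".
function tallySwarms(logText, expectedIds, phases) {
  var seen = {}; // swarm id -> { adapterName: true }
  var re = /Swarm (\d+) entered (\w+)/;
  logText.split('\n').forEach(function (line) {
    var m = re.exec(line);
    if (!m) return; // not an "entered" line, ignore it
    var id = Number(m[1]);
    seen[id] = seen[id] || {};
    seen[id][m[2]] = true;
  });

  // For each expected swarm, collect the phases that never appear in the log.
  var missing = {};
  expectedIds.forEach(function (id) {
    var reached = seen[id] || {};
    var absent = phases.filter(function (p) { return !reached[p]; });
    if (absent.length > 0) missing[id] = absent;
  });
  return missing;
}

// Adapter names as they appear in the debug output of the test swarm.
var PHASES = ['MindBodyAdapter', 'ParseAdapter', 'CalendarAdapter', 'ConductorAdapter'];

// A few lines copied from the log above; swarm 2 never shows up.
var sample = [
  '  ParallelTestSwarm Swarm 1 entered MindBodyAdapter  +0ms',
  '  ParallelTestSwarm Swarm 1 entered ParseAdapter  +115ms',
  '  ParallelTestSwarm Swarm 1 entered CalendarAdapter  +6s',
  '  ParallelTestSwarm Swarm 1 entered ConductorAdapter  +89ms'
].join('\n');

console.log(tallySwarms(sample, [1, 2], PHASES));
```

Running this against a full log with `[1, 2, 3, 4]` as the expected ids lists, per swarm, the adapters it never entered, which makes it easier to compare runs.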

Here's the code to spawn the parallel swarms in the Adapter:

dispatcher.onGet('/startParallelTest', function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Starting Parallel Test Swarm');
    var PARALLEL_SWARMS = 5;
    // worker runs from 1 to 4, so this launches 4 swarms
    for (var worker = 1; worker < PARALLEL_SWARMS; worker++) {
        this.startSwarm('ParallelTestSwarm.js', "start", worker);
    }
});

Here is the Swarm code:

var debug = require('debug')('ParallelTestSwarm');
var uuid = require('node-uuid');

ParallelTestSwarm = {
    vars: {
        myuuid: 0,
        payload: 0
    },
    start: function (id) {
        debug('Starting Parallel Test Swarm ' + id);
        this.myuuid = id;
        this.payload = uuid.v1();
        this.swarm('first');
    },
    first: {
        node: "AtmaMindBodyAPIAdapter",
        code: function () {
            debug('Swarm ' + this.myuuid + ' entered MindBodyAdapter ');
            adapterDebugMessage('Swarm ' + this.myuuid + ' payload is: ' + this.payload);
            this.swarm('second');
        }
    },
    second: {
        node: "AtmaParseAdapter",
        code: function () {
            var self = this;
            debug('Swarm ' + this.myuuid + ' entered ParseAdapter ');
            adapterDebugMessage('Swarm ' + this.myuuid + ' payload is: ' + this.payload,
                createSwarmCallback(function () {
                    self.swarm('third');
                }));
        }
    },
    third: {
        node: "AtmaCalendarAdapter",
        code: function () {
            debug('Swarm ' + this.myuuid + ' entered CalendarAdapter ');
            adapterDebugMessage('Swarm ' + this.myuuid + ' payload is: ' + this.payload);
            this.swarm('fourth');
        }
    },
    fourth: {
        node: "AtmaConductorAdapter",
        code: function () {
            debug('Swarm ' + this.myuuid + ' entered ConductorAdapter ');
            adapterDebugMessage('Swarm ' + this.myuuid + ' payload is: ' + this.payload);
        }
    }
};

ParallelTestSwarm;

jwulf commented 9 years ago

And here is the adapterDebugMessage function code:

adapterDebugMessage = function (msg, callback) {
  debug(msg);
  if (callback) callback();
};

jwulf commented 9 years ago

It seems quite unreliable. Running it again gives me:

(This time Swarms 1 and 3 went missing)

 WebTestAdapter /startParallelTest +10m
  ParallelTestSwarm Starting Parallel Test Swarm 1 +6ms
  ParallelTestSwarm Swarm 1 payload is 9597e1a0-1d56-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Starting Parallel Test Swarm 2 +10ms
  ParallelTestSwarm Swarm 2 payload is 95996840-1d56-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Starting Parallel Test Swarm 3 +18ms
  ParallelTestSwarm Swarm 3 payload is 959c2760-1d56-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Starting Parallel Test Swarm 4 +8ms
  ParallelTestSwarm Swarm 4 payload is 959d5fe0-1d56-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Swarm 2 entered MindBodyAdapter  +10m
  MindBodyAdapter Swarm 2 payload is: 95996840-1d56-11e5-959b-03f820ecbeb2 +1ms
  ParallelTestSwarm Swarm 4 entered MindBodyAdapter  +14ms
  MindBodyAdapter Swarm 4 payload is: 959d5fe0-1d56-11e5-959b-03f820ecbeb2 +1ms
  ParallelTestSwarm Swarm 2 entered ParseAdapter  +10m
  ParseAdapter Swarm 2 payload is: 95996840-1d56-11e5-959b-03f820ecbeb2 +1ms
  ParallelTestSwarm Swarm 4 entered ParseAdapter  +17ms
  ParseAdapter Swarm 4 payload is: 959d5fe0-1d56-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Swarm 2 entered CalendarAdapter  +10m
  CalendarAdapter Swarm 2 payload is: 95996840-1d56-11e5-959b-03f820ecbeb2 +0ms
Clearing redis information about dead node  _AtmaMindBodyAPIAdapter(d67d3c89-5b7e-4cca-8f4a-0fe1ae3116a8)
Clearing redis information about dead node  _AtmaMindBodyAPIAdapter(d67d3c89-5b7e-4cca-8f4a-0fe1ae3116a8)
  ParallelTestSwarm Swarm 4 entered CalendarAdapter  +20ms
  CalendarAdapter Swarm 4 payload is: 959d5fe0-1d56-11e5-959b-03f820ecbeb2 +0ms
  ParallelTestSwarm Swarm 2 entered ConductorAdapter  +10m
  Conductor Swarm 2 payload is: 95996840-1d56-11e5-959b-03f820ecbeb2 +1ms
  ParallelTestSwarm Swarm 4 entered ConductorAdapter  +3ms
  Conductor Swarm 4 payload is: 959d5fe0-1d56-11e5-959b-03f820ecbeb2 +0ms

salboaie commented 9 years ago

I can't reproduce this bug. Take a look at SwarmESB: I have added a special adapter, demoBroadcast.js, and a ParallelSwarmsTest.js swarm. Everything works as expected. Can you confirm or refute that on your system? Maybe you have an old version of the AtmaMindBodyAPIAdapter adapter still running, and it swallows swarm messages because it fails? Close all node processes and run again: is it still reproducible? Could you make my example go wrong?