calmh / node-snmp-native

Native Javascript SNMP library for Node.js
MIT License

Is it good at scaling? #42

Closed: zhaohangbo closed this issue 8 years ago

zhaohangbo commented 8 years ago

If I have millions of network devices, how can I scale this SNMP manager service?

bangert commented 8 years ago

Depends on what your limiting factor is ;-)

We scan about 40,000 values on about 5,000 devices in less than 20 seconds. I believe we are network-I/O bound here. I don't have exact numbers, sorry.

We were pleasantly surprised at how fast it is, but our problem is three orders of magnitude smaller, so your mileage may vary.

Modzor13 commented 8 years ago

Hi there, I am working on a project and want to incorporate your package; those stats sound pretty fantastic. I have mocked something up, but I keep running into an issue where the first 100-200 hosts/devices work great, but then it seems to hit a wall and the remaining ones time out. Can you provide a sample for polling many devices quickly, like you are doing?

bangert commented 8 years ago

Well, it turns out my numbers were wrong. We have about 23,000 interfaces, for which we monitor 14 distinct values, on approximately 5,000 hosts, and that currently takes 47 seconds.
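As a rough back-of-the-envelope reading of these figures, assuming "14 distinct values" means 14 OIDs polled per interface and that polling time grows linearly with the number of hosts (neither is stated in the thread), the rate works out to roughly 6,850 values or about 106 hosts per second:

```javascript
// Back-of-the-envelope throughput from the figures quoted above.
// Assumption: "14 distinct values" means 14 OIDs polled per interface.
var interfaces = 23000;
var valuesPerInterface = 14;
var hosts = 5000;
var seconds = 47;

var totalValues = interfaces * valuesPerInterface;  // 322,000
var valuesPerSecond = totalValues / seconds;         // ~6,851
var hostsPerSecond = hosts / seconds;                // ~106

// Assumption: naive linear extrapolation to one million hosts
// with a single poller.
var secondsForMillionHosts = 1e6 / hostsPerSecond;   // ~9,400 s (~2.6 h)
```

Under those assumptions, one poller would need roughly 9,400 seconds (about 2.6 hours) per cycle for a million hosts.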

Unfortunately I can't share any of our production code, but here's the idea behind our SNMP worker:

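var SnmpNative = require('snmp-native');
var async = require('async');

// Hypothetical wrapper object; the snippet assumes `this.config` holds
// the default Session options (e.g. timeouts).
function snmp(config) { this.config = config; }

// Poll jobs one after another over ONE shared session.
snmp.prototype.getMany = function(jobs, res_cb, done_cb) {
  var session = new SnmpNative.Session(this.config);
  async.forEachSeries(jobs,
    function(set, set_done_cb) {
      session.getAll({
          oids:         set.oids,
          abortOnError: true,
          host:         set.ip,
          port:         'port' in set ? set.port : 161,
          community:    set.community
        },
        function(err, varbinds) {
          if (err) {
            return set_done_cb(err + ' - ' + set.ip);
          }
          // success - pick up values
          // (the snippet was cut off here; the rest is a reconstruction)
          res_cb(set, varbinds);
          set_done_cb();
        });
    },
    function(err) {
      session.close(); // release the shared session's socket
      done_cb(err);
    });
};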

Hope this helps...

bangert commented 8 years ago

We sometimes run into issues with a firewall not being able to keep up with the number of new sessions created by our poller. In those cases we move the poller closer to the monitored devices.

calmh commented 8 years ago

The key to any kind of performance is to create one session and use it for all hosts, as bangert is doing above. Make sure that is what you're doing, Modzor13.

Modzor13 commented 8 years ago

Thank you all for the help. I took this information, replaced async.forEachSeries with a call to async.queue, and drastically improved the performance for my scenario. It appears my problem was that my code tried to process all of the hosts at ONE time, which caused a huge loss of data. The queue lets me process a fixed number of items in parallel at a time without letting the timeouts clog up the system.
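The approach described here can be sketched roughly as below. This is a minimal sketch, not Modzor13's code: it follows calmh's advice of a single shared session and caps the number of concurrent `getAll()` requests, which is what `async.queue(worker, concurrency)` from the async library provides. The limiter is written by hand only so the example has no dependencies. The function name `pollWithLimit` and the job fields (`ip`, `oids`, `port`, `community`) are hypothetical, and `session` is assumed to be a snmp-native `Session` (anything with a compatible `getAll(options, callback)` method works).

```javascript
// Minimal sketch (hypothetical helper): poll many hosts over ONE shared
// session, keeping at most `concurrency` getAll() requests in flight.
// `session` is assumed to expose getAll(options, callback) like a
// snmp-native Session; the job fields (ip, oids, port, community) are
// illustrative, not part of the library.
function pollWithLimit(session, jobs, concurrency, onResult, onDone) {
  var next = 0;      // index of the next job to start
  var running = 0;   // requests currently in flight
  var finished = 0;  // jobs completed, successfully or not
  var errors = [];   // { ip, error } for hosts that failed or timed out

  if (jobs.length === 0) {
    onDone(errors);
    return;
  }

  // Start jobs until the concurrency cap is reached or none are left.
  function startMore() {
    while (running < concurrency && next < jobs.length) {
      runJob(jobs[next++]);
    }
  }

  function runJob(job) {
    running++;
    session.getAll({
      oids: job.oids,
      host: job.ip,
      port: job.port || 161,
      community: job.community
    }, function (err, varbinds) {
      running--;
      finished++;
      if (err) {
        // Record the failure but keep polling the remaining hosts.
        errors.push({ ip: job.ip, error: err });
      } else {
        onResult(job, varbinds);
      }
      if (finished === jobs.length) {
        onDone(errors);
      } else {
        startMore();
      }
    });
  }

  startMore();
}
```

A call such as `pollWithLimit(session, jobs, 50, onResult, onDone)` keeps at most 50 requests outstanding. The value 50 is only an illustrative starting point to tune against timeouts and the firewall session limits mentioned above. Unlike the `forEachSeries` version, a timeout on one host is recorded and polling continues with the rest.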