Implement blocking

ProZachJ commented 10 years ago

Use the bouncer model to implement blacklist and greylist blocking.

Blacklist: Block a given IP for a given amount of time.
Greylist: Block a given IP from a specific URL for a given amount of time.
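
The difference between the two lists could be sketched roughly as follows. Only the blacklist shape (IP mapped to an expiry timestamp) matches the bouncer code quoted in this thread; the nested greylist shape and all function names here are assumptions made for illustration.

```javascript
// blacklist: ip -> expiry in ms since epoch (same shape as bouncer's global).
// greylist: ip -> { url -> expiry }. This nested shape is an assumption.
var blacklist = {};
var greylist = {};

// Hypothetical helper: block an IP everywhere for durationMs.
function blockIp(ip, durationMs, now) {
  blacklist[ip] = now + durationMs;
}

// Hypothetical helper: block an IP from one URL for durationMs.
function blockIpForUrl(ip, url, durationMs, now) {
  if (!greylist[ip]) {
    greylist[ip] = {};
  }
  greylist[ip][url] = now + durationMs;
}

// True while the IP has an unexpired blacklist entry.
function isBlacklisted(ip, now) {
  return typeof blacklist[ip] === 'number' && blacklist[ip] > now;
}

// True while the IP has an unexpired entry for this exact URL.
function isGreylisted(ip, url, now) {
  var urls = greylist[ip];
  return !!urls && typeof urls[url] === 'number' && urls[url] > now;
}
```

Passing `now` in explicitly, rather than calling `new Date().getTime()` inside each function, is only a choice to make the expiry logic easy to unit test.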

How bouncer implements this:

Greylist, Blacklist and Sweeplist Globals:

//GLOBALs
var upstreamConnection;
var blacklist = {};
var sweeplist = [];
var greylist = {};
var connections = [];
var requests = [];
var disabledUrls = [];
var totalConnections = 0;
var headerTimeout = 10000;
var requestTimeout = 120000;

The proxy connects to an upstream server (aggregator) and listens for commands. @mattjay Perhaps we could just regularly poll the database for these values instead of using an upstream server? Would polling every second cause problems, and would it give us enough time granularity to respond to threats?

//This connects to the aggregation server and accepts upstream commands.
setInterval(function() {
  if (!upstreamConnection || upstreamConnection == null) {
    upstreamConnection = net.connect({host: UPSTREAM_LOGSERVER, port: argv.P})
    //connect to aggregator in "server" mode
    upstreamConnection.write('S');
    upstreamConnection.on('data', function(data) {
      commandDo(data);
    });
    //destroys upstream if the connection is dead
    upstreamConnection.on('error', function () {
      return upstreamConnection = null
    });
  };
},1000);

The block command does two things:

Adds the newly blocked IP to the blacklist global, which is checked on each incoming request.
Adds the newly blocked IP to the sweeplist so that current connections from that IP can be terminated. This relies on the requests global to keep track of all current requests.

This is the logic for the block command.

if (/^block.*/.test(cmd)) {
    cmd = cmd.slice(6).split("|")
    var timeToBlock =  new Date().getTime() + parseInt(cmd[1]);
    sweeplist.push(cmd[0]);
    blacklist[cmd[0]] = timeToBlock;
}
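
Judging from the slice and split above, the command appears to have the form "block <ip>|<milliseconds>". A stricter parser for that inferred format might look like the sketch below; `parseBlockCommand` is a hypothetical name, and the format itself is an assumption drawn from the snippet rather than from bouncer's documentation.

```javascript
// Parses "block <ip>|<milliseconds>" into { ip, expires }.
// Returns null for anything that does not match the assumed format.
function parseBlockCommand(cmd, now) {
  // String() also covers the case where cmd is a Buffer from a socket.
  var match = /^block ([^|]+)\|(\d+)$/.exec(String(cmd).trim());
  if (!match) {
    return null;
  }
  return {
    ip: match[1],
    expires: now + parseInt(match[2], 10)
  };
}
```

Returning null for malformed input avoids writing a NaN expiry into the blacklist, which the original parseInt call would do if the duration were missing.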

The blacklist check is done both in the request handler and on 'connection'.

var proxyServer = http.createServer(function (req, res) {
  if (checkBlacklist(req.socket.remoteAddress) && checkGreylist(req.socket.remoteAddress,req.url)) {
    totalConnections += 1;
    proxy.proxyRequest(req, res, {
    host: HTTP_SERVER,
    port: HTTP_PORT
    });

The connection event check allows blocking of blacklisted IPs to occur as early as possible.

proxyServer.on('connection', function (req, c, h) {
  //mark the start time vs slow loris
  if (checkBlacklist(req.remoteAddress)) {
    req.startTime = new Date().getTime();
    connections.push(req);
    try {
      upstreamConnection.write(buildConnectMessage(req) + "\n");
    } catch (e) {}
  } else {
    req.end()
  }
});

The sweeplist regularly drops all requests from newly blacklisted IPs in case we were mid-request when the block command was received, i.e. the connection and request checks had already occurred.

//Sweep newly blacklisted servers right away
setInterval(function() {
  if (sweeplist.length > 0 && requests.length > 0) {
      requests.forEach(function (req) {
        if (sweeplist.indexOf(req.socket.remoteAddress) > -1) {
          req.socket.end();
          requests.splice(requests.indexOf(req),1);
        };
      });
  } return sweeplist = [];
} ,1000);
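
One thing to watch in a sweep like this: calling splice on an array from inside its own forEach shifts the remaining elements down, so the entry right after a removed one is not visited. A sketch of the same sweep that iterates backwards to avoid that is shown here; `sweepRequests` is a hypothetical name, not part of bouncer.

```javascript
// Ends and removes every tracked request whose remote address is in sweeplist.
// Iterating from the end means splice() never shifts an unvisited entry.
// Returns the number of requests that were swept.
function sweepRequests(requests, sweeplist) {
  var swept = 0;
  for (var i = requests.length - 1; i >= 0; i--) {
    if (sweeplist.indexOf(requests[i].socket.remoteAddress) > -1) {
      requests[i].socket.end();
      requests.splice(i, 1);
      swept += 1;
    }
  }
  return swept;
}
```

The array is modified in place, so any other code holding a reference to the same requests array sees the removals.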

Since we are keeping a requests global, we have to manage it manually. This adds new requests:

proxyServer.on('request', function (req, res) {
  //remove good requests from garbage collection
  req.startTime = new Date().getTime();
  requests.push(req);
  connections.splice(connections.indexOf(req),1);
});

This removes successfully proxied requests from the requests global:

proxy.on('end', function (req) {
  requests.splice(requests.indexOf(req),1);
  try {
    upstreamConnection.write(buildEndMessage(req) + "\n");
  } catch (e) {}
});

ProZachJ commented 10 years ago

Thinking through the db polling model for blocking. We'd need to change the structure of allowed hosts to include their black and greylists. Blocks associated with specific hosts could only be checked 'on-request'. At what point would we escalate an IP block to be 'shield wide' so that we could check 'on-connection'?

ProZachJ commented 10 years ago

The db polling model would also require that any consumers that we develop to intelligently make blocking decisions would write to the db to accomplish that.

ProZachJ commented 10 years ago

Perhaps db polling is the right approach for host-based blocking and for certain consumers that need to process requests in their entirety, but other consumers (like the bouncer DOS ones) only need connection data and so are better suited to the aggregator model.

ProZachJ commented 10 years ago

I think in our approach these "globals" should be private to the createServer method.

From ./lib/proxyserver.js:

module.exports = function createServer (proxy, allowed_hosts, port) {
//Globals created here
var server = http.createServer(function (request, response) {
    if (request.headers.host && allowed_hosts[request.headers.host.replace(/\./g, "")] === 'enabled') {
      //if(!checkBlacklist(request.socket.remoteAddress)){
      ...
this.startServer = function (){
    server.listen(port);
  };
  this.stopServer = function (){
    server.close();
  };
  //this.addBlackListIP = function (domain, IP, time){
  //};
  return this;
};

If we use this model to expose setters to ./startproxy.js we can put the db polling code in this file just like the check for allowed hosts is there now.

ProZachJ commented 10 years ago

This will likely require a change to our allowed_hosts structure so that we can inject the enabled hosts along with each of their blacklists/greylists.

ProZachJ commented 10 years ago

var mongoose = require('mongoose');

var hostSchema = mongoose.Schema({
  hostname: String,
  status: String,
  //blacklist: [{ip: timetoblock}]
});

module.exports = mongoose.model('Host', hostSchema);

ProZachJ commented 10 years ago

Actually, since we are injecting allowed hosts we don't need some of the globals.

module.exports = function createServer (proxy, allowed_hosts, port) {
//var requests = [];

proxy.on('end', function (req) {
  requests.splice(requests.indexOf(req),1);
});
proxy.on('error', function(proxy) {
 //log error?
});

var server = http.createServer(function (request, response) {
  request.startTime = new Date().getTime();
  requests.push(request);
  if (request.headers.host && allowed_hosts[request.headers.host.replace(/\./g, "")] === 'enabled') {
      //if(!checkBlacklist(request.socket.remoteAddress)){
      ...
  this.startServer = function (){
    server.listen(port);
  };
  this.stopServer = function (){
    server.close();
  };
   /*
   this.addBlackListIP = function (domain, ip, time){
    allowed_hosts[domain].blacklist.push({ip:time});
   };
   this.getRequests = function(){
     return requests
   };
   */
  return this;
};

ProZachJ commented 10 years ago

The getRequests method is needed so that the sweeplist function can be in startproxy.js

Is that the way it should be done?
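
For discussion, a reduced sketch of how the requests array could stay private to createServer while still being readable from startproxy.js. The proxying itself is left out (the handler just sends a placeholder response), the signature is simplified, and returning an object literal instead of `this` is an assumption about style, not the current ./lib/proxyserver.js code.

```javascript
var http = require('http');

// Hypothetical, reduced createServer: only request tracking is shown.
function createServer(port) {
  var requests = []; // private to this closure

  // Guarded removal: splice(-1, 1) would drop the last element instead.
  function untrack(request) {
    var index = requests.indexOf(request);
    if (index > -1) {
      requests.splice(index, 1);
    }
  }

  var server = http.createServer(function (request, response) {
    request.startTime = new Date().getTime();
    requests.push(request);
    // Stop tracking when the response completes or the connection drops.
    response.on('finish', function () { untrack(request); });
    response.on('close', function () { untrack(request); });
    response.end('placeholder response\n');
  });

  return {
    startServer: function () { server.listen(port); },
    stopServer: function () { server.close(); },
    // Returns the live array, so a sweep in startproxy.js can remove
    // entries in place.
    getRequests: function () { return requests; }
  };
}
```

Handing out the live array keeps the sweep code simple but lets callers mutate internal state; returning a copy and exposing a separate removal method would be the stricter alternative.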

ProZachJ commented 10 years ago

startproxy.js would look something like:

var allowed_hosts = {};
var sweeplist = [];

Host.find({}, function(err, hosts) {
  if(!err) {
    for (var i = 0; i < hosts.length; i++) {
      allowed_hosts[hosts[i].hostname] = hosts[i].status;
    }

    var proxy = httpProxy.createProxyServer();
    var server = createServer(proxy, allowed_hosts, port);

    //WebSocket Support
    server.on('upgrade', function (req, socket, head) {
      proxy.ws(req, socket, head);
    });
    server.startServer();

    //This connects to the db and checks for new blocks.
    setInterval(function() {
      Host.find({}, function(err, hosts) {
        if (!err) {
          //compare allowed_hosts to new values
          //if different push new blocks addBlackListIP(domain, ip, time)
          //push to sweeplist
        } else {
          //may need to reconnect to db
        }
      });
    }, 1000);

    //Sweep newly blacklisted servers right away
    setInterval(function() {
      var requests = server.getRequests();
      if (sweeplist.length > 0 && requests.length > 0) {
        requests.forEach(function (req) {
          if (sweeplist.indexOf(req.socket.remoteAddress) > -1) {
            req.socket.end();
            requests.splice(requests.indexOf(req),1);
          };
        });
      } return sweeplist = []
    } ,1000);
  }
});

ProZachJ commented 10 years ago

This code will need to be adjusted to fit new host schema:

for (var i = 0; i < hosts.length; i++) {
      allowed_hosts[hosts[i].hostname] = hosts[i].status;
    }
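
One possible adjustment, sketched under the assumption that each host document carries a blacklist array of { ip, expires } entries. That entry shape is hypothetical (the schema comment above only hints at it), as is the name `buildAllowedHosts`.

```javascript
// Assumed document shape (hypothetical):
//   { hostname: String, status: String, blacklist: [{ ip, expires }] }
// Builds allowed_hosts so each host carries its status and an
// ip -> expiry map for quick lookups.
function buildAllowedHosts(hosts) {
  var allowed_hosts = {};
  for (var i = 0; i < hosts.length; i++) {
    var blacklist = {};
    var entries = hosts[i].blacklist || [];
    for (var j = 0; j < entries.length; j++) {
      blacklist[entries[j].ip] = entries[j].expires;
    }
    allowed_hosts[hosts[i].hostname] = {
      status: hosts[i].status,
      blacklist: blacklist
    };
  }
  return allowed_hosts;
}
```

With this shape the enabled check becomes a comparison on the entry's status property rather than on the entry itself.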

ProZachJ commented 10 years ago

@mattjay I think all of this makes sense. If it does, the next step is to figure out how to unit test it, which may require a completely different structure, seeing how none of startproxy.js is exported... and here is where I start thinking in circles for now. We'll discuss tomorrow.

mattjay commented 10 years ago

After stewing on this I believe it makes as much sense as anything I would've come up with. I think there are probably blind spots ("we don't know what we don't know" situations), but we need to implement and try.

mattjay commented 10 years ago

@ProZachJ -

      var requests = server.getRequests();

what does the getRequests() function need to look like? not sure what the second setInterval is doing with the sweepList.

mattjay commented 10 years ago

   //This connects to the db and checks for new blocks.
    setInterval(function() {
      Host.find({}, function(err, hosts) {
        if (!err) {
          //compare allowed_hosts to new values
          //if different push new blocks addBlackListIP(domain, ip, time)
          //push to sweeplist
        } else {
          //may need to reconnect to db
        }
      });
    }, 1000);

I mocked and tested most of this in: 982485c77ed1a66384e576bae15fa3843078976d

//Sweep newly blacklisted servers right away
    setInterval(function() {
      var requests = server.getRequests();
      if (sweeplist.length > 0 && requests.length > 0) {
        requests.forEach(function (req) {
          if (sweeplist.indexOf(req.socket.remoteAddress) > -1) {
            req.socket.end();
            requests.splice(requests.indexOf(req),1);
          };
        });
      } return sweeplist = []
    } ,1000);

I'm not sure what this part is doing so I don't want to start to mock it incorrectly.

mattjay commented 10 years ago

ok @ProZachJ the only part not written is:

 if (request.headers.host && allowed_hosts[request.headers.host.replace(/\./g, "")] === 'enabled') {
      //if(!checkBlacklist(request.socket.remoteAddress)){
      ...

that if block there. the setIntervals are cooking and tested.

ProZachJ commented 10 years ago

@mattjay Next step is to write tests for a function (checkBlacklist) that will return true if an ip is in the blacklist and false if it isn't.

mattjay commented 10 years ago

@ProZachJ that's easy, but do we make some sort of decision if it is? req.socket.end()?

ProZachJ commented 10 years ago

@mattjay that is already in the else block below. So if we make this a complex single if:

if (request.headers.host && allowed_hosts[request.headers.host.replace(/\./g, "")].status === 'enabled' && checkblacklist(remoteIP))
//remoteIP likely needs logic for X-Forwarded-For if we are behind a load balancer
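
A rough sketch of what that remote IP logic might look like. `getRemoteIP` and the `trustProxy` flag are hypothetical names; the sketch assumes the load balancer sets or overwrites X-Forwarded-For, since otherwise a client can put any value in that header.

```javascript
// Resolves the client IP for a request.
// trustProxy (hypothetical flag) should only be true when a load balancer
// we control sits in front of the proxy and sets X-Forwarded-For.
function getRemoteIP(request, trustProxy) {
  // Node lowercases incoming header names.
  var forwarded = request.headers['x-forwarded-for'];
  if (trustProxy && typeof forwarded === 'string' && forwarded.length > 0) {
    // The header is a comma-separated list; the first entry is the
    // client address as reported by the first proxy in the chain.
    return forwarded.split(',')[0].trim();
  }
  return request.socket.remoteAddress;
}
```

When the proxy is exposed directly to the internet the flag should stay false, so that blocking decisions are keyed on the socket address and cannot be dodged by sending a forged header.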

mattjay commented 10 years ago

&& checkblacklist(remoteIP) == false though right? or && !checkblacklist(remoteIP) @ProZachJ

ProZachJ commented 10 years ago

@matt just depends on how you write the function, but yeah, it probably makes sense to make it return true if it's in there and use the ! operator.
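
To make that convention concrete, here is a minimal sketch of the combined check with checkblacklist returning true for a blocked IP. The extra parameters, the per-host ip -> expiry blacklist map, and the `shouldProxy` wrapper are all assumptions for illustration, not the function that was actually committed.

```javascript
// Returns true when ip has an unexpired entry in the given blacklist map
// (ip -> expiry in ms since epoch; this shape is an assumption).
function checkblacklist(blacklist, ip, now) {
  return !!blacklist && typeof blacklist[ip] === 'number' && blacklist[ip] > now;
}

// Hypothetical wrapper around the single complex if: proxy only when the
// host is enabled and the remote IP is NOT blacklisted for it.
function shouldProxy(request, allowed_hosts, remoteIP, now) {
  var host = request.headers.host;
  if (!host) {
    return false;
  }
  // Same key convention as the thread: dots stripped from the Host header.
  var entry = allowed_hosts[host.replace(/\./g, "")];
  return !!entry &&
    entry.status === 'enabled' &&
    !checkblacklist(entry.blacklist, remoteIP, now);
}
```

Anything that makes shouldProxy return false would then fall through to the existing else block.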

mattjay commented 10 years ago

@ProZachJ Check out the checkblacklist function I wrote. If we think this looks good, it might be time to plug it all in and test it out e2e.

DarkShield / daProxy

Implement blocking #61