gathering / gondul

Network management/monitoring system specialized for temporary events
http://tech.gathering.org
GNU General Public License v2.0

Gondul - The network monitoring/management system

This is the system used to monitor the network during The Gathering (a computer party with between 5000 and 10000 active clients - see http://gathering.org). It is now provided as a stand-alone application with the goal of being usable by any number of computer parties and events of a similar nature. The first non-TG user was Digitality X 2016 (http://digitalityx.no), which took place in June / July 2016.

Unlike other NMSs, Gondul is not designed to run perpetually, but for a limited time. It needs to be effective with minimal infrastructure in place, since it is used during the initial installation of the network.

You should be able to install this on your own for other similar events of various scales. The system requirements are minimal, but some advice:

Some facts from The Gathering 2016:

Name

The name comes from the Norse Valkyrie Gondul, also known as the wand bearer.

Features

Some of Gondul's features are:

Current state

Gondul is used at The Gathering and Digitality X among other places. It was spun off as a separate project from the big "Tech:Server misc tools" git repository in 2015. It was also used extensively at The Gathering 2017.

There is no "release" process for the time being since all development is directly linked to upcoming events and development continues throughout events.

Gondul is currently in the middle of a redesign. As such, the current documentation is slightly out of date.

Installation

See INSTALLING.rst.

Architecture

Gondul is split into multiple roles. At the very core is the database server (PostgreSQL).

The data is provided by three individual data collectors. They are found in collectors/. Two of these can run on any host with database access. The third, the dhcptailer, needs to run on your DHCP server, or on some server with access to the DHCP log. It is picky about log formatting (patches welcome).
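To illustrate why a log tailer ends up picky about formatting, here is a minimal Python sketch of extracting a lease from a single log line. It assumes ISC dhcpd-style syslog lines of the form "DHCPACK on <ip> to <mac> ..."; the function name parse_dhcp_ack and the regular expression are hypothetical and are not taken from the actual dhcptailer.

```python
import re
from typing import Optional

# Assumption: ISC dhcpd-style syslog lines. The real dhcptailer may expect
# a different format; this pattern is illustrative only.
ACK_RE = re.compile(
    r"DHCPACK on (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"to (?P<mac>[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})"
)


def parse_dhcp_ack(line: str) -> Optional[dict]:
    """Return the IP and MAC address from a DHCPACK line, or None."""
    match = ACK_RE.search(line)
    if match is None:
        # Any line that deviates from the expected layout is skipped.
        return None
    return {"ip": match.group("ip"), "mac": match.group("mac").lower()}
```

Any line that does not match the expected layout is silently dropped, which is the kind of behavior that makes a change in log format look like missing data.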

All three of these collectors provide systemd service files, which should keep them running even if they fall over, as they might if you fiddle with the database.

In addition to the collectors, there is the API. The API provides three different sets of endpoints. Two of these are considered moderately sensitive (e.g., they provide management information and port-specific statistics), while the third is considered public. The two private sets of endpoints are split into a read-only and a write-only namespace.
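The split into one public and two private namespaces can be sketched as a simple prefix lookup. This is only an illustration: the prefixes /api/public/, /api/read/ and /api/write/ and the endpoint names used with them are assumptions, so check doc/API.rst for the real layout. The sketch treats unknown paths as private, which is a design choice made here and not a statement about Gondul.

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    PUBLIC = "public"
    PRIVATE_READ = "private read-only"
    PRIVATE_WRITE = "private write-only"


# Hypothetical URL prefixes for the three sets of endpoints.
PREFIXES = {
    "/api/public/": Access.PUBLIC,
    "/api/read/": Access.PRIVATE_READ,
    "/api/write/": Access.PRIVATE_WRITE,
}


def classify(path: str) -> Optional[Access]:
    """Map a request path to its namespace, or None if it matches none."""
    for prefix, access in PREFIXES.items():
        if path.startswith(prefix):
            return access
    return None


def needs_auth(path: str) -> bool:
    """Only the public namespace is open; unknown paths fail closed."""
    return classify(path) is not Access.PUBLIC
```

For example, needs_auth("/api/read/snmp") is true while needs_auth("/api/public/ping") is false.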

Last is the frontend. This is written entirely in HTML and JavaScript and interacts with the API. It comes in two minimally different versions: one public and one "private". The only actual difference should be what they try to access.

The basic philosophy of Gondul is to have a generic and solid API, a database model that is somewhat agnostic to what we collect (so we can add more interesting SNMP communities on the fly) and a front end that does a lot of magic.
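One way to keep a database model agnostic to what is collected is to store each poll result as an opaque JSON document, so that new fields need no schema change. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL; the table name, column names and helper functions are hypothetical and do not describe Gondul's actual schema.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one schema-agnostic table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE snmp ("
        " ts TEXT DEFAULT CURRENT_TIMESTAMP,"
        " switch TEXT NOT NULL,"
        " data TEXT NOT NULL)"  # JSON document, contents not fixed
    )
    return db


def store(db: sqlite3.Connection, switch: str, data: dict) -> None:
    """Store one poll result; the table does not care which keys it has."""
    db.execute(
        "INSERT INTO snmp (switch, data) VALUES (?, ?)",
        (switch, json.dumps(data)),
    )


def latest(db: sqlite3.Connection, switch: str) -> dict:
    """Return the most recently stored result for a switch, or {}."""
    row = db.execute(
        "SELECT data FROM snmp WHERE switch = ? ORDER BY rowid DESC LIMIT 1",
        (switch,),
    ).fetchone()
    return json.loads(row[0]) if row else {}
```

A collector could start by storing only sysName and later add interface counters to the same table without a migration; the consumer of the data decides what to make of the new keys.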

Recently, graphite/grafana was added, but as it failed to deliver during The Gathering 2017, the integration is being re-worked slightly. It is currently non-functional.

APIs

See doc/API.rst__.

__ https://github.com/tech-server/gondul/blob/master/doc/API.rst

On the topic of the front-end....

The front end uses Bootstrap and jQuery, but not really all that extensively.

The basic idea is to push a ton of information to the front-end and exploit modern concepts such as "8MB of data is essentially nothing", "your browser actually does client-side caching sensibly" and "it's easier to develop js than adapt a backend when the need arises". If you look in a developer console, you will see frequent requests, but if you look closer, almost all of them should be client-side cache hits. Those that aren't can be either 304 Not Modified responses or server-side cache hits. Caching is absolutely crucial to the entire process.
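The 304 Not Modified case relies on standard HTTP conditional requests. As a rough sketch of the mechanism, and not of how Gondul's API or its cache is actually implemented, a handler can derive an ETag from the response body and compare it with the client's If-None-Match header. The function names and the max-age value are assumptions chosen for illustration.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, dict, bytes]:
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    # max-age=5 is an arbitrary illustrative value, not Gondul's setting.
    headers = {"ETag": etag, "Cache-Control": "max-age=5"}
    if if_none_match == etag:
        # The client already holds this exact body: send no payload.
        return 304, headers, b""
    return 200, headers, body
```

While max-age has not expired, the browser answers from its own cache without contacting the server. After that it revalidates, and an unchanged body costs only headers.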

We need more user-documentation though.

Also, the front-end can be somewhat bandwidth intensive. Use gzip. Patches for variable polling frequency on mobile devices are welcome.
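To get a feel for why gzip matters here, the following sketch compresses a synthetic JSON payload with Python's standard library. The payload shape (switch names, latency_ms, state) is made up for illustration and is not Gondul's API format; in a real deployment compression would normally be enabled in the web server.

```python
import gzip
import json


def compress_payload(obj) -> bytes:
    """Serialize an object to JSON and gzip the result."""
    return gzip.compress(json.dumps(obj).encode("utf-8"))


def sample_payload(count: int) -> dict:
    """Build a repetitive, made-up payload resembling per-switch status."""
    return {
        f"switch{i}": {"latency_ms": 1.0, "state": "ok"} for i in range(count)
    }
```

JSON with many repeated keys compresses well, so comparing len(json.dumps(sample_payload(500))) with len(compress_payload(sample_payload(500))) shows a substantial reduction.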

Security

Security is ensured in multiple ways. First of all, database passwords should obviously be kept secret. They are never visible in the frontend.

Secondly, APIs are clearly separated. Some data is actually duplicated because it has to be available both in a public API in an aggregated form, and in detailed form in the private API.

Gondul itself does not implement any actual authentication mechanisms for the API. That is left up to the web server. An example Apache configuration file is provided, and the default Ansible recipes use it.