Alan-R / rust-nanoprobe

An implementation of the Assimilation project nanoprobe in Rust

What does the Assimilation Project do?

The Assimilation project creates a continually updated and very detailed map (CMDB) of system and network configuration in a way that scales better than any other known system. This information is sufficiently detailed to determine if systems are misconfigured, contain hacked software, or are configured in violation of best practices (security or other). To the degree possible, this is performed with near-zero manual configuration, as manual configuration eventually becomes incorrect (often sooner rather than later).

What are the downsides of the Assimilation Project?

The Assimilation project requires active agents on most programmable network endpoints (i.e., servers). Although this requirement can be a show stopper for some organizations, without agents, scaling is typically problematic, often highly so. With the Assimilation architecture, even massive scaling can be achieved with minimal resources.

What are the architectural components of the Assimilation software?

Each Assimilation installation consists of a central Collective Management Authority (CMA) and a large number of active agents called nanoprobes. Nanoprobes act strictly under the direction of the CMA. This README primarily covers nanoprobes; how the CMA works and how it uses nanoprobes are outside its scope.

What is a nanoprobe?

A nanoprobe is an active agent that is widely distributed across the network to monitor the liveness of systems in the network, discover the configuration of the system it is attached to, and monitor its services. In an ideal world, every programmable endpoint would have a nanoprobe running on it. Liveness of systems is determined by the exchange of heartbeat packets. Nanoprobes are intended to be as simple as possible, and to do little or nothing on their own.
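As an illustration of heartbeat-based liveness, the following is a minimal sketch of the bookkeeping a nanoprobe might keep for its heartbeat partners. The `HeartbeatMonitor` type, its method names, and the idea of a single fixed deadline are all assumptions made for this example; the README does not specify how timeouts are chosen or how a missed heartbeat is reported.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical tracker for the last time each heartbeat partner was heard from.
pub struct HeartbeatMonitor {
    /// Assumed policy: a peer is overdue once it has been silent longer than this.
    deadline: Duration,
    last_heard: HashMap<SocketAddr, Instant>,
}

impl HeartbeatMonitor {
    pub fn new(deadline: Duration) -> Self {
        Self {
            deadline,
            last_heard: HashMap::new(),
        }
    }

    /// Record that a heartbeat packet arrived from `peer` at time `now`.
    pub fn record_heartbeat(&mut self, peer: SocketAddr, now: Instant) {
        self.last_heard.insert(peer, now);
    }

    /// Return the peers that have been silent for longer than the deadline,
    /// sorted so the result is deterministic.
    pub fn overdue_peers(&self, now: Instant) -> Vec<SocketAddr> {
        let mut late: Vec<SocketAddr> = self
            .last_heard
            .iter()
            .filter(|(_, seen)| now.saturating_duration_since(**seen) > self.deadline)
            .map(|(peer, _)| *peer)
            .collect();
        late.sort();
        late
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the methods, keeps the logic easy to test without sleeping.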

How do nanoprobes work?

Nanoprobes, except during initialization, do only what they've been told to do by the CMA. Many of the actions required for discovery operations require high levels of privilege. Nanoprobes must ensure that they are operating at the lowest level of privilege necessary to perform the task at hand. Discovery data is potentially highly sensitive, and the level of privilege necessary means that security must be taken seriously from the ground up. One way to think of discovery data for an entire network is that it is not the buried treasure, but a map of where the buried treasure is. Such a map would be extremely valuable to an adversary.

During Initialization

During initialization, a nanoprobe sends out a single (reliable) packet to announce that it is now alive. Depending on local configuration, this could be to a multicast address or a unicast address. It may also automatically perform some basic, well-known discovery of the local OS configuration (this still needs to be confirmed).

The normal startup sequence for a nanoprobe is as follows:

  1. Nanoprobe starts up (main program is activated)
  2. Nanoprobe sends an "I've just started" message to the CMA. This might be to the reserved multicast address for the project, or to a unicast address (if one was configured). This message contains local OS version information, and the public key of this nanoprobe.
  3. The CMA replies to the nanoprobe (including its "true" unicast IP address), and tells the nanoprobe the addresses of its heartbeat partners, what discovery actions it is to perform, and how often.
  4. The nanoprobe performs these actions, and sends the results to the CMA. Results of discovery actions are not retained by nanoprobes across restarts.
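The data exchanged in steps 2 and 3 above can be sketched as plain Rust types. This is only an illustration: every type, field, and function name below is hypothetical, the wire format is not shown, and the project's reserved multicast address is passed in as a parameter rather than guessed at.

```rust
use std::net::{IpAddr, SocketAddr};
use std::time::Duration;

/// Where the startup announcement is sent (step 2).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Destination {
    /// The project's reserved multicast address (supplied by the caller).
    Multicast(SocketAddr),
    /// A unicast CMA address taken from local configuration.
    Unicast(SocketAddr),
}

/// Contents of the "I've just started" message (step 2).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StartupAnnouncement {
    pub os_version: String,
    pub public_key: Vec<u8>,
}

/// One discovery action and how often to run it (step 3).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiscoveryTask {
    pub name: String,
    pub interval: Duration,
}

/// What the CMA tells the nanoprobe in its reply (step 3).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CmaReply {
    /// The nanoprobe's "true" unicast IP address as seen by the CMA.
    pub observed_addr: IpAddr,
    pub heartbeat_partners: Vec<SocketAddr>,
    pub discovery_tasks: Vec<DiscoveryTask>,
}

/// Assumed policy: prefer a configured unicast address, else use multicast.
pub fn choose_destination(
    configured_unicast: Option<SocketAddr>,
    project_multicast: SocketAddr,
) -> Destination {
    match configured_unicast {
        Some(addr) => Destination::Unicast(addr),
        None => Destination::Multicast(project_multicast),
    }
}
```

Because discovery results are not retained across restarts, none of these types needs to be persisted; they only have to live for the lifetime of the process.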

After Initialization

Once a nanoprobe is initialized, it does these things:

Discovery Actions

Discovery actions produce JSON describing the things that have been discovered. If a discovery action produces the same JSON as it did previously, the discovery data is discarded. On *nix systems, including Macs, discovery actions are typically performed by shell scripts. Because of differences between *nix systems, the scripts may differ from platform to platform. On Windows systems, discovery actions are typically performed by PowerShell scripts. Regardless of OS environment, certain common types of discovery actions (e.g., network topology) are intended to produce very similar JSON.
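The "discard unchanged JSON" rule could be implemented along these lines. This is a minimal sketch, assuming the comparison is a plain byte-for-byte string comparison per discovery action; `DiscoveryCache` and `filter_changed` are hypothetical names, and a real implementation might store a hash of the output instead of the full text.

```rust
use std::collections::HashMap;

/// Hypothetical cache of the most recent JSON output of each discovery action.
#[derive(Default)]
pub struct DiscoveryCache {
    last_output: HashMap<String, String>,
}

impl DiscoveryCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns `Some(json)` when the output differs from the previous run of
    /// the same action (and so should be sent on), or `None` when it is
    /// identical and can be discarded.
    pub fn filter_changed(&mut self, action: &str, json: &str) -> Option<String> {
        if self.last_output.get(action).map(String::as_str) == Some(json) {
            return None; // same JSON as last time: discard
        }
        self.last_output.insert(action.to_owned(), json.to_owned());
        Some(json.to_owned())
    }
}
```

Note that a textual comparison treats two JSON documents with the same content but different key order or whitespace as different, so this sketch only works well if each discovery script produces its output deterministically.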

Capabilities required for nanoprobes

Virtualized environments

About communication

Food for Thought (mostly architectural level issues, mostly not nanoprobe issues)

Not all of these need to be solved soon, but they need to be given good thought over time.

Why rewrite the nanoprobe - and why Rust?

The previous "C" code version of the nanoprobe worked well, and it was a reasonable design - so why rewrite it at all? The answer is that it never ran on Windows, and that building it to be portable and run on all Linux systems from a single binary was a horribly complicated kludge. Building a separate version for each and every version of Linux was even worse. So going to a language like Rust or Go, which can cross-compile to many environments, would eliminate this complexity and build-system fragility. In addition, few people want to work on C code, which isn't a good thing when it comes to looking for developers for open source projects.

The nanoprobe is designed to be very low-profile, consume few resources, and run indefinitely without needing a restart. Here are the characteristics which I believe are necessary for such systems:

Of the modern languages, only Rust satisfies all these criteria. Some may wonder why garbage collection is on the prohibited list. The answer is that programs in garbage-collected languages grow and grow in size until they are garbage collected. While they are growing, they push the operational software out of memory to make room for their own growth, impairing the system they are monitoring. Some may answer: just tell the program how much memory it needs, and it will not push everything else out of memory. Such tuning is fragile, and will eventually be incorrect.

Other points in Rust's favor:

Approaches to writing this code in Rust

I'm just now learning Rust. The old nanoprobe code in "C" is quite solid and can serve as a good model, but none of it is yet written in Rust. Here are the different dimensions that I see as semi-independent development chunks: