aileenproject / aileen-core

Sensor data aggregation tool for any numerical sensor data. Robust and privacy-friendly.
MIT License

Customisable aggregations #8

Open nhoening opened 5 years ago

nhoening commented 5 years ago

Right now (for historical reasons), each aileen-core box uses one specific kind of aggregation: count the distinct observables that were seen, per hour or per day.

We can allow much more, like max/mean/min values, and we can either aggregate directly across all observables or add this grouping-by-observable step (which we do now).
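
A minimal sketch of what that could look like with pandas (the helper and the "observable_id" / "value" column names are purely illustrative, assuming events come as a DataFrame like Events.pdobjects provides):

import pandas as pd


def aggregate_events(events_df: pd.DataFrame, group_by_observable: str, operation: str):
    """Apply one configurable aggregation to a DataFrame of events (illustrative only)."""
    if group_by_observable == "no":
        # aggregate directly across all events
        return events_df["value"].agg(operation)
    if group_by_observable == "dist":
        # current behaviour: each distinct observable contributes the value 1
        per_observable = events_df.groupby("observable_id")["value"].apply(lambda s: 1)
    else:
        # collapse each observable's events first (sum/min/mean/max)
        per_observable = events_df.groupby("observable_id")["value"].agg(group_by_observable)
    # then aggregate across the per-observable values
    return per_observable.agg(operation)

With group_by_observable="dist" and operation="sum" this would reproduce the current distinct-count behaviour.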

We'll have to add a model class Aggregations, so the box knows what to do. For data quality, the server currently seems like the best place to configure aggregations centrally; boxes simply download them.
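
A rough sketch of that download step on the box (the endpoint, payload fields and import path are assumptions at this point, not an existing API):

import requests

from data.models import Aggregations  # assumed import path


def sync_aggregations_from_server(server_url: str, token: str):
    """Fetch centrally configured aggregations and (re)create them locally (sketch)."""
    response = requests.get(
        f"{server_url}/api/aggregations/",  # hypothetical endpoint
        headers={"Authorization": f"Token {token}"},
        timeout=10,
    )
    response.raise_for_status()
    aggregations = response.json()
    if not aggregations:
        # the box cannot aggregate anything useful without at least one aggregation
        raise RuntimeError("Server returned no aggregations.")
    for agg in aggregations:
        Aggregations.objects.update_or_create(
            name=agg["name"],
            defaults=dict(
                display_name=agg["display_name"],
                group_by_observable=agg["group_by_observable"],
                operation=agg["operation"],
                active=agg.get("active", True),
            ),
        )
    # de-activate local aggregations the server no longer asks for
    Aggregations.objects.exclude(name__in=[a["name"] for a in aggregations]).update(active=False)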

If there is more than one aggregation (e.g. the user wants min as well as max), the box uploads correspondingly more aggregated data (in this example, twice as much).

For the UI, the server will allow the user to select which aggregation should be shown. The API (see #9) will of course also reflect this.
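
For example, the query behind the charts could then filter on the selected aggregation (purely illustrative; it assumes SeenByHour gets a link to the Aggregations model, and all names are made up):

from data.models import Aggregations, SeenByHour  # assumed import path


def hourly_data_for_selected_aggregation(box_id: str, aggregation_name: str):
    """Return hourly rows for one box, restricted to the aggregation the user picked (sketch)."""
    selected = Aggregations.objects.get(name=aggregation_name, active=True)
    # assumes SeenByHour gains an aggregation_id attribute, as discussed below
    return SeenByHour.objects.filter(
        box_id=box_id, aggregation_id=selected.id
    ).order_by("hour_start")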

nhoening commented 5 years ago

Here is an initial sketch of how the models can be adapted:

diff --git a/aileen/data/models.py b/aileen/data/models.py
index 638a846..5d265fa 100644
--- a/aileen/data/models.py
+++ b/aileen/data/models.py
@@ -119,6 +119,57 @@ class Events(models.Model):
         return Events.pdobjects.filter(box_id=box_id).to_dataframe()

+class Aggregations(models.Model):
+    """
+    This class can help make Aileen more versatile.
+    For instance, the current case groups by distinct observable and then sums those up.
+    This requires a few changes:
+    * Aileen boxes ask the server for aggregations on startup, and (re)create them.
+      Keeping them centrally is good for administration and syncing, but might make it difficult
+      to operate a box when offline ... however, aggregations will be retried until they succeed.
+      If the server sends back an empty list, complain. Or the box could refuse to start.
+    * The Aileen box also needs to re-/de-activate existing aggregations based on the server's current aggregation demands.
+    * SeenByHour and SeenByDay get an extra attribute "aggregation_id".
+    * When an aggregation is added or re-activated, we might need to back-compute it for available
+      data. aggregate_data.py should be smart enough to detect which aggregations are missing. There
+      should also be a limit (maybe not more than a week back).
+    * The frontend should get the possibility to select which aggregation is currently displayed in the KPIs
+      and charts. The same goes for the REST API we'll build later.
+    * Maybe the stasis / seen-earlier measures are only interesting to display when group_by_observable == "dist".
+    """
+
+    # box_id = models.CharField(max_length=256)
+    name = models.CharField(max_length=120, help_text="Technical name")
+    display_name = models.CharField(
+        max_length=120,
+        help_text="Name when this aggregation is displayed (what does it represent?).",
+    )
+    active = models.BooleanField(default=True)
+    group_by_observable = models.CharField(
+        max_length=4,
+        choices=[
+            ("no", "Do not group events"),
+            ("dist", "Distinct, group with value 1"),
+            ("sum", "Sum values"),
+            ("min", "Minimum value"),
+            ("mean", "Mean value"),
+            ("max", "Max value"),
+        ],
+        default="no",
+        help_text="Group events by observables before aggregating? Otherwise the operation takes place across events.",
+    )
+    operation = models.CharField(
+        max_length=4,
+        choices=[
+            ("sum", "Sum values"),
+            ("min", "Minimum value"),
+            ("mean", "Mean value"),
+            ("max", "Max value"),
+        ],
+        default="sum",
+    )
+
+
 class SeenByHour(models.Model):
     box_id = models.CharField(max_length=256)
     hour_start = models.DateTimeField()