diffix / explorer

Tool to automatically explore and generate stats on data anonymized using Diffix
MIT License

Unify schemas for aggregation metrics #215

Closed gampleman closed 4 years ago

gampleman commented 4 years ago

A number of metrics return a list of things together with some metadata about that list (how many rows, how many suppressed, how many null, etc.). However, these are organised inconsistently.

I'd like to propose organising each of these as a single metric with a values field and a metadata field, like this:

{
  "name": "histogram",
  "value": {
    "values": [
      {
        "bucketSize": 5.0,
        "count": 7,
        "countNoise": 1.8,
        "lowerBound": 10
      },
      {
        "bucketSize": 5.0,
        "count": 9,
        "countNoise": 1.8,
        "lowerBound": 25
      },
      {
        "bucketSize": 5.0,
        "count": 5,
        "countNoise": 1.8,
        "lowerBound": 30
      }
    ],
    "metadata": {
      "nonSuppressedCount": 21,
      "nonSuppressedNonNullCount": 21,
      "nonSuppressedRows": 3,
      "nullCount": 0,
      "nullRows": 0,
      "suppressedCount": 4,
      "suppressedCountRatio": 0.16,
      "suppressedRowRatio": 0.25,
      "suppressedRows": 1,
      "totalCount": 25,
      "totalRows": 4
    }
  }
}
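As a rough sketch of the shape above (in Python for illustration only, not the explorer's actual code), a small helper could wrap any list of values and its metadata in the same envelope. The names `make_metric` and `check_counts` are hypothetical, and the relationships that `check_counts` tests are assumptions inferred from the example numbers, not documented rules.

```python
from typing import Any, Dict, List


def make_metric(name: str, values: List[Dict[str, Any]],
                metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap values and metadata in the proposed single-metric envelope."""
    return {"name": name, "value": {"values": values, "metadata": metadata}}


def check_counts(metadata: Dict[str, Any], tol: float = 1e-9) -> List[str]:
    """Return the names of fields that break the assumed relationships.

    Assumed (inferred from the example, not from any specification):
      totalCount           == nonSuppressedCount + suppressedCount
      totalRows            == nonSuppressedRows + suppressedRows
      suppressedCountRatio == suppressedCount / totalCount
    """
    problems = []
    if metadata["totalCount"] != (metadata["nonSuppressedCount"]
                                  + metadata["suppressedCount"]):
        problems.append("totalCount")
    if metadata["totalRows"] != (metadata["nonSuppressedRows"]
                                 + metadata["suppressedRows"]):
        problems.append("totalRows")
    total = metadata["totalCount"]
    expected_ratio = metadata["suppressedCount"] / total if total else 0.0
    if abs(metadata["suppressedCountRatio"] - expected_ratio) > tol:
        problems.append("suppressedCountRatio")
    return problems


# Count-related fields copied from the example above.
example_metadata = {
    "nonSuppressedCount": 21,
    "nonSuppressedRows": 3,
    "suppressedCount": 4,
    "suppressedCountRatio": 0.16,
    "suppressedRows": 1,
    "totalCount": 25,
    "totalRows": 4,
}
example_values = [
    {"bucketSize": 5.0, "count": 7, "countNoise": 1.8, "lowerBound": 10},
    {"bucketSize": 5.0, "count": 9, "countNoise": 1.8, "lowerBound": 25},
    {"bucketSize": 5.0, "count": 5, "countNoise": 1.8, "lowerBound": 30},
]
histogram = make_metric("histogram", example_values, example_metadata)
```

With the example numbers, `check_counts(example_metadata)` reports no violations, which suggests the count fields could be validated in one place once every metric shares this layout.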

The same format could apply to dates_linear.*, distinct and text.length.
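To illustrate the benefit for consumers, here is a minimal Python sketch of one function that reads any metric in the proposed format without knowing its kind. The function name `describe` and the fields inside the `distinct` values are hypothetical placeholders, since this issue does not show what those metrics currently return.

```python
from typing import Any, Dict, Tuple


def describe(metric: Dict[str, Any]) -> Tuple[str, int, int, int]:
    """Return (name, number of values, suppressedCount, totalCount).

    Works for any metric that follows the proposed layout:
    {"name": ..., "value": {"values": [...], "metadata": {...}}}
    """
    body = metric["value"]
    meta = body["metadata"]
    return (metric["name"], len(body["values"]),
            meta["suppressedCount"], meta["totalCount"])


# Histogram reduced to the fields this sketch needs.
histogram = {
    "name": "histogram",
    "value": {
        "values": [
            {"lowerBound": 10, "bucketSize": 5.0, "count": 7},
            {"lowerBound": 25, "bucketSize": 5.0, "count": 9},
            {"lowerBound": 30, "bucketSize": 5.0, "count": 5},
        ],
        "metadata": {"suppressedCount": 4, "totalCount": 25},
    },
}

# Hypothetical "distinct" metric: the per-value fields are invented here
# purely to show that the same reader works for a different kind of list.
distinct = {
    "name": "distinct",
    "value": {
        "values": [
            {"value": "a", "count": 12},
            {"value": "b", "count": 8},
        ],
        "metadata": {"suppressedCount": 2, "totalCount": 22},
    },
}

summaries = [describe(m) for m in (histogram, distinct)]
```

The per-value fields differ between metrics, but the envelope and the metadata keys stay the same, so code that reports suppression or null counts would not need a branch per metric.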

AndreiBozantan commented 4 years ago

PR #312 is changing things in this direction. (At this moment the plan is to not merge this PR, so this comment is just for future reference.)