BlueBrain / nexus

Blue Brain Nexus - A knowledge graph for data-driven science
https://bluebrainnexus.io/
Apache License 2.0

Design proposal for the recording and presentation of deployment usage stats #2683

Closed adulbrich closed 3 years ago

adulbrich commented 3 years ago

Related to #2528

imsdu commented 3 years ago

On the collection of data to later compute the stats

An alternative to using InfluxDB would be to use Elasticsearch as a time-series database:

An article that compares InfluxDB to another time-series database: https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-time-series-data-timescale-influx-sql-nosql-36489299877/ It is written by a competitor, but they make valid points about InfluxDB.

Besides the deployment usage stats, this data could be used to power and replace the current implementation of ProjectsCounts and StoragesStatistics by making some calls to ES to compute the stats.

Visualization in Grafana

Copy/paste from https://grafana.github.io/grafonnet-lib/:

A dashboard in Grafana is represented by a JSON object. 

While this choice makes sense from a technical point of view, people who want to keep those dashboards under version control end up putting large, independent JSON files under source control.

When doing so, it is hard to maintain the same links, templates, or even annotations between graphs. It usually requires a lot of custom tooling to change and keep those JSON files aligned.

There are alternatives, like grafanalib, that make things easier. However, as Grafonnet is using Jsonnet, a superset of JSON, it gives you out of the box a very easy way to use any feature of Grafana that would not be covered by Grafonnet already.

I have never used it, but I do know that maintaining these large JSON objects in git is a real pain.

umbreak commented 3 years ago

On the Elasticsearch side, are you suggesting something like this? I'm not sure if we need the ILM bit though...

Seems doable and probably easier than having InfluxDB, but it would require some work.

imsdu commented 3 years ago

Yes, but without ILM, and even without most of the things on the page you linked, as we would not use Kibana either.

The main difference with the views is that we would index the events and not the resources.

We could use the rollover API to write to a new index when the current one reaches a certain size, but it is something we could skip too: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html

In what way do you think it would require more work than InfluxDB?

The idea would be:

umbreak commented 3 years ago

I agree that the in-memory cache implementations (ProjectsCounts and StoragesStatistics), computed using fs2 streams, can be replaced with ES, and at the same time we can draw stats from it. If we store some basic information for each event (instant, project, deprecation status, event type, resource @type) in an index per project, we could answer the following questions:

However, we would still not be able to answer the following question:

There are a few disadvantages though:

imsdu commented 3 years ago

For project deletion and file size distribution, the question remains the same whether we use ES or InfluxDB, but yes, these are the toughest ones.

For the latency, there will not be complex aggregations (nothing nested, for example) and we can ask only for the ones we are interested in. And if we hit only one shard and expect the ES cache to do its job, it should remain low.

imsdu commented 3 years ago

On the Delta side, the implementation could look like this:

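import java.time.Instant
import io.circe.JsonObject

sealed trait Action

object Action {
  // Each case object must extend Action to be a value of that type
  case object Create    extends Action
  case object Update    extends Action
  case object Deprecate extends Action
  case object Tagged    extends Action
  case object Deleted   extends Action
}

// Subject, ProjectRef, Label, Iri and Metric are Delta types
final case class EventMetric(instant: Instant,
                             subject: Subject,
                             action: Action,
                             project: ProjectRef,
                             organization: Label,
                             id: Iri,
                             types: Set[Iri],
                             additionalFields: JsonObject) extends Metric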

Here, additionalFields would hold data specific to a type of resource, like the size of a file or the size of a distribution.
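To illustrate how additionalFields could carry type-specific data, here is a minimal, self-contained sketch. It uses simplified stand-in types (plain String instead of Subject, ProjectRef, Label and Iri, and a Map instead of circe's JsonObject), and the "bytes" key and all sample values are hypothetical:

```scala
import java.time.Instant

object EventMetricExample {
  sealed trait Action
  object Action {
    case object Create extends Action
    case object Update extends Action
  }

  // Stand-in for the proposed EventMetric, with simplified field types
  final case class EventMetric(
      instant: Instant,
      subject: String,
      action: Action,
      project: String,
      organization: String,
      id: String,
      types: Set[String],
      additionalFields: Map[String, Long]
  )

  // A file creation reports its size in bytes (hypothetical "bytes" key)
  val fileCreated: EventMetric = EventMetric(
    instant = Instant.parse("2021-08-23T17:04:19Z"),
    subject = "alice",
    action = Action.Create,
    project = "org/proj",
    organization = "org",
    id = "https://example.org/files/1",
    types = Set("File"),
    additionalFields = Map("bytes" -> 1024L)
  )

  // A plain resource update has nothing type-specific to report
  val resourceUpdated: EventMetric = fileCreated.copy(
    action = Action.Update,
    id = "https://example.org/resources/1",
    types = Set("Dataset"),
    additionalFields = Map.empty
  )
}
```

Keeping the type-specific values in a separate field means the common fields stay identical for every kind of resource.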

A new method in EventExchange would return the metric for an event:

def toMetric(event: Event): UIO[Option[EventMetric]]

The UIO is needed at least for files: the storage id is not present in every kind of file event, so we need to fetch the file to get it.

A stream would run on the project events, get the metric from each event and push it to a single index that would store the metrics.

This index would then be queried to provide project and storage statistics for Delta and for the dashboards.
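A rough sketch of what such a toMetric could look like for files. This is a simplification under stated assumptions: the event types, the fetchStorage lookup and the field names are hypothetical stand-ins, and a plain function replaces the UIO effect so that the example stays self-contained:

```scala
object ToMetricExample {
  // Hypothetical, simplified events
  sealed trait Event { def id: String }
  final case class FileCreated(id: String, storage: String, bytes: Long) extends Event
  final case class FileTagAdded(id: String, tag: String)                 extends Event
  final case class Unsupported(id: String)                               extends Event

  final case class EventMetric(id: String, action: String, additionalFields: Map[String, String])

  // fetchStorage stands in for fetching the file to read its storage id;
  // in Delta this lookup would be effectful, hence the UIO in the signature
  def toMetric(event: Event, fetchStorage: String => Option[String]): Option[EventMetric] =
    event match {
      case FileCreated(id, storage, bytes) =>
        // The event itself carries the storage id
        Some(EventMetric(id, "Created", Map("storage" -> storage, "bytes" -> bytes.toString)))
      case FileTagAdded(id, _)             =>
        // The event has no storage id, so it is looked up from the file;
        // in this sketch no metric is produced when the lookup fails
        fetchStorage(id).map(storage => EventMetric(id, "Tagged", Map("storage" -> storage)))
      case Unsupported(_)                  =>
        // Events without a meaningful metric are skipped
        None
    }
}
```

Returning an Option lets an exchange opt out for events it does not want to report.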

I tested on my laptop with around 10M events for ~9000 projects, and the dashboards were quite responsive (around 2s for the most expensive one, which was the sum of file size per project).
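For illustration, the two kinds of statistics mentioned above (event counts and the sum of file size per project) can be sketched in memory. This only mirrors what the Elasticsearch aggregations would compute and is not the actual query; the simplified EventMetric and the sample data are assumptions (requires Scala 2.13+ for groupMapReduce):

```scala
object MetricsAggregationExample {
  // Simplified metric: only the fields needed for these two statistics
  final case class EventMetric(project: String, action: String, bytes: Option[Long])

  val metrics: List[EventMetric] = List(
    EventMetric("org/proj1", "Create", Some(100L)),
    EventMetric("org/proj1", "Update", None),
    EventMetric("org/proj2", "Create", Some(50L)),
    EventMetric("org/proj1", "Create", Some(25L))
  )

  // Number of events per project, comparable to a terms aggregation on the project field
  val eventsPerProject: Map[String, Int] =
    metrics.groupMapReduce(_.project)(_ => 1)(_ + _)

  // Sum of file sizes per project; metrics without a size count as 0
  val bytesPerProject: Map[String, Long] =
    metrics.groupMapReduce(_.project)(_.bytes.getOrElse(0L))(_ + _)
}
```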

On the dashboard side:

It is easier to create dashboards with the UI in Kibana (autocompletion for the field values helps a lot) than with Grafana (with or without grafonnet).

With grafonnet, it was even more difficult to get to a result, so I think we can forget it. It is not ideal to have giant unreadable JSON blobs in git, but it is even less ideal to need a week and somebody who knows the Jsonnet language to create a dashboard when, with Kibana, you can do it in half a day (or a day for someone who does not know Kibana).

Grafana: Pros:

Cons:

Kibana: Pros:

Cons:

To get an idea of how Grafana looks, look at the instance in production or at the screenshots here: https://grafana.com/grafana/

A dashboard I created for Kibana (the data I generated is too uniform to produce interesting charts, but it gives an idea): [Screenshot 2021-08-23 at 17-04-19: Test - Elastic]

A demo screenshot of a dataset included with Kibana: [Screenshot 2021-08-23 at 17-05-27: eCommerce Revenue Dashboard - Elastic]

umbreak commented 3 years ago

Action shouldn't be exposed like that, since every plugin can potentially have any "actions" (commands and events are not tied to Create/Update/Tag/Deprecate). Files, for example, have events to do with attributes, which have nothing to do with create/update/...
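One possible way to address this, sketched here purely as an assumption and not as what Delta does, is to make the action an open value instead of a closed set of cases, so that each plugin can declare its own. All names below are hypothetical:

```scala
object OpenActionExample {
  // An action is only a name, so it is not limited to a fixed list of cases
  final case class Action(value: String)

  object Action {
    // Common actions that most resource types share
    val Created: Action    = Action("Created")
    val Updated: Action    = Action("Updated")
    val Deprecated: Action = Action("Deprecated")
  }

  // A hypothetical files plugin adds its own action without changing the core type
  object FileActions {
    val AttributesUpdated: Action = Action("FileAttributesUpdated")
  }
}
```

The trade-off is that the compiler can no longer check pattern matches on actions for exhaustiveness.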

umbreak commented 3 years ago

Powering ProjectsCounts with ES would mean that we will have to remove the ProjectsCounts index of that project when issuing a delete of the project.

I just point it out here as something to take into account.

umbreak commented 3 years ago

Another thing to be considered:

For ViewStatistics, we usually assumed that the projectsStatistics (retrieved from ProjectsCounts) were always ahead of the actual view stream counts (especially for composite views). That might not be the case anymore, since the project events have to be indexed and made available in ES first.

imsdu commented 3 years ago

This assumption was kind of difficult to make anyway, no? The streams are independent and don't work on the same events, everything is eventually consistent, ...

umbreak commented 3 years ago

Well, the assumption was that reading one stream and just adding counts to it would be faster than reading another stream, doing JSON-LD conversions and indexing things into ES.

It was not strictly guaranteed that one would finish before the other, but in practice it does.

imsdu commented 3 years ago

I agree with you on this point.

But the streams are mostly idle, and which one wins also depends on when each stream made its last poll.