Closed adulbrich closed 3 years ago
On the collection of data to later compute the stats
An alternative to using InfluxDB would be to use elasticsearch as a time-series database:
An article that compares InfluxDB to another time-series database: https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-time-series-data-timescale-influx-sql-nosql-36489299877/ It is written by a competitor, but they make valid points about InfluxDB.
Besides the deployment usage stats, this data could be used to power and replace the current implementation of `ProjectCounts` and `StorageStatistics` by making some calls to ES to compute the stats.
Visualization in Grafana
Copy/paste from https://grafana.github.io/grafonnet-lib/:
A dashboard in Grafana is represented by a JSON object.
While this choice makes sense from a technical point of view, people who want to keep those dashboards under version control end up putting large, independent JSON files under source control.
When doing so, it is hard to maintain the same links, templates, or even annotations between graphs. It usually requires a lot of custom tooling to change and keep those JSON files aligned.
There are alternatives, like grafanalib, that make things easier. However, as Grafonnet is using Jsonnet, a superset of JSON, it gives you out of the box a very easy way to use any feature of Grafana that would not be covered by Grafonnet already.
I have never used it, but I do know that maintaining these large JSON objects in git is a real pain.
On the Elasticsearch side, are you suggesting something like this? I'm not sure if we need the ILM bit though...
Seems doable and probably easier than having InfluxDB, but it would require some work.
Without ILM, and even without most of the things in the page you linked, as we would not use Kibana either.
The main difference with the views is that we would index the events and not the resources.
We could use the rollover api to write to a new index when the current one reaches a certain size but it is something we could skip too: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html
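As a sketch, such a rollover request could look like the following (the alias name `event-metrics` and the threshold values are made-up for illustration, not something that has been decided):

```scala
// Hypothetical rollover request body, to be sent as
// POST event-metrics/_rollover (alias name is an assumption).
// When the index behind the alias matches one of the conditions,
// ES creates a fresh index and repoints the alias to it.
val rolloverRequest: String =
  """{
    |  "conditions": {
    |    "max_age": "30d",
    |    "max_docs": 10000000,
    |    "max_size": "5gb"
    |  }
    |}""".stripMargin
```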
In which way do you think it would require more work than InfluxDB?
The idea would be to change `ProjectCounts` and `StorageStatistics` so as not to compute the stats but to push relevant information for each event to ES.

I agree the in-memory cache implementations (`ProjectsCounts` and `StoragesStatistics`) computed using fs2 streams can be replaced with ES, and at the same time we can draw stats from it.
If we store some of the basic information for each event (instant, project, deprecation status, event type, resource @type)
in an index per project, we could answer the following questions:
However we would still not be able to answer the following question:
(the `distribution` field on the source payload). In order to achieve that, we would have to do a bit of graph navigation / expanded JSON-LD cursor navigation.

There are a few disadvantages though:
Computing `ProjectCounts` and `StoragesStatistics` through ES queries can be much more expensive/complex than a query to a cache. I'm not sure though what exact latency we would be expecting.

For the project deletion and file size distribution, the question remains the same whether it is ES or InfluxDB, but yeah, these ones are the toughest ones.
For the latency, there will not be complex aggregations (nothing nested, for example) and we can ask for only the ones we are interested in. And if we hit only one shard and expect the ES cache to do its job, it should remain low.
On the Delta side, the implementation could look like this:
sealed trait Action

object Action {
  case object Create    extends Action
  case object Update    extends Action
  case object Deprecate extends Action
  case object Tagged    extends Action
  case object Deleted   extends Action
}
final case class EventMetric(
    instant: Instant,
    subject: Subject,
    action: Action,
    project: ProjectRef,
    organization: Label,
    id: Iri,
    types: Set[Iri],
    additionalFields: JsonObject
) extends Metric
Where `additionalFields` would allow holding information specific to a type of resource, like the size of a file or the size of a distribution.
A new method in `EventExchange` would allow getting the metric for an event:
def toMetric(event: Event): UIO[Option[EventMetric]]
The `UIO` is here at least for files, for which we need to fetch the file to get the storage id, as it is not present in every kind of event.
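To make the shape concrete, here is a minimal, self-contained sketch of such a `toMetric` (the event types, field names and the synchronous return type are simplifications for illustration; the real implementation would work on Delta's `Event` hierarchy and return a `UIO`):

```scala
import java.time.Instant

// Simplified stand-ins for Delta's types, just to illustrate the mapping.
sealed trait Event
final case class FileCreated(project: String, id: String, bytes: Long) extends Event
case object CacheRefreshed                                             extends Event

final case class EventMetric(
    instant: Instant,
    action: String,                     // stand-in for the Action trait
    project: String,
    id: String,
    additionalFields: Map[String, Long] // stand-in for a JsonObject
)

// Events we care about yield a metric; purely internal events yield None.
def toMetric(event: Event): Option[EventMetric] = event match {
  case FileCreated(project, id, bytes) =>
    // the file size is the kind of resource-specific data
    // that goes into additionalFields
    Some(EventMetric(Instant.now(), "Create", project, id, Map("bytes" -> bytes)))
  case CacheRefreshed => None
}
```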
A stream would run on the project events, get the metric from the event and push to a single index that would store the metrics.
This index would be then queried to provide project and storage statistics for Delta and for the dashboards.
I tested on my laptop with around 10M events for ~9000 projects; the dashboards were quite reactive (around 2s for the most expensive one, which was the sum of file sizes per project).
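For reference, that kind of dashboard query could look roughly like the aggregation below (the field names `project`, `bytes` and `@type` are assumptions about the metric documents, not an agreed mapping):

```scala
// Hypothetical ES query: sum of file sizes per project over the metrics index,
// a flat terms aggregation with a sum sub-aggregation.
val fileSizePerProject: String =
  """{
    |  "size": 0,
    |  "query": { "term": { "@type": "File" } },
    |  "aggs": {
    |    "by_project": {
    |      "terms": { "field": "project" },
    |      "aggs": {
    |        "total_bytes": { "sum": { "field": "bytes" } }
    |      }
    |    }
    |  }
    |}""".stripMargin
```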
On the dashboard side:
It is easier to create dashboards with the UI in Kibana (autocompletion for the field values helps a lot) than with Grafana (with or without grafonnet).
With grafonnet, it was even more difficult to get to a result, so I think we can forget it. It is not ideal to have giant unreadable JSON blobs in git, but it is even less ideal for somebody who knows the Jsonnet language to need a week to create a dashboard when, with Kibana, you can do it in half a day (or a day for someone who does not know Kibana).
Grafana: Pros:
Cons:
Kibana: Pros:
Cons:
To get an idea of how Grafana looks, look at the instance in production or see the screenshots here: https://grafana.com/grafana/
A dashboard I created in Kibana (the data I generated is too uniform to produce interesting charts, but it gives an idea):
A demo screenshot of a dataset included with Kibana:
`Action` shouldn't be exposed like that, since every plugin can potentially have any "actions" (commands and events are not tied to Create/Update/Tag/Deprecate). Files, for example, have events to do with attributes, which have nothing to do with create/update/...
Powering `ProjectsCounts` with ES would mean that we will have to remove the `ProjectsCounts` index entries of that project when issuing a delete of the project.
I just point it out here as something to take into account.
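If the metrics end up in a single shared index, a sketch of that cleanup would be a delete-by-query on the project field (the index name `event-metrics`, the field name and the project label are assumptions):

```scala
// Hypothetical request body for POST event-metrics/_delete_by_query,
// removing all metric documents of the deleted project.
val deleteProjectMetrics: String =
  """{
    |  "query": { "term": { "project": "myorg/myproject" } }
    |}""".stripMargin
```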
Another thing to be considered: for `ViewStatistics` we usually assumed that the `projectsStatistics` (retrieved from `ProjectsCounts`) were always ahead of the actual view stream counts (especially for composite views). That might not be the case anymore, since the projects have to be indexed and made available in ES.
This assumption was kind of difficult to make anyway, no? The streams are independent and don't work on the same events; everything is eventually consistent, ...
Well, the assumption was that reading one stream and just adding counts to it would be faster than reading another stream + doing JSON-LD conversions + indexing things into ES.
It was not strictly guaranteed that one would finish before the other, but in practice it does.
I agree with you on this point.
But streams are mostly idling, and who wins also depends on when each stream made its last poll.
Related to #2528