aerogear / proposals

AeroGear Proposals
Apache License 2.0

Mobile App Metrics and SDK version tracking proposal (2nd one) #22

Closed aliok closed 6 years ago

aliok commented 6 years ago

cc @david-martin @pb82 @darahayes @wtrocki I quickly prepared the proposal PR. Please have a look so that we can merge before Wed.

aliok commented 6 years ago

Some questions:

david-martin commented 6 years ago

We provision a new Grafana instance here. Is it possible to reuse the existing Grafana instance if it is provisioned by the backend metrics service? That actually makes me wonder whether having 3 separate services would make sense: visualization, backend metrics, mobile app metrics. I see we figured out dashboard discovery here: https://issues.jboss.org/browse/AEROGEAR-1886. @darahayes any comments?

I think a single APB that has the server metrics, mobile app metrics and visualisation is OK for an MVP. Grafana can be shared then. And Prometheus can store data in Postgres.

What language/platform to implement the service? @david-martin

What was used during your investigation? Something relatively lightweight would be preferable. Having a flexible Postgres library available in the language would be important (to allow storing both time series and structured data). Language choice should also take into account downstream productisation concerns, so an already widely used language is preferable e.g. Golang, Node.js

wtrocki commented 6 years ago

@david-martin +100 for golang implementation

aliok commented 6 years ago

I think a single APB that has the server metrics, mobile app metrics and visualisation is OK for an MVP. Grafana can be shared then. And Prometheus can store data in Postgres.

Ok, sounds good to me. We can talk more about it during planning phase.

What was used during your investigation? Something relatively lightweight would be preferable. Having a flexible Postgres library available in the language would be important (to allow storing both time series and structured data). Language choice should also take into account downstream productisation concerns, so an already widely used language is preferable e.g. Golang, Node.js

I used Node. Golang is also fine. It has the required libs.

aliok commented 6 years ago

Just a note for future reference:

There is a downside to using TimescaleDB/Postgres for metrics: lack of flexibility. In Elasticsearch, we can just push JSON without defining anything on the storage system. But in Postgres, for example, if we want to push a new metric called "osVersion", we need to update the SQL table schema (ALTER TABLE bla).

I think this is OK for the default metrics (appVersion, sdkVersion, etc.) that we're interested in for this epic, but in the future, when we want to provide user-defined metrics to our users, we can't just go and create/alter tables.

In the case of using Prometheus (architecture#4), Prometheus was creating/altering tables for us. But they have created a super generic, non-normalized table structure that can hold anything, even though there's a performance cost. Maybe we can have a look at how they structured the tables (I think there were only 4 tables created by Prometheus: labels, values, etc.) later when we're working with custom metrics.

So, summary:

darahayes commented 6 years ago

@aliok You've done a lot more research here, but would it be possible to do something like having a Postgres/Timescale table with a json/jsonb column? e.g.:

CREATE TABLE usermetrics (
      time timestamptz,
      data jsonb
);

now we could insert like:

INSERT INTO usermetrics VALUES (NOW(), '{"sdkVersion": "1.1", "osVersion": 7}'); -- more keys can be added to the JSON object without a schema change

Or perhaps we could do something generic like this:

CREATE TABLE usermetrics (
      time timestamptz,
      key varchar,
      val varchar
);

and we could query for a particular metric like this:

SELECT time, val FROM usermetrics WHERE key='sdkVersion';

Obviously the examples here are trivial, but you should get the point. I think option 1 gives lots of flexibility for inserts, whereas option 2 makes it very easy to query. Would one of these approaches be feasible? Food for thought.
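To make option 2 concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for Postgres/TimescaleDB, so the column types are TEXT rather than timestamptz/varchar. The `flatten` helper and the fixed timestamp are hypothetical, introduced only for illustration.

```python
import sqlite3


def flatten(payload, when):
    """Turn one metrics payload into (time, key, val) rows; values are stored as text."""
    return [(when, key, str(val)) for key, val in payload.items()]


# In-memory SQLite database standing in for the real Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usermetrics (time TEXT, key TEXT, val TEXT)")

when = "2018-01-01T00:00:00+00:00"  # placeholder timestamp for the example
payload = {"sdkVersion": "1.1", "osVersion": 7}
conn.executemany("INSERT INTO usermetrics VALUES (?, ?, ?)", flatten(payload, when))

# Same query shape as in the comment above: one metric selected by its key.
rows = conn.execute(
    "SELECT time, val FROM usermetrics WHERE key = ?", ("sdkVersion",)
).fetchall()
```

A new metric such as "osVersion" becomes just another row, so no ALTER TABLE is needed. The cost is that every value is stored as text and has to be cast when queried.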

aliok commented 6 years ago

@darahayes I haven't really experimented with a "jsonb" column, but I think it would bring some limitations in terms of indexing. Something we can have a look at in the future though.

The 2nd idea seems nicer to me. But homogeneous data is nicer when it comes to performance, and it is best practice. Of course, we can sacrifice some things for flexibility. We can have a look at how Prometheus is storing data in TimescaleDB. I think it would give us a good idea. Anyway, this is low priority for now IMO.

darahayes commented 6 years ago

@aliok I have read that one advantage of the jsonb type is that you can index it (the tradeoff, however, is that inserts are quite slow).

Regardless, I agree it's low priority for now and we should go with the simplest approach for MVP. It's just good to know we have options later on!

aliok commented 6 years ago

@darahayes I was trying to keep the scope to default app metrics (app version, sdk version, etc.) for this proposal, and it made sense to use a structured table for those metrics.

But I noticed that IDM self defence metrics are coming soon and we will have more metrics. I am now thinking that instead of creating a structured table, we can use a semi-structured one for everything. This unifies the approach for metrics (default app metrics, IDM metrics, custom metrics, etc.). I am gonna spike around that. The reason I noticed this problem so late is that we always had semi-structured tables/indices in previous approaches (Elasticsearch or Postgres over Prometheus).

wtrocki commented 6 years ago

@aliok - I kinda feel that we will need to provide an abstraction for metrics on the server and just assign data engines that map to the actual storage system. Data engines may be something we will need to code to provide flexibility and adjust over time, so we will not be tied to the actual format of the data etc.

From the SDK point of view we will have metrics containers (collectables) that will encapsulate specific fields. On the server this abstraction still needs to exist (see my suggestion above) and map to specific storage requirements and presentation requirements (Grafana dashboards). Having it this way, we could document the entire metrics extension process and make future contributions really easy.
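One way to picture this abstraction is the rough Python sketch below. All names here (`Collectable`, `AppMetrics`, `DataEngine`, `InMemoryEngine`, `MetricsService`) are hypothetical and not taken from the proposal or any SDK. The in-memory engine is only a placeholder for a real storage backend.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Protocol


class Collectable(Protocol):
    """A metrics container: a named group of specific fields."""

    def metric_type(self) -> str: ...

    def fields(self) -> Dict[str, Any]: ...


@dataclass
class AppMetrics:
    """Hypothetical container for the default app metrics."""

    app_version: str
    sdk_version: str

    def metric_type(self) -> str:
        return "app"

    def fields(self) -> Dict[str, Any]:
        return asdict(self)


class DataEngine(Protocol):
    """Maps a collectable to an actual storage system."""

    def store(self, client_id: str, metric: Collectable) -> None: ...


class InMemoryEngine:
    """Placeholder engine; a real one would write to Postgres or similar."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def store(self, client_id: str, metric: Collectable) -> None:
        self.records.append(
            {"clientId": client_id, "type": metric.metric_type(), "data": metric.fields()}
        )


class MetricsService:
    """Routes each metric to the engine registered for its type."""

    def __init__(self, engines: Dict[str, DataEngine]) -> None:
        self._engines = engines

    def publish(self, client_id: str, metric: Collectable) -> None:
        self._engines[metric.metric_type()].store(client_id, metric)


# Usage: register one engine for the "app" metric type and publish a metric.
engine = InMemoryEngine()
service = MetricsService({"app": engine})
service.publish("client-1", AppMetrics(app_version="2.0", sdk_version="1.1"))
```

With this shape, adding a new metric would mean writing one container and assigning it an engine, while the storage format stays hidden behind `DataEngine`.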

aliok commented 6 years ago

@darahayes @wtrocki I've checked how Prometheus did it. They use JSONB :)

So, I have created another module in the POC to do stuff with JSONB: https://github.com/aliok/mobile-analytics-poc/pull/6

This way, we only have one table with 3 fields in it: clientId, timestamp, data. The data field is JSONB. This unifies the default metrics with custom metrics. In the POC, I have used the same table for sdkVersion metrics and a custom "button click" metric.
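As an illustration of the single-table idea (not the POC code itself), here is a small Python sketch. It assumes sqlite3 as a stand-in for Postgres, a TEXT column holding serialized JSON in place of JSONB, and a hypothetical table name `mobileappmetrics`. The helper names and payload keys are also made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# TEXT stands in for Postgres JSONB; the three columns follow the comment above.
conn.execute("CREATE TABLE mobileappmetrics (clientId TEXT, timestamp TEXT, data TEXT)")


def insert_metric(client_id, timestamp, data):
    """Store any metric payload in the same table, serialized as JSON."""
    conn.execute(
        "INSERT INTO mobileappmetrics VALUES (?, ?, ?)",
        (client_id, timestamp, json.dumps(data)),
    )


def metrics_with_key(key):
    """Return (clientId, data) for rows whose JSON payload contains the given key."""
    rows = conn.execute(
        "SELECT clientId, data FROM mobileappmetrics ORDER BY rowid"
    ).fetchall()
    decoded = [(client_id, json.loads(data)) for client_id, data in rows]
    return [(client_id, data) for client_id, data in decoded if key in data]


# A default metric and a custom metric share one table with no schema change.
insert_metric("client-1", "2018-01-01T00:00:00+00:00", {"sdkVersion": "1.1"})
insert_metric("client-1", "2018-01-01T00:00:05+00:00", {"buttonClick": "login"})

sdk_rows = metrics_with_key("sdkVersion")
```

In Postgres the key filter would normally be pushed into SQL with JSONB operators instead of being done in application code; that part is left out here because SQLite does not have the JSONB type.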

I haven't really checked the indexing techniques though. As @darahayes pointed out, there are ways to index fields inside JSONB data.

@wtrocki I totally agree on what you are saying. We should do that!

aliok commented 6 years ago

@david-martin @wtrocki @pb82 @darahayes I made the changes to the proposal that we talked about today. Do you mind giving it another review?

david-martin commented 6 years ago

@aliok Good to merge if there's another +1/approval.