endlessm / azafea

Service to track device activations and usage metrics
Mozilla Public License 2.0
10 stars 2 forks source link

Create materialized view listing machine ids grouped by day #116

Closed liZe closed 4 years ago

liZe commented 4 years ago

This view is useful to get active user reports.

We have to take a decision about the way the view is refreshed: an SQL command has to be sent regularly, probably once a day. @adarnimrod Do you have an idea?

https://phabricator.endlessm.com/T29216

liZe commented 4 years ago

If we want to keep the materialized view, I can also add a command to refresh them.

liZe commented 4 years ago

Tests randomly fail, because the initdb command sometimes creates the materialized view after the parent table. I’ve added an explicit dependency relationship, but it’s not enough.

The table doesn’t appear in the metadata’s sorted tables list, because it’s not directly registered in the metadata. It’s registered in a temporary metadata object to avoid the real table creation. Listeners are then added to execute the CREATE MATERIALIZED VIEW statement.

The problem may also come from the way tables for different event processors are registered.

I have to find another solution, but I think that I’m not that far from a clean configuration for this view (and the next ones!)

(PS: Original article about materialized views with SQLAlchemy)

liZe commented 4 years ago

It’s fixed. Take that, SQLAlchemy! :boom:

The problem may also come from the way tables for different event processors are registered.

The tables were cleared at the end of each tests, but not the events on which views are based.

I’ll also add a command refreshing materialized views.

liZe commented 4 years ago

This pull request is ready for review.

The original goal of this pull request was to solve T29216 by creating a fast way to get machine ids grouped by days.

A nice side effect is that we now have a nice background to create more materialized views:

Even if the main goal is to provide SQL views for third-party tools, these views can also be used as any other tables by azafea.

New views could of course be useful to fix other slow requests, but they could also be a transitional solution for better anonymization. After we decide what to do in T29292, we can create views to aggregate the data we don’t want to keep anymore. After a period of tests and reports from users, we can decide to replace views by real tables, keep the old table to store temporary data, and build requests that regularly aggregate and remove the old table data, and store them in the new one.

This PR is not perfect and could be improved (if needed) on various points, including:

SQLAlchemy is not designed to work with views out-of-the-box, but it provides the needed APIs to handle them pretty well. We’ll have smaller and better code when they are fully supported.

adarnimrod commented 4 years ago

@wjt are there any other concerns or can we proceed with this PR?