Closed sebastian closed 7 years ago
Let's put this one on hold for a little while, @sasa1977. I think we have other more pressing issues first, to get ready to the point where we are happy to let people use our system, rather than adding more features.
I'll un-assign you for now. Hope you don't mind.
Not important. I'll close this for now.
In fact this has turned out to be more useful than I thought. When wanting to segment a user base it's incredibly convenient!
For example, take the Games and Players
database as an example. Say we wanted to write queries that only deal with the "winners", then we could count them like this:
SELECT count(*) FROM (
SELECT t1.player as uid, (wins - losses) as wins
FROM (
SELECT player, count(*) as wins
FROM games
WHERE outcome = 'W'
GROUP BY player
) t1 INNER JOIN (
SELECT player, count(*) as losses
FROM games
WHERE outcome = 'L'
GROUP BY player
) t2 ON t1.player = t2.player
WHERE wins >= 0 and wins < 100000000000
GROUP BY uid
) winners
Being able to save the segmentation
section as a view would be incredibly useful, and would allows us to write the query as:
SELECT count(*) FROM winners
We could furthermore then combine multiple views to segment even further:
SELECT count(*)
FROM
winners INNER JOIN men ON winners.uid = men.uid
INNER JOIN own_a_car ON men.uid = own_a_car.uid
We could:
cloak
's don't need to know about the views at allMaybe it's best to create an own subpage for views?
Views are not shared across users for the time being.
Some comments here:
when a query uses a view, we can replace the mention of the view with the stored SQL
We'll certainly do this, but I think the only place we can currently do it is in Cloak. If we want to do it in Air, we'll need to parse the query there.
cloak's don't need to know about the views at all
Given my previous comment, cloaks will need to know about views, because cloak is the only one who can resolve the view reference properly. Since views are tied to a particular user, it means that as the part of the query request, the air would have to send all user defined views to the cloak. Cloak would then need to compile them as well, and could use them as allowed table references.
prior to saving we should run the query (as a subquery to get the correct validations in the compiler) and check that it works
Running the query might take a long time. It would be better to expand the interface of the cloak so that air can ask it whether the given view definition is valid, without actually running the query.
All of this actually makes me wonder if we should consider a different approach. Currently, cloak accepts query strings. However, cloak was never intended to be used by humans. So an alternative would be to move lexer and parser to the air. I believe we have most of the necessary data in the air, which would allow us to properly compile the query there. The cloak would then accept a compiled query and execute it.
The story is not so simple however, since cloak would still need to verify some things. Whatever is forbidden because of privacy aspects, needs to be double checked by the cloak. Otherwise someone could circumvent air and abuse cloak to compromise privacy. A fairly simple solution would be to do another compiler pass in the cloak. In this pass, no transformation would happen, but the same set of checks would run, thus ensuring that the query is valid.
The benefits of this approach are:
$1
, $2
, ...). Any other special queries needed by the PostgreSQL interface could be recognized in Air, and handled there without going to the cloak. In particular, I refer to the querying of the pg_catalog
schema which is done from Tableau and other similar tools.The main downside is that this requires a solid amount of work. Another issue is that whenever some additional feature is added to the cloak (e.g. additional function, or a new keyword), we need to redeploy cloak as well. Consequently, this increases coupling between the cloak and the air. Air and all of its cloaks need to be on the same version, otherwise some strange effects might occur.
So the decision is not simple. It seems to me that the added coupling between air and cloak is a serious issue. Therefore, I'd still advise to keep query parsing/compiling in the cloak. However, we should keep this idea in the back of our mind. The proposed steps are then:
cloak's don't need to know about the views at all
Given my previous comment, cloaks will need to know about views, because cloak is the
only one who can resolve the view reference properly. Since views are tied to a particular
user, it means that as the part of the query request, the air would have to send all user
defined views to the cloak. Cloak would then need to compile them as well, and could use
them as allowed table references.
You are absolutely right. The cloak needs to inline the view, not air.
prior to saving we should run the query (as a subquery to get the correct validations in the compiler) and check that it works
Running the query might take a long time. It would be better to expand the interface of
the cloak so that air can ask it whether the given view definition is valid, without actually
running the query.
I did in fact just mean the validation part. I.e. query compilation, not running.
So the decision is not simple. It seems to me that the added coupling between air and cloak is a serious issue. Therefore, I'd still advise to keep query parsing/compiling in the
cloak. However, we should keep this idea in the back of our mind. The proposed steps are
then:
- sending all user views together with the query string
- resolving views in the cloak
- support view validation request-response message, so that air can ask the cloak to
validate the view before saving it
It certainly is not an easy tradeof. We would either have to require the air and cloak run with compatible versions, or we would have to introduce a strict versioning scheme in the protocol, which would then require that the air is able to provide queries for older cloak’s too. Neither is nice.
I approve of your current plan of action (as listed above) as the right option for now.
Thanks, Sebastian
add a button in the query interface allowing a query to be saved as a view
There is one subtle issue here. Queries written by analysts usually won't contain uid
. However, our view needs to select the uid
column. We can implicitly add this column during the compilation. However, it won't be clear to the analyst what is the name of the uid column.
One solution for this is to also include all views in SHOW TABLES
and support SHOW COLUMNS
for views. For example, let's say that the analyst issues a query SELECT name FROM users
, and then saves this query as the view names
. Invoking SHOW COLUMNS FROM names
would return two columns: uid
, and name
. This seems like a fairly consistent user experience. What do you think?
It needs to be made very clear that the semantics of views are the exact same as those of subqueries. Hence a uid
-column has to be explicitly selected.
Since a view is based on a real table, the analyst knows which column is the UID column that needs to be selected. Hence we know it for views too. And as a result we can also make views based on views!
I would very much like views to appear when SHOW tables
and SHOW columns FROM <view>
are issued, but more than that I would also like them to be shown in the side-panel amongst the tables.
Views in the Aircloak sense or not materialised and do not at all live in the database.
We allow analysts to pre-construct select statements (with the same semantic as any sub-query we would allow), and save them as named views. These views are per user, can be referenced like any other table would be, and inserted into the query at runtime.
For example, the following
AQL
could be saved ascritical_patients
:and then used in another query like this:
Depends on the native subquery support issue.
https://trello.com/c/oUdK0l5K/7159-support-for-view-s