allegro / turnilo

Business intelligence, data exploration and visualization web application for Druid, formerly known as Swiv and Pivot
https://allegro.github.io/turnilo/
Apache License 2.0
724 stars 169 forks source link

Better handling of Restricted datacubes #714

Open adrianmroz-allegro opened 3 years ago

adrianmroz-allegro commented 3 years ago

Right now Turnilo allows to hide some cubes from user based on x-turnilo-allowed-datacubes header. These datacubes are completely hidden and it is hard to manage access in organization.

Main point: restricted datacubes should still be sent to client

In both cases we could show description of datacube, where author should put information how to obtain access. Turnilo still doesn't have information about users, so access management is totally external.

Technical

Cluster.toClientSettings should get information about allowed datacubes and serialize data cube list based on it and setting guardDatacubes.

Hiding datacubes

This change would be breaking so we need your feedback. Especially @KarolakJakub @wmeler

If we still need to "hide" datacubes like in current version, we can change guardDataCubes property from boolean to object with properties: checkAccess - boolean if we should check access to data cube hideRestricted - boolean if we should hide restricted data cubes from user

Of course current true value would be mapped to object with both properties set to true for backward compatibility.

Removal of x-turnilo-allowed-datacubes header

We would like to remove this header from core turnilo codebase. We have turniloMetadata object on Request and it is better place for control values that could change turnilo behaviour.

Of course this change could be breaking but clear migration path need to be in place: Turnilo will provide plugin that can rewrite x-turnilo-allowed-datacubes header into turniloMetadata. Plugin would be easy to add and we need documentation for this.

Plugin could be configurable - what header should be rewritten.

Or maybe you can write your own plugin based on one provided. There you could easly read your own specific headers (whole Request object in fact) and manually fill turniloMetadata. Plugins are great place for your organization specific code!

l2dy commented 3 years ago

I think it should be made possible to both hide and deny access to datacubes in a plugin. We have a central place for documentation, so "how to obtain access" should not be scattered across every system we use.

adrianmroz commented 3 years ago

I think it should be made possible to both hide and deny access to datacubes in a plugin.

You mean you opt for both guard and hide props on cluster?

We have a central place for documentation, so "how to obtain access" should not be scattered across every system we use.

I don't think I follow. I imagine all restricted data cubes would have link in description explaining why cube is "disabled".

Please share your workflow if you can, so we can plan this task better. We're thinking about oauth integration in the future so more information would help us immensely.

l2dy commented 3 years ago

I think it should be made possible to both hide and deny access to datacubes in a plugin.

You mean you opt for both guard and hide props on cluster?

Yes, this would support more workflows.

We have a central place for documentation, so "how to obtain access" should not be scattered across every system we use.

I don't think I follow. I imagine all restricted data cubes would have link in description explaining why cube is "disabled".

Please share your workflow if you can, so we can plan this task better. We're thinking about oauth integration in the future so more information would help us immensely.

  1. We have about 50 datacubes on a single instance, and users are complaining that they see too many datacubes on the index, which makes it hard to find their own, so I requested #495, to keep datacubes organized, but hiding them from the index could also improve the situation.
  2. Some datacubes would have contained confidential information, but it was withdrawn because Turnilo could not control access to it. In this case, we would like to both hide and guard these datacubes.

We have two permission management systems, with radically different workflows. The first one is designed around a hierarchical tree, and people who have permissions of a node also get access to all of its child nodes. I would like to associate every datacube with a node, so permissions can be inferred from the tree and used to hide irrelevant datacubes. Most cubes here are not confidential, so it's not necessary to guard them. This is more close to the first workflow described above.

The second system is designed around flexible groups, anyone can create a group and group admins can add new members. In this case, I want to let users create new Turnilo instances themselves. Each instance is associated with a single group, and group admins may configure authentication for Druid clusters (#583) on their instances. This way we could host confidential datacubes on Turnilo, if both hide and guard props can be enabled by the admins. This approach aims for maximum flexibility and scalability, but it is much harder to implement.

adrianmroz commented 3 years ago

Thanks @l2dy for feedback! I try tackle #495 with this task. The code is in the same place so should be easy :)

One thing to note - checking access / guarding is superior rule. So you can have:

I will ready more closely your second part and discuss internally in upcoming weeks.

Right now we have broad idea that admin could write plugin that based on request headers could fill turniloMetadata object and drive access to data cubes. That way you could setup authorisation on load balancer / proxy.

For example in our case, we have authorisation before turnilo. Our plugin gets user's AD identity in header, translates it to x-turnilo-restricted-datacubes. And we also use query decorator to add filter to every query based on AD identity - so user see only data rows with specific values.

adrianmroz commented 3 years ago

If we still need to "hide" datacubes like in current version, we can change guardDataCubes property from boolean to object with properties: checkAccess - boolean if we should check access to data cube hideRestricted - boolean if we should hide restricted data cubes from user

Maybe we need two lists of data cubes instead of global setting? Plugin could provide list of "hidden" and "restricted" data cubes. Wouldn't that be more flexible? @mkuthan

wmeler commented 3 years ago

I like idea of having both checkAccess and hideRestricted configurable. Sometimes it would be nice to configure it from user perspective - I assume that idea with plugin rewritting x-turnilo-allowed-datacubes to turniloMetadata follow that pattern. Sometimes it would be nice to configure it on per datacube basis through property in config file. Now in our ogranisation we prefer to not show datacubes user isn't allowed access to.

mkuthan commented 3 years ago

Please remember about user experience when someone gets the link to the restricted data cube. Without access he/she gets 404 (not found) instead of 403 (unauthorised) because Turnilo client doesn't know why the data cube is missing.

adrianmroz-allegro commented 3 years ago

Let's organise our naming and patterns and figure out what we want.

Definitions:

Scenarios:

Home view

Cube view

After selecting cube from list it can show only available cube - trivial

After getting link to the cube

In Allegro we definitely want differentiation between non existing cube and Denied one. We don't care if we leak data cube names and "You don't have access" is better message than "not found".

What about you guys? Do you want to hide data cubes that user don't have access to? Or just show message? Should this behaviour be global or defined per data cube?

wmeler commented 3 years ago

404 for hidden datacubes is acceptable for me - you can find similar behaviour in gitlab.