allegro / turnilo

Business intelligence, data exploration and visualization web application for Druid, formerly known as Swiv and Pivot
https://allegro.github.io/turnilo/
Apache License 2.0

[Proposal] Add collection support for maintaining stored reports #409

Open a2l007 opened 5 years ago

a2l007 commented 5 years ago

One important feature Turnilo is currently missing is the ability to save and view custom reports. This can also be referred to as Collections: users save a custom query report and later view it directly instead of building the query again. Our team is interested in working on this feature and contributing it once it is ready. This issue tracks the progress of its development. FYI: @adrianmroz @mkuthan

mkuthan commented 5 years ago

@a2l007 - a few questions from my side:

Please also look at #72: favourites could be used to mark frequently used datacubes (or specific visualizations). But as long as favourites are user-specific, they cannot be easily shared (for example, when they are stored in browser local storage).

adrianmroz commented 5 years ago

In my opinion Turnilo should be a focused tool, and its main use is exploratory data analysis. Adding managed collections of reports will complicate the tool, the code and the devops side.

I think building internal repositories is a bad idea. Right now reports are identified by their URL and can be stored anywhere, probably in places better suited for this, like a company knowledge base.

a2l007 commented 5 years ago

@mkuthan The first step would be to add the ability to store settings in a database. With database support, the state of the reports can be saved and retrieved as required. Adding authentication/authorization would be nice to have but is out of scope for this feature. So basically every user with access to the Turnilo instance would be able to view the reports.

@adrianmroz I agree that reports can currently be identified via a URL. One benefit of custom reports would be the ability to save a dashboard similar to this:

[Screenshot: Screen Shot 2019-05-07 at 4 53 26 PM]

Would you be open to this? Our company is considering migrating to Turnilo, and dashboarding is one feature we would be interested in contributing.

mkuthan commented 5 years ago

@a2l007

Please keep in mind that a database will increase operational complexity, e.g.:

For me, a dashboard editor without security will end up in a huge mess. Everyone will be able to define any dashboard with any name, or delete a dashboard created by someone else (even by mistake).

Did you consider using Apache Superset for dashboards? It works really well for data explanation purposes in Allegro.

a2l007 commented 5 years ago

@mkuthan Thank you for your response. I've re-evaluated the priorities and the following are my thoughts:

adrianmroz commented 5 years ago

About the second point, I have some thoughts: nothing close to a roadmap, but something I keep in the back of my head.

I would like to give users the ability to construct more complex measures (expressions). We started by adding arithmetic operations in 1.15, and percent of total/parent. We have a few more ideas. We'd like to make it easy in the UI on CubeView and persist it in the URL, so you can play with arbitrary expressions yourself and share a specific report when you find something.

Secondly, something we neglect because of our workflow: it would be great if we could streamline introspection of Druid clusters, so that you don't need a config file at all, just the broker URL. Simply adding a column to Druid would then result in creating the appropriate dimension/measure.

Of course, all of that would result in "configuration in a vacuum". Expressions won't have names and columns won't have user-friendly descriptions. That's a problem.

And one last thing. I know it sounds like a cop-out, but maybe the configuration editor should be a separate app? An "admin panel for Turnilo". To iterate fast, it could be some simple CMS that reads config.yml, saves it to the same place after edits and triggers a Turnilo restart, just to test and measure interest in such a tool. Integrating it with auth/security services would be easier, as would wiring it into CI/CD systems.
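A minimal sketch of the "read config.yml, save, restart" loop described above, for illustration only. The function name `save_config` and the `restart` callback are hypothetical; how Turnilo is actually restarted depends on the deployment, so it is passed in rather than assumed.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def save_config(path: Path, new_text: str, restart: Callable[[], None]) -> None:
    """Replace the config file atomically, keep a backup, then restart.

    `restart` is a hypothetical hook (e.g. a call into a process manager);
    nothing here is part of Turnilo itself.
    """
    if path.exists():
        # Keep the previous version so a bad edit can be rolled back by hand.
        shutil.copy2(path, path.with_suffix(path.suffix + ".bak"))
    # Write to a temp file in the same directory, then rename it over the
    # target, so a reader never sees a half-written config.
    with tempfile.NamedTemporaryFile(
        "w", dir=path.parent, delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(new_text)
    Path(tmp.name).replace(path)
    restart()
```

Keeping the editor outside Turnilo means validation, auth and CI hooks could be added around this one function without touching the stateless server.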

a2l007 commented 5 years ago

@adrianmroz I have not used 1.15 yet but the ability to construct more complex measures sounds interesting. But how are the new complex measures persisted? From your comment, I assume the user would have to save the URL somewhere if they need to re-use the complex measure. I was thinking of something like this interface:

[Screenshot: Screen Shot 2019-05-09 at 10 29 29 AM]

The user can add complex measures, which can be saved back to the config YAML or a database (depending on configuration). Wouldn't streamlining introspection as per your approach be more complex and more error-prone for users? The users would now have to remember the datasource name, dimension name and metric name they want on the report, so that they can build the report URL appropriately. As an end user I would prefer adding measures via the UI to modifying URLs. Regarding the idea of an Admin Panel, would the end user have access to add additional settings or would it be restricted to the admin?

adrianmroz commented 5 years ago

> But how are the new complex measures persisted? From your comment, I assume the user would have to save the URL somewhere if they need to re-use the complex measure.

Yeah, they're ephemeral. You can't name them and reuse them with a single click. For now (we're in beta!) we find it a nice tradeoff for exploratory data analysis.

> I was thinking of something like this interface:

I understand. It's a strawman, but in Pivot you see and edit measures as plywood expressions. They're by no means simple.

> Wouldn't streamlining introspection as per your approach be more complex and more error-prone for users? The users would now have to remember the datasource name, dimension name and metric name they want on the report, so that they can build the report URL appropriately.

I mean that when a datasource changes (the admin added a new column because the data ingestion team added new artifacts), Turnilo shows the new dimensions/measures in the left panel, ready to use. The admin is concerned only with Druid, not Turnilo. Of course there are problems with naming: if a Druid column has a bad name, that name will show in the UI, which sucks. And we have no metadata to show to the user.

I would like to do this regardless of administration features, so you could drop Turnilo beside your Druid cluster (which maybe you use for Superset or Grafana) and start playing around.
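To make the idea concrete, here is a rough sketch of deriving dimensions and measures from introspected columns. The input is a plain name-to-type mapping standing in for whatever metadata the broker returns; `derive_cube`, the output dict shape and the default `sum` aggregate are assumptions for illustration, not Turnilo's config schema.

```python
from typing import Dict, List, Tuple

# Column types treated as numeric (assumed type names for this sketch).
NUMERIC_TYPES = {"LONG", "FLOAT", "DOUBLE"}


def derive_cube(columns: Dict[str, str]) -> Tuple[List[dict], List[dict]]:
    """Split introspected columns into dimensions and measures."""
    dimensions, measures = [], []
    for name, col_type in sorted(columns.items()):
        if name == "__time":
            # Druid's timestamp column becomes the time dimension.
            dimensions.append({"name": name, "title": "Time", "kind": "time"})
        elif col_type.upper() in NUMERIC_TYPES:
            # No metadata is available, so the raw column name is the title.
            measures.append({"name": name, "title": name, "aggregate": "sum"})
        else:
            dimensions.append({"name": name, "title": name, "kind": "string"})
    return dimensions, measures
```

The naming problem shows up directly: every `title` except the time column is just the raw Druid column name.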

> Regarding the idea of an Admin Panel, would the end user have access to add additional settings or would it be restricted to the admin?

That is a decision for the team/company. I see it as a three-level hierarchy. First, you have the "operational" admin, who knows the ins and outs of Druid and/or knows how to run Turnilo in the environment. At this level you can easily modify config.yml (that's what we do at Allegro). Second, you have the "business" admin, who knows what data is in Druid and would benefit from such a panel. They could pick columns to show in Turnilo, define common, more complicated measures and add meaningful metadata for dimensions/measures. And last, the regular user: I don't think they need to add/edit measures.

I know it could be tempting to integrate that into Turnilo, but I would rather keep it separate. Adopt that sweet "microservice" lifestyle :) At the start it will cost users two open tabs, but it would enable developers to move faster and we could evolve Turnilo in isolation.

a2l007 commented 5 years ago

> we find it a nice tradeoff for exploratory data analysis.

Could you please explain why you would need a tradeoff for exploratory data analysis?

It looks like the only compromise I'm getting here would be to build a separate app for UI configuration. Although I would like it to be integrated with Turnilo :) To give some context, we have a large user base currently on Pivot and since we're migrating to Turnilo, we're trying to minimize the user impact as much as possible.

adrianmroz commented 5 years ago

> Could you please explain why you would need a tradeoff for exploratory data analysis?

The disadvantage is that you can't easily reuse a measure like "Users minus Banned users". To use it in the next report you need to fiddle with the menu again. But the advantage is that the list of measures is short and focused. You can try to create expressions on the fly. One person may need X + Y, another Z / V, and often they need it only for one report. So that's the tradeoff: you can't reuse expressions easily, but you can create them ad hoc and they don't pollute the list of measures.
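As an illustration of that tradeoff, an ad hoc expression can live entirely in the URL fragment: nothing is stored server-side, and sharing the link shares the measure. This is only a sketch; the encoding (JSON, zlib, URL-safe base64), the state shape and the example host are assumptions, not Turnilo's actual URL format.

```python
import base64
import json
import zlib


def encode_state(state: dict) -> str:
    """Serialize view state into a URL-safe fragment."""
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_state(fragment: str) -> dict:
    """Inverse of encode_state."""
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(fragment)))


# Hypothetical state for "Users minus Banned users" on one report.
state = {
    "cube": "users",
    "measure": {"op": "subtract", "left": "users", "right": "banned_users"},
}
url = "https://turnilo.example/#" + encode_state(state)
```

The expression has no name and appears in no shared list; reusing it means keeping the URL.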

> It looks like the only compromise I'm getting here would be to build a separate app for UI configuration. Although I would like it to be integrated with Turnilo :) To give some context, we have a large user base currently on Pivot and since we're migrating to Turnilo, we're trying to minimize the user impact as much as possible.

We've been in the same spot and had long discussions about it. We have limited resources and we're very fond of the stateless nature of Turnilo (easy operations). For sure we can't work on such a feature, and we need to keep Turnilo compatible. A separate app could be a better start, and maybe we can integrate it later?

Btw, we have some plans for improvements in reading and validating config file in #365.

a2l007 commented 5 years ago

> For sure we can't work on such a feature, and we need to keep Turnilo compatible. A separate app could be a better start, and maybe we can integrate it later?

Ok that makes sense I guess. We'll get started on creating a separate config app for now. I hope to get your code review on the changes once it is available.

alexbusu commented 5 years ago

I'm not sure whether it was mentioned, but complex measures (including custom aggregations) can actually be added to the config file and used in data cubes. Add as many as you want; performance will not suffer as long as a query requests specific measures (not all of them). Here are some formulas we use in our Turnilo config (plywood expressions, they work like a charm!):

[Images: two screenshots of measure formulas]

Also, it may be worth testing a Dashboard setup by configuring it in the same config file. It would need a name and a list of boxes, each with (maybe) a name, a grid position, dimensions and the URL hash (of the "simple" view) containing the request info. Just to test the idea. Let alone customizations from the UI: these are not needed so often and can be made in the config file.
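The shape of such a dashboard entry could be sketched as below. All field names are hypothetical, and "dimensions" is read here as the box size (width/height) on the grid, which is an assumption about the wording above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    name: str
    x: int          # grid column of the box
    y: int          # grid row of the box
    width: int      # box size, in grid cells
    height: int
    url_hash: str   # hash of the "simple" view carrying the request info


@dataclass
class Dashboard:
    name: str
    boxes: List[Box] = field(default_factory=list)


def parse_dashboard(raw: dict) -> Dashboard:
    """Build a Dashboard from a dict as it might be loaded from the config."""
    if not raw.get("name"):
        raise ValueError("dashboard needs a name")
    boxes = [Box(**box) for box in raw.get("boxes", [])]
    return Dashboard(name=raw["name"], boxes=boxes)
```

Because each box only stores a URL hash, the dashboard reuses the existing view state and needs no query definitions of its own.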

a2l007 commented 5 years ago

Thanks for the comment. I'll make sure to post updates once we have a working model ready.

mkuthan commented 5 years ago

@a2l007 - please look at the comment about Allegro internal workflow for Turnilo configuration. We do not have a graphical tool but at least the configuration is de-centralized and everybody is able to configure datacubes: https://github.com/allegro/turnilo/issues/96#issuecomment-379768772.

We also considered direct support for multiple configuration files in Turnilo but the idea was abandoned because the trick with configuration git repository + simple "merge & deploy" CI plan works flawlessly for us.

a2l007 commented 5 years ago

@mkuthan Thank you for your input; I understand how Turnilo fits into Allegro's internal workflow. Since Turnilo is open source now, there are several external Turnilo users, and not all of their requirements fit Allegro's internal use cases. As I have said before, the changes for this proposal would not impact your workflow because the feature can be enabled or disabled using a config property. This property would be disabled by default, so the changes would have zero impact on your workflow. So if we define a property called enableConfigEditor and set it to false, the edit option would be disabled and there would be no need to make additional changes on your end.
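A minimal sketch of how such an opt-in flag could gate the edit path. Only the property name enableConfigEditor comes from the proposal; the settings dict, `handle_edit` and its return values are hypothetical.

```python
from typing import Callable


def config_editor_enabled(settings: dict) -> bool:
    """Off unless the flag is explicitly set to True."""
    return settings.get("enableConfigEditor", False) is True


def handle_edit(settings: dict, apply_edit: Callable[[], None]) -> str:
    """Run the edit only when the feature is switched on."""
    if not config_editor_enabled(settings):
        return "disabled"
    apply_edit()
    return "saved"
```

With the flag absent or false, `apply_edit` is never called, so an existing deployment behaves exactly as before.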