Graph redesign proposal

elasticmachine commented 6 years ago

Original comment by @markharwood:

This proposal is for a radical redesign of the existing Graph functionality based on the design principles established in Sculptor LINK REDACTED. A fully functional demo walkthrough is LINK REDACTED.

In a UI framework that adopts the three main principles from Sculptor, interoperability (and therefore workflow) is vastly improved and Graph can be re-imagined. The basic principles are:

1) A common object model is used to represent query objects (term, date range, bool etc.) that can be nested and composed as ANDs, ORs, NOTs etc. using a query builder.
2) All visualizations can produce query objects based on user selections, which are then Ctrl-C copied or dragged.
3) Some visualisations can consume query objects as the choice of subject matter, e.g. timeline visualizations or Graph. Consumption happens through Ctrl-V paste or drop operations.
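As a rough illustration of the first principle, a common object model could look something like the sketch below. It is not taken from Sculptor or Kibana: the class names (`Term`, `DateRange`, `Bool`), the helpers (`any_of`, `all_of`, `none_of`) and the field names (`author`, `ip`, `@timestamp`) are all hypothetical. The only thing assumed from Elasticsearch is the shape of the `term`, `range` and `bool` queries in its query DSL.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Term:
    """Hypothetical query object: an exact value in a single field."""
    field: str
    value: Any

    def to_dsl(self) -> Dict[str, Any]:
        return {"term": {self.field: self.value}}


@dataclass(frozen=True)
class DateRange:
    """Hypothetical query object: an inclusive date range on a field."""
    field: str
    gte: str
    lte: str

    def to_dsl(self) -> Dict[str, Any]:
        return {"range": {self.field: {"gte": self.gte, "lte": self.lte}}}


@dataclass(frozen=True)
class Bool:
    """Hypothetical query object: nests other query objects as AND/OR/NOT."""
    must: Tuple[Any, ...] = ()
    should: Tuple[Any, ...] = ()
    must_not: Tuple[Any, ...] = ()

    def to_dsl(self) -> Dict[str, Any]:
        body: Dict[str, Any] = {}
        if self.must:
            body["must"] = [q.to_dsl() for q in self.must]
        if self.should:
            body["should"] = [q.to_dsl() for q in self.should]
            # Require at least one OR branch to match.
            body["minimum_should_match"] = 1
        if self.must_not:
            body["must_not"] = [q.to_dsl() for q in self.must_not]
        return {"bool": body}


def any_of(*queries: Any) -> Bool:
    return Bool(should=queries)


def all_of(*queries: Any) -> Bool:
    return Bool(must=queries)


def none_of(*queries: Any) -> Bool:
    return Bool(must_not=queries)


# "Shay Banon OR kimchy" as one composable object (field name is assumed).
shay_or_kimchy = any_of(Term("author", "Shay Banon"), Term("author", "kimchy"))

# "10.0.3.10 AND [time range]" (field names and dates are assumed).
ip_in_window = all_of(
    Term("ip", "10.0.3.10"),
    DateRange("@timestamp", "2019-01-01", "2019-01-31"),
)

if __name__ == "__main__":
    print(json.dumps(shay_or_kimchy.to_dsl(), indent=2))
    print(json.dumps(ip_in_window.to_dsl(), indent=2))
```

Because every object exposes the same `to_dsl()` method, a visualization that produces query objects and one that consumes them only need to agree on this one interface, which is the interoperability point being made above.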

In this framework a Graph UI becomes a blank canvas into which multiple query objects can be dropped to draw the lines between them. The query objects can be anything built from any other part of the UI, e.g. the bar chart items Shay Banon and kimchy might be assembled as an OR and dropped as a single boolean query node in the Graph.

This approach has several advantages, which address several problems with the current design.

Problems with relying on the existing Graph API

The server-side Graph API provides discovery of nodes but has proven to be an ineffective tool:

1) It has hit-or-miss results which veer between discovering too little or too much, based on the complex settings chosen and the overly-sensitive need to pick the right starting nodes in the workspace LINK REDACTED. Avoiding running into the same problem nodes again requires users to define blacklists.
2) Graphs are not exhaustive and may mislead users. The results shown fail to fill in all the connections LINK REDACTED between nodes, and users have to remember to click the link icon to try to back-fill any missing connections.
3) Nodes can only be simple terms (a single field and value) rather than any query like Shay OR kimchy, #Hadoop OR #Cloudera or 10.0.3.10 AND [time range], which are required to overcome issues in real-world data. Entities of interest often have multiple IDs or change hands over time (IP addresses, phone numbers, cars, houses).
4) Lack of detail. Line thickness driven by money amounts etc. is not possible using the Graph API because:
   a) It doesn't offer support for child aggregations on nodes/edges.
   b) The concepts in the UI (grouped nodes) depart from what the Graph API uses (simple terms only).
5) The API has the idea of a "guiding query", e.g. a time or geo filter, but it is unused because it is not clear how that filter would be exposed in the Graph UI alongside the query bar (example issue).

Perhaps a missing Graph API feature is path-finding (the shortest path between X and Y), but Elasticsearch is not optimised for this task.

Advantages of Graph in the browser

Control: In a Sculptor-like framework Graph nodes can be formed with more precision, reusing a common QueryBuilder to clearly define the context/scope of exploration, and nodes can be reviewed and composed more readily.
Drill-down detail: The existing adjacency_matrix aggregation in open-source offers a rich back-end datasource to describe the temporal, geographic and financial summaries of all nodes and edges.
Accuracy: The graphs returned by adjacency_matrix are exhaustive so there are no missing connections. Showing all connections can introduce UI clutter, but time, money or doc counts can be used to filter clutter.
Interoperability: Sculptor's common object model for queries means discoveries in any visualization can be reused elsewhere without restriction. The Graph UI need not rely on the limitations of the Graph API to discover nodes. It now has a wide source of nodes it can consume from any other current or future visualization and, in turn, Graph can produce selections of nodes or groupings for use anywhere else. Pivoting like this is often a necessary part of data exploration and the Graph UI of today has a clunky LINK REDACTED of doing this using brittle drill-down URLs. It also lacks LINK REDACTED which would add another source of messy config.
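To make the drill-down and accuracy points more concrete, here is a minimal sketch of how a browser-side Graph might use `adjacency_matrix`. It assumes the documented shape of the aggregation (named `filters`, bucket keys joined by the default `&` separator, optional sub-aggregations). The helper names (`build_request`, `parse_buckets`), the aggregation name `connections`, the node names, the field names and the sample response are all made up for illustration and are not real output.

```python
from typing import Any, Dict, List, Optional, Tuple

SEPARATOR = "&"  # default separator in adjacency_matrix bucket keys


def build_request(
    nodes: Dict[str, Dict[str, Any]], amount_field: Optional[str] = None
) -> Dict[str, Any]:
    """Build a search body with one named filter per dropped query object.

    Node names are assumed not to contain the separator character.
    """
    agg: Dict[str, Any] = {"adjacency_matrix": {"filters": nodes}}
    if amount_field:
        # Child aggregation, e.g. to drive line thickness by money amounts.
        agg["aggs"] = {"total_amount": {"sum": {"field": amount_field}}}
    return {"size": 0, "aggs": {"connections": agg}}


def parse_buckets(
    response: Dict[str, Any], min_doc_count: int = 1
) -> Tuple[Dict[str, int], List[Tuple[str, str, int]]]:
    """Split buckets into nodes (single key) and edges (key pair)."""
    nodes: Dict[str, int] = {}
    edges: List[Tuple[str, str, int]] = []
    for bucket in response["aggregations"]["connections"]["buckets"]:
        key, count = bucket["key"], bucket["doc_count"]
        if SEPARATOR in key:
            # Drop weak edges to reduce UI clutter.
            if count >= min_doc_count:
                source, target = key.split(SEPARATOR, 1)
                edges.append((source, target, count))
        else:
            nodes[key] = count
    return nodes, edges


# Hypothetical query objects dropped onto the canvas, already in query DSL.
request = build_request(
    {
        "founders": {"terms": {"author": ["Shay Banon", "kimchy"]}},
        "hadoop": {"terms": {"hashtag": ["#Hadoop", "#Cloudera"]}},
        "ip_window": {"term": {"ip": "10.0.3.10"}},
    },
    amount_field="amount",
)

# Hand-written response in the documented shape, for illustration only.
sample_response = {
    "aggregations": {
        "connections": {
            "buckets": [
                {"key": "founders", "doc_count": 12},
                {"key": "founders&hadoop", "doc_count": 3},
                {"key": "hadoop", "doc_count": 40},
                {"key": "hadoop&ip_window", "doc_count": 1},
                {"key": "ip_window", "doc_count": 5},
            ]
        }
    }
}

graph_nodes, graph_edges = parse_buckets(sample_response, min_doc_count=2)
```

Because every pair of filters that shares documents gets its own bucket, the edge list is complete by construction. The `min_doc_count` threshold is one simple way to filter clutter, and the same idea would apply to a time or money sub-aggregation.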

Disadvantages of Graph in the browser

Server-side Graph API may become a relic
Packaging (technically and commercially) needs to be rethought
We probably have to lose the existing "plus" expand button in the UI - it was easy to use but perhaps too simplistic for adding content to the workspace.

elasticmachine commented 3 years ago

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

timroes commented 3 years ago

Closing this as outdated. We currently have no ongoing redesign efforts on Graph.

markharwood commented 3 years ago

@timroes Note that this issue was not exclusively about Graph. It was mainly about introducing the common object model for queries in Kibana. In the ongoing work on rethinking search/filtering we are only considering drag/drop organisation of these sorts of query objects in the context of visual Boolean logic editing. What has not been widely acknowledged is that these query objects could/should be draggable and droppable into a wide range of visualizations (line charts, maps, graph ...) as the user's chosen subjects of interest. If users cannot drop their explicit choices of subject matter into a visualization, they are typically left playing a limited form of lucky dip: seeing what the visualization throws up as the top 10 results from a choice of field.

timroes commented 3 years ago

Would you mind adding that information to the issues that are more specific to those topics:

Alex3k commented 3 years ago

Hi @timroes, I feel the graphical query builder is great and reduces the learning curve of interacting with data in Kibana; however, in my opinion it may miss the core exciting point of this issue. Ignoring the fact that Graph is mentioned, the core concept of the three points below (in particular points 2 and 3) would turn Kibana from a visualisation and dashboarding tool like every other BI tool on the market into something that can truly conduct thorough analysis in the worlds of financial services (AML, fraud, etc.), analysis of manufacturing workflows and deeper analysis into the Telco world.

A common object model is used to represent query objects (term, date range, bool etc) that can be nested and composed as ANDs ORs NOTs etc using a query builder.

All visualizations can produce query objects based on user selections which are then Ctrl-C copied or dragged

Some visualisations can consume query objects as the choice of subject matter e.g. timeline visualizations or Graph - consumption happens through Ctrl-V paste or drop operations.

As our roots and power plays come from the world of search, we should consider broadening the horizons of our underlying search capabilities: not through a new search type or command, but through the underlying way users can actually search for data. As we grow into new organisations and deeper into existing organisations, our users will become less technical and more business focused. We will start competing a lot more with the likes of BI tools and, dare I even say, Excel. I see this in my strategic account list today.

Another reason for exploring the above three points as a new way for users to search for data is that data is always dirty. In a large organisation there are often the folks who care about the underlying infrastructure of Elastic (less so as we move towards Cloud), the folks who worry about getting the data into Elastic, and the folks who use the data. More often than not, these teams are disjointed. Further to that, the latter team, the users, are more often than not business users, not savvy engineers. This means that if we suggest they use runtime fields to fix their dirty data once it's in Elasticsearch, they will be confused and see this as a barrier and something else to learn, increasing the time to value and decreasing user sentiment towards Elastic compared to other "simpler tools".

When data is pristine and carefully curated, e.g. the Amazon product catalog, there’s only one way to define a concept, e.g. colour:red. In less clean data it may appear in a structured field, a title, a product description or an RGB value. You need queries to OR together the variety of ways a poorly defined concept may be expressed in data. This could include typos of names. That’s before we even get to fraud cases where there’s deliberate use of multiple IDs by individuals trying to evade detection.
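As a small sketch of what OR-ing the variants together might look like in Elasticsearch query DSL: the helper `concept_query` and every field name here (`colour`, `title`, `description`, `rgb`) are hypothetical. The `bool`/`should`, `term` and `match` queries and the `fuzziness` option are standard query DSL.

```python
from typing import Any, Dict, List


def concept_query(variants: List[Dict[str, Any]]) -> Dict[str, Any]:
    """OR together the different ways one concept shows up in messy data."""
    return {"bool": {"should": variants, "minimum_should_match": 1}}


# One "red" concept, expressed across several assumed fields.
red = concept_query(
    [
        {"term": {"colour": "red"}},          # clean structured field
        {"match": {"title": "red"}},          # free-text title
        {"match": {"description": "red"}},    # free-text description
        {"term": {"rgb": "#FF0000"}},         # RGB value
        # Tolerate small typos in the description text.
        {"match": {"description": {"query": "red", "fuzziness": "AUTO"}}},
    ]
)
```

Treated as a single query object, `red` can then be dropped into any visualization as one subject of interest, instead of appearing as several unrelated terms.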

This alternative would be as @markharwood suggests: the idea of query objects. It takes groups of data points that are disparate today, because of the way we support searching for data in Kibana, and groups them together into one entity. This way, if we have a bunch of documents with a few fields that mean the same thing but differ slightly due to bad data hygiene, we can group those together as one query object and explore what they all mean. In the world of financial services and our other focused verticals, this would be essential for further adoption, as no one's data is clean.
