ethyca / fides

The Privacy Engineering & Compliance Framework
https://ethyca.com/docs
Apache License 2.0

Add a visual "taxonomy explorer" UI to the server #111

Closed NevilleS closed 2 years ago

NevilleS commented 3 years ago

Overview

It's really hard to navigate the data categories taxonomy right now when annotating a system or dataset. If you already "know" what's in there, you can probably poke around in the YAML to figure things out, but any new user needs some kind of aid to understand how it's constructed, search for the best match, etc.

Requirements

A basic "taxonomy explorer" would need to do a few things:

  1. Visualize the taxonomy in a tree view to show the hierarchy
  2. Be populated dynamically from the configured taxonomy for the server (including any customizations)
  3. Have a text input where you can type in keys that highlight branches of the taxonomy
  4. Render in a basic browser window
  5. (future) Embed into a VS Code plugin to render in the IDE

This is far from a final set of ideas or designs, but hopefully we can discuss some options for how we might prototype this as I think it'd be really useful to start building some visuals for all this raw metadata we're working with...
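To make requirements 1 to 3 concrete, here is a minimal text-only sketch, not a proposed implementation. It assumes taxonomy keys are dot-separated strings; the `SAMPLE_KEYS` list and the `build_tree` / `render_tree` helpers are hypothetical names for illustration, and a real explorer would load the keys from the server's configured taxonomy instead.

```python
from typing import Dict, List

# Hypothetical sample keys for illustration only; a real explorer would
# load these from the server's configured taxonomy.
SAMPLE_KEYS = [
    "user.provided.identifiable.contact.email",
    "user.provided.identifiable.contact.phone_number",
    "user.derived.identifiable.location",
    "system.operations",
]

Tree = Dict[str, "Tree"]


def build_tree(keys: List[str]) -> Tree:
    """Nest dot-separated keys into a dict-of-dicts hierarchy."""
    root: Tree = {}
    for key in keys:
        node = root
        for part in key.split("."):
            node = node.setdefault(part, {})
    return root


def render_tree(tree: Tree, query: str = "", prefix: str = "", depth: int = 0) -> List[str]:
    """Return one indented line per node; mark nodes whose full key contains the query."""
    lines: List[str] = []
    for name in sorted(tree):
        full_key = f"{prefix}.{name}" if prefix else name
        marker = " *" if query and query in full_key else ""
        lines.append(f"{'  ' * depth}{name}{marker}")
        lines.extend(render_tree(tree[name], query, full_key, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(build_tree(SAMPLE_KEYS), query="contact")))
```

Because the match is made against the full dotted key, typing `contact` marks the `contact` node and everything beneath it, which is roughly the "highlight branches" behavior described in item 3.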

Mockup

A quick mockup of what this would look like: [image]

The above shows:

Data Visualization Example

As a separate project, we created this prototype visualization, which helps a lot: https://clever-ptolemy-e3eb96.netlify.app/ [image]

This is a wholly separate d3.js page though, which might not be a good fit for our stack and is designed more for form than function. That said, I found that using this dataviz UI as a visual aid was still wildly better than trying to hunt & peck in the YAML files 👀

ThomasLaPiana commented 3 years ago

I think the main investigation here, and probably what will determine complexity, is what tools we need to use to get this to work... For instance, a static DAG of all of the categories would be doable in pure Python (Dask can render out images of DAGs), but the interactive nature of it hugely increases the complexity.

Maybe something like this would work? https://pythonhosted.org/dagger/ We could use the "stale" feature to highlight the specific item and highlight the downstream stuff for free.

NevilleS commented 3 years ago

Yeah, it comes down to how much we want to invest in scaffolding out the frontend architecture here and where we see things going. Personally I think we'll find many, many uses for various data visualizations (taxonomies, system maps, datasets, etc.) so having something that makes it easy to build visuals will be key.

Leaving a couple of my notes here after doing some light research on solutions to throw a dataviz layer on our FastAPI backend:

So basically, if I were to prototype this now I'd do one of two things:

  1. create a react app, like fidesctl/ui/package.json
    • have a make frontend target that builds a /dist
    • have a really simple frontend template that renders a basic UI and can make API calls
    • add FastAPI endpoints that return plotly JSON (https://plotly.com/chart-studio-help/json-chart-schema/)
    • render the plotly chart using the react-plotly.js libraries
  2. create a dash app and sideload it into our server
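
For option 1, the payload such an endpoint returns could look roughly like the sketch below. It builds a Plotly-style figure dict with a single `sunburst` trace (`ids`, `labels`, `parents`) using only the standard library. The `taxonomy_to_sunburst` helper, the sample keys, and the layout title are hypothetical, and dot-separated keys are an assumption.

```python
import json
from typing import Dict, List


def taxonomy_to_sunburst(keys: List[str]) -> Dict:
    """Build a Plotly-style figure dict with one sunburst trace from dotted keys."""
    ids: List[str] = []
    labels: List[str] = []
    parents: List[str] = []
    seen = set()
    for key in keys:
        parts = key.split(".")
        for i in range(1, len(parts) + 1):
            node_id = ".".join(parts[:i])
            if node_id in seen:
                continue  # each ancestor is emitted once
            seen.add(node_id)
            ids.append(node_id)
            labels.append(parts[i - 1])
            parents.append(".".join(parts[: i - 1]))  # "" marks a root node
    return {
        "data": [{"type": "sunburst", "ids": ids, "labels": labels, "parents": parents}],
        "layout": {"title": {"text": "Data categories"}},  # hypothetical title
    }


if __name__ == "__main__":
    # Hypothetical sample keys; the server would pass its configured taxonomy.
    figure = taxonomy_to_sunburst(["user.provided.identifiable", "user.derived"])
    print(json.dumps(figure, indent=2))
```

A FastAPI route could return this dict directly as JSON, and the react-plotly.js `Plot` component takes `data` and `layout` props, so the frontend could pass `figure.data` and `figure.layout` straight through. Since the figure is built from whatever keys the server has loaded, user customizations would show up without extra work.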

ThomasLaPiana commented 3 years ago

got it, i would consider this a relatively large/risky ticket then due to the amount of unknowns/"firsts" this feature will require.

Do you see this as something that needs to get in before the launch? Trying to get a feel for how we should prioritize this

NevilleS commented 3 years ago

100% yes it's pretty large & risky. Ideally, if there's a simple way to approximate this (TBH we could have our github pages docs include a good taxonomy visual?) then that'd be a quick way to bridge the gap until we sit down and build a thoughtful UI for all this.

I think we need something before launch to visualize the taxonomy. One option would be to update our docs to point to this separate page: https://clever-ptolemy-e3eb96.netlify.app/, but note that's not quite as good because it only shows the default taxonomies (without any user-specific customizations). We might be able to include that code here, but it's written with a totally different stack and use case (100% d3.js) so I'd hesitate to do that as it'll just be tech debt here.

ThomasLaPiana commented 3 years ago

this might tie into another ticket #91, in that i eventually want to be able to compile docs from source code and have them included in the docs site. That part of the docs wouldn't be able to hot reload probably, but it would solve a lot of problems (for instance, maintaining the Model schema in two places, the docs and the code itself).

So, is a happy balance then having a fidesctl.core.docs module that can extract the entire taxonomy and generate images of it, and then add them to the docs? It can generate a different image for each privacy data type. The technical implementation of it would be to build a graph and then visualize it (Dask does this, so we could probably figure out what library it uses and use that)
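
A rough sketch of the "build a graph, then visualize it" step, assuming Graphviz is the renderer (Dask's own graph rendering relies on Graphviz, as far as I know). It emits one DOT document per taxonomy type using only the standard library. The `keys_to_dot` and `taxonomy_to_dot` helpers and the sample data are hypothetical, and dot-separated keys are an assumption.

```python
from typing import Dict, List


def keys_to_dot(name: str, keys: List[str]) -> str:
    """Turn dotted keys into a Graphviz DOT digraph of parent -> child edges."""
    nodes, edges = set(), set()
    for key in keys:
        parts = key.split(".")
        for i in range(1, len(parts) + 1):
            node = ".".join(parts[:i])
            nodes.add(node)
            if i > 1:
                edges.add((".".join(parts[: i - 1]), node))
    lines = [f'digraph "{name}" {{', "  rankdir=LR;"]
    lines += [f'  "{node}";' for node in sorted(nodes)]
    lines += [f'  "{parent}" -> "{child}";' for parent, child in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)


def taxonomy_to_dot(taxonomy: Dict[str, List[str]]) -> Dict[str, str]:
    """Return one DOT document per taxonomy type (e.g. one per privacy data type)."""
    return {name: keys_to_dot(name, keys) for name, keys in taxonomy.items()}


if __name__ == "__main__":
    # Hypothetical sample data; the docs module would read the real taxonomy.
    sample = {"data_category": ["user.provided.identifiable", "user.derived"]}
    for name, dot in taxonomy_to_dot(sample).items():
        print(f"// {name}.dot")
        print(dot)
```

Each document could be written to a `.dot` file and rendered with the Graphviz CLI, for example `dot -Tpng data_category.dot -o data_category.png`, as part of the docs build.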

brentonmallen1 commented 3 years ago

A reference with some examples of this kind of visualization done in Python: https://towardsdatascience.com/visualize-hierarchical-data-using-plotly-and-datapane-7e5abe2686e1