evidence-dev / evidence

Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown

https://evidence.dev

MIT License


Decide on Chart Architecture #136

Closed: hughess closed this issue 2 years ago

hughess commented 2 years ago

📢 Feedback Needed - help us build an amazing chart library!

We've heard from several community members that they want to see more capabilities built into our chart library. We've also received requests to include support for third-party chart libraries in Evidence (like Vega, Highcharts, Plotly and Chart.js).

Given the importance of charts to the overall experience of using Evidence, we want to do a thorough assessment of our options for building the best chart library you've ever used.

We will post our criteria and thoughts in this issue, and hope to hear feedback about:

- What you would like to see in our chart library (please let us know if your ideas are must-haves or nice-to-haves for you)
- Any experience you've had with third-party chart libraries (positive or negative!)

hughess commented 2 years ago

Background on Current Architecture

Our current library is custom-built using D3, Layer Cake, tidyJS, and basic SVG elements:

- D3 and tidyJS for data processing
- SVG elements to build the actual graphics/text for the charts
- Svelte components to loop through datasets and create the SVG elements
- Layer Cake to share data across components and combine reusable components into a single chart with the appropriate layout type (SVG, HTML, Canvas, WebGL)

Structure of a Chart

Our chart components all follow the same structure, aiming to have components that are as reusable as possible across chart types. The focus on reusability was inspired by Layer Cake:

- One master <Chart/> component, which handles any transformation, sorting, stacking, and filtering of the data
- "Primitive" components containing:
  - The graphics to be drawn (<Line/>, <Column/>, <Bar/>, <Scatter/>, etc.)
  - Axes (<XAxis/>, <YAxis/>)
- Final chart components, which provide a simple interface for arranging the smaller components above - these are the components we expose in Evidence (<LineChart/>, <ColumnChart/>, <Hist/>, etc.)
  - These components also perform checks on the inputs and handle errors
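To make the "stacking" responsibility of the master <Chart/> component concrete, here is a minimal TypeScript sketch. It is not Evidence's implementation (which relies on D3 and tidyJS); `stackSeries`, `Point`, and `StackedPoint` are hypothetical names, and the sketch assumes non-negative values.

```typescript
// Hypothetical sketch of a stacking step: for each x value, series are
// placed on top of one another, so every point gets a bottom (y0) and top (y1).
// Assumes non-negative y values; negative values would need a separate stack.
type Point = { x: string | number; series: string; y: number };
type StackedPoint = Point & { y0: number; y1: number };

function stackSeries(points: Point[], seriesOrder: string[]): StackedPoint[] {
  const runningTop = new Map<string | number, number>(); // current stack height per x
  const stacked: StackedPoint[] = [];
  for (const series of seriesOrder) {
    for (const p of points) {
      if (p.series !== series) continue;
      const y0 = runningTop.get(p.x) ?? 0;
      const y1 = y0 + p.y;
      runningTop.set(p.x, y1);
      stacked.push({ ...p, y0, y1 });
    }
  }
  return stacked;
}
```

A primitive such as <Column/> could then draw each segment from `y0` to `y1` without knowing how the stack was computed.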

Line Chart Example

For example, <LineChart/> reads in the data and column names you provide, checks that they are complete, then builds a chart using the code below:

```
<Chart {data} {x} {y}>
  <XAxis/>
  <YAxis/>
  <Line/>
</Chart>
```

<Chart/> then uses Layer Cake to combine all of these components into the final chart that gets rendered on your page.
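The completeness check that <LineChart/> runs before building the chart could look roughly like the sketch below. It assumes the query result arrives as an array of row objects keyed by column name; `checkInputs` and its messages are hypothetical, not Evidence's actual error handling.

```typescript
// Hypothetical sketch of the input checks a <LineChart/>-style wrapper could
// run before passing data to <Chart/>. Not Evidence's actual implementation.
type Row = Record<string, unknown>;

function checkInputs(data: Row[], x: string, y: string): void {
  if (!Array.isArray(data) || data.length === 0) {
    throw new Error("LineChart: no data provided");
  }
  // Every requested column must exist in the query result
  const columns = new Set(Object.keys(data[0]));
  const missing = [x, y].filter((col) => !columns.has(col));
  if (missing.length > 0) {
    throw new Error(`LineChart: column(s) not found in data: ${missing.join(", ")}`);
  }
  // y must be numeric; null is allowed (e.g., for gaps in the line)
  const badRow = data.findIndex((row) => row[y] !== null && typeof row[y] !== "number");
  if (badRow !== -1) {
    throw new Error(`LineChart: non-numeric value in column "${y}" at row ${badRow}`);
  }
}
```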

hughess commented 2 years ago

What we like about this approach

- Lets us simplify the experience of building a chart - abstracts away a lot of the underlying complexity, with the goal of allowing you to create a publication-quality chart with a single line of code
- Full control of aesthetics - making sure they fit with the overall appearance of an Evidence project
- Works well with Svelte - Layer Cake has several examples on their website of charts built with Svelte components
- Flexible - there are very few limits to what you can build using these tools. If you look up examples of what people have built with D3, the results are very impressive
- Composable - you can combine chart types and other elements within a <Chart> component. For example, you could imagine adding another chart type and an annotation with the code below (this is example syntax and contains incomplete info):
```
<Chart {data} {x} {y}>
  <XAxis/>
  <YAxis/>
  <Line/>
  <Bar/>
  <Annotation text="New product launched" x={2012} y={160000}/>
</Chart>
```
- Supports custom annotations - something like this nice example from Layer Cake
- No third-party syntax/config to learn - our library is largely using the underlying web technologies that already exist in SVG, HTML, and CSS
- Can reduce dependencies - over time, can reduce our dependencies by interacting directly with the data and the underlying web technologies

What we don't like about this approach

- Development Time - can require a lot of time for new chart types or features that are already in existing libraries (interactivity, zooming, etc.)
- Learning Curve - may be higher for new contributors than if we used a popular chart library
- Performance Questions - unsure of performance comparison against established libraries and if the difference is meaningful
- Reinventing the Wheel - will require us to build ways to manage conflicts between elements within a chart (e.g., handling axis label overlaps, annotation overlaps, axis labels extending outside of the page area). These problems have been largely solved by existing libraries
- Histogram Issues - our histogram currently leads to memory issues if used in a loop (likely due to the structure of our data processing functions)
- Syntax for Complex Charts - it's not clear yet what the syntax would be to enable complex multi-series and multi-type charts
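As an example of the overlap handling mentioned under "Reinventing the Wheel", here is a deliberately simple, hypothetical TypeScript sketch for direct labels: sort labels by their desired vertical position, then push each one down until it clears the previous label by a minimum gap. Established libraries use more robust strategies (bounds checks, moving labels in both directions, hiding labels), so this only hints at the scope of the work.

```typescript
// Hypothetical greedy label spreading: labels keep their desired y position
// unless they would sit closer than `minGap` pixels to the label above.
// Only pushes labels downward and ignores the chart's bottom edge.
type Label = { text: string; y: number };

function spreadLabels(labels: Label[], minGap: number): Label[] {
  const sorted = [...labels].sort((a, b) => a.y - b.y);
  const placed: Label[] = [];
  for (const label of sorted) {
    const prev = placed[placed.length - 1];
    const y = prev && label.y - prev.y < minGap ? prev.y + minGap : label.y;
    placed.push({ ...label, y });
  }
  return placed;
}
```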

hughess commented 2 years ago

With all that in mind, we're now considering a few options - and would love to hear your feedback:

Chart Architecture Options

- Stick with current architecture
- Switch to a third-party chart library for all charts and chart features
- Use a third-party chart library under the hood, but use Layer Cake to overlay custom components on a chart (e.g., custom annotations as described above)

For the most part, the front-end experience of using the charts would stay the same as it is now, with our declarative syntax, but possibly with some changes to enable multi-type charts without a lot of code.

In all of these cases, we may still add support for other third-party libraries if people need them (e.g., adding a <Vega/> component where you could include a Vega config).

hughess commented 2 years ago

If we were to go with a third-party library, here's some criteria we've thought about. Let us know if there's anything you would add/remove/change!

Third-Party Library Criteria

Charts

- Has basic chart types for single and multiple series:
  - Line
  - Bar (+ stacked and grouped)
  - Column (+ stacked and grouped)
  - Area (+ stacked)
  - Scatter
  - Histogram
- Has a large library of available chart types beyond basic charts. Examples:
  - Stock charts
  - Maps
  - Sankey
  - Funnel
  - Dumbbell (aka Barbell, aka Cleveland Dot Plot)
- Composable (can easily build complex charts including different chart types - bars, lines, etc.)

Formatting

- Customizable (can get the charts into our Evidence formatting)
- Support for links on axis labels
- Number/text format customization
- Support for international formatting
- Support for dark mode (or ability to build dark mode formatting)
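For the international formatting criterion, the built-in `Intl.NumberFormat` API already covers locale-aware separators, currencies, and percentages; a chart library mainly needs to let us plug such a formatter into its labels. A minimal sketch (`formatValue` is a hypothetical helper name):

```typescript
// Locale-aware number formatting using the built-in Intl API.
// `formatValue` is a hypothetical helper, not part of any chart library.
function formatValue(
  value: number,
  locale: string,
  options: Intl.NumberFormatOptions = {},
): string {
  return new Intl.NumberFormat(locale, options).format(value);
}
```

With ECharts, for instance, a function like this could be supplied as an axis label formatter, e.g. `axisLabel: { formatter: (v) => formatValue(v, "de-DE") }`.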

Responsiveness/Sizing

- Responsive
- Renders nicely on web and mobile
- Resizes charts to fit area without overlap (e.g., long labels)
- Can be used to create small multiples

Other Features

- Has interactivity options
- Supports customized annotations
- Supports custom error handling
- Direct labeling ability (e.g., label chart with multiple lines, avoiding overlaps/conflicts wherever possible)

Architecture

- Works with SvelteKit and Svelte
  - For libraries that use DOM manipulation, may be able to use Svelte Actions
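A rough sketch of what the Svelte Actions route might look like, in TypeScript. A Svelte action is a function that receives the element and a parameter and may return `update` and `destroy` callbacks. To stay library-agnostic (and testable without a DOM), the chart library is injected through a hypothetical `ChartLike` interface; `createChartAction` is an illustrative name, not an existing API.

```typescript
// Minimal interface for the parts of a chart instance the action needs.
// Hypothetical; ECharts instances provide setOption() and dispose().
interface ChartLike {
  setOption(option: object, notMerge?: boolean): void;
  dispose(): void;
}

// Builds a Svelte action from a function that creates a chart on a node.
// Svelte calls the action as action(node, param) when the element mounts.
function createChartAction<N>(init: (node: N) => ChartLike) {
  return function chartAction(node: N, option: object) {
    const chart = init(node);
    chart.setOption(option);
    return {
      // Svelte calls update() whenever the action's parameter changes
      update(newOption: object) {
        chart.setOption(newOption, true); // true = replace instead of merging
      },
      // Svelte calls destroy() when the element is removed
      destroy() {
        chart.dispose();
      },
    };
  };
}
```

With ECharts this would be something like `const echartsAction = createChartAction((node: HTMLElement) => echarts.init(node))`, used in a component as `<div use:echartsAction={option} style="height: 400px"></div>`. `echarts.init`, `setOption`, and `dispose` are real ECharts calls, but this wiring is an untested assumption.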

Support & Maintenance

- Popular library / well-supported and actively maintained

hughess commented 2 years ago

Contenders for third-party chart library

Helpful comparisons of libraries

https://www.codewall.co.uk/best-javascript-chart-libraries/
https://en.wikipedia.org/wiki/Comparison_of_JavaScript_charting_libraries

yhoiseth commented 2 years ago

Thanks for the write-up. I have some quick comments.

Architecture Options

I think I would prefer a bare-bones implementation of a third-party charting library. That way, I could easily copy-paste examples and ask questions on Stack Overflow. Also, fewer layers between me and the charting library would mean fewer bugs and presumably less work for you. Dope abstractions could be added later.

Criteria

Looking pretty complete. You might want to add mapping and maybe stock charts to your list of required chart types.

Contenders

Personally, I have found Highcharts to work well. I don’t have a lot of experience with the others, though.

hughess commented 2 years ago

Thanks @yhoiseth! I'll add those chart types to the list.

Your comment about the bare-bones implementation makes sense to me. I think we would want to do both: have the option to build a chart from scratch (including copy-pasting examples), while also building the abstracted, simplified versions over time.

I've played around with Highcharts a bit and have been impressed with what it can do.

The biggest unknown with Highcharts is how we would get our default data format into the format Highcharts needs to build a chart. It would definitely be possible - it's just a matter of how much data processing code we'll need to write to handle a query result on its way into the Highcharts configuration. Ideally, you could pass the chart a full query result and tell it which columns you want to use (vs. passing in a separate dataset for each column).
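The reshaping step described here, from a full query result plus column names to one dataset per series, could be as small as the sketch below. `toHighchartsSeries` is hypothetical; it assumes a numeric or datetime x-axis, where Highcharts accepts each point as an `[x, y]` pair (a category axis would need a different shape).

```typescript
// Hypothetical helper: one Highcharts series per y column, built from a
// query result (array of row objects). Assumes x is already numeric,
// e.g. a year or a millisecond timestamp.
type Row = Record<string, string | number | null>;
type HighchartsSeries = { name: string; data: Array<[number, number | null]> };

function toHighchartsSeries(rows: Row[], x: string, yColumns: string[]): HighchartsSeries[] {
  return yColumns.map((y) => ({
    name: y,
    // Each point pairs the x value with this column's value from the same row
    data: rows.map((row): [number, number | null] => [
      Number(row[x]),
      row[y] === null ? null : Number(row[y]),
    ]),
  }));
}
```

The result would go into the `series` array of a Highcharts configuration, e.g. `Highcharts.chart(container, { series: toHighchartsSeries(rows, "year", ["sales", "cost"]) })`.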

Are there any downsides to Highcharts that you've come across?

wylbee commented 2 years ago

Thank you for the write-up. I agree with @yhoiseth's feedback on the architecture options: a bare-bones implementation of a third-party charting library would be extremely helpful.

I've had positive experiences with Vega/Vega-Lite/Altair.

yhoiseth commented 2 years ago

> Are there any downsides to Highcharts that you've come across?

- I’ve found the configuration options to be a bit daunting. That’s one reason why the examples and Stack Overflow are so important. (Highcharts employees answer questions on Stack Overflow really quickly.)
- A minor annoyance is that the charts come with their own styles (e.g., font-family). I would have preferred if they just picked up on the styling of the page. Hopefully, this wouldn’t be a problem for Evidence as the styles could maybe be overwritten by Evidence.
- It’s not entirely free. I’ve found the license options to be very sensible, but there is still some friction.
- I had an issue with somewhat advanced interactive behaviour. It was closed as stale. I think that’s an annoying policy in general, but, as I had moved on from the project, I didn’t object.

hughess commented 2 years ago

> Thank you for the write-up. I agree with @yhoiseth's feedback on the architecture options: a bare-bones implementation of a third-party charting library would be extremely helpful.
>
> I've had positive experiences with Vega/Vega-Lite/Altair.

Thanks @brown5628! Are there any things you really liked or disliked about Vega/Vega-Lite/Altair?

hughess commented 2 years ago

> I’ve found the configuration options to be a bit daunting. That’s one reason why the examples and Stack Overflow are so important. (Highcharts employees answer questions on Stack Overflow really quickly.)

I was playing around with Highcharts a few days ago and opened an issue on GitHub to ask a question - their team responded impressively quickly.

The configuration options are a lot, but they don't seem out of line with the other libraries generally.

> A minor annoyance is that the charts come with their own styles (e.g., font-family). I would have preferred if they just picked up on the styling of the page. Hopefully, this wouldn’t be a problem for Evidence as the styles could maybe be overwritten by Evidence.

I checked this out and confirmed that we can set default styles to match Evidence formatting.

> I had an issue with somewhat advanced interactive behaviour. It was closed as stale. I think that’s an annoying policy in general, but, as I had moved on from the project, I didn’t object.

Thanks for sharing that - that's interesting. I agree on the stale policy. As an aside, it seems like the consistent use of jsfiddle is great for sharing issues.

hughess commented 2 years ago

We've put together a comparison table with everything we've discussed above for the various libraries. Still a lot of info to collect and fill in, but if anyone has any info to contribute, please add it to this Google sheet!

Chart Comparison Google Sheet

tacastillo commented 2 years ago

I'm personally a big fan of Vega, and out of the various libraries/grammars I've used before, Vega typically fits the bill if you want to create a charting library without a lot of bespoke charts. I love being able to dynamically generate charts because of how clean-cut their grammar is. Its SVG rendering engine is pretty well constructed too. If you ever feel like you need more granular, JS-based control, or the built-in CSS overrides don't fit the bill, the classes and HTML structure across Vega charts are pretty consistent. So it's not that bad to navigate the DOM and make the changes yourself, e.g., interacting with the chart and having it dynamically update another part of the page.

There was a note mentioned about performance issues when it came to the histogram. The first thing I tell anyone facing performance issues with charts is "aggregate your data better before it hits the UI". Evidence unfortunately isn't in control of that because its end users are the ones who are writing the queries for now. Thankfully, Evidence also builds everything at compile time, right? So a potential enhancement in the future is tracking lineage of exposures like dbt does and aggregating the data that goes into the charts at compile time. "If this data is used for a histogram, bucket it, but if it's also used for a scatter plot, throw it raw", as a contrived example.
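A sketch of the "bucket it" idea: bin raw values into fixed-width buckets so the chart receives one row per bin instead of one row per record. `binValues` is hypothetical; it computes the min and max in a loop rather than with `Math.min(...values)`, because spreading a very large array into a function call can exceed the engine's argument limit.

```typescript
// Hypothetical pre-aggregation for a histogram: returns `binCount`
// equal-width bins covering [min, max], each with a record count.
type Bin = { x0: number; x1: number; count: number };

function binValues(values: number[], binCount: number): Bin[] {
  if (values.length === 0 || binCount < 1) return [];
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const width = (max - min) / binCount || 1; // avoid zero width when all values are equal
  const bins: Bin[] = Array.from({ length: binCount }, (_, i) => ({
    x0: min + i * width,
    x1: min + (i + 1) * width,
    count: 0,
  }));
  for (const v of values) {
    // Clamp so the maximum value lands in the last bin instead of past it
    const i = Math.min(Math.floor((v - min) / width), binCount - 1);
    bins[i].count++;
  }
  return bins;
}
```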

The second thing I tell everyone hitting chart performance issues is "Have you thought of a canvas rendering engine instead of HTML/SVG?" It's the second thing because that's a massive hot mess: not only are charts harder to render in Canvas because you have more granular control, but at that point you're basically writing two charting libraries.

Thankfully, Vega and most declarative grammars/libraries take care of that and have dealt with it for you. For example, Vega's Views can take a param to render in Canvas or SVG (SVG default). You lose the JS and CSS-based control I was mentioning, but having the chart render and not freeze your computer is more important than being able to do some fun interactivity.

Besides having Evidence or the end user define whether to use canvas or SVG, I've had some fun dynamically deciding behind-the-scenes based on the size of the underlying data of the chart. You can do some benchmarking to figure out how many records are needed to define whether or not to consider SVG vs canvas. This way the end user (developer) will have an optimization baked in without thinking about it.
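The size-based switch could start as a single function like the sketch below. The 5,000-row threshold is a placeholder for whatever the benchmarking turns up, and `chooseRenderer` is a hypothetical name.

```typescript
// Hypothetical renderer selection based on the number of records.
type Renderer = "svg" | "canvas";

const CANVAS_THRESHOLD = 5_000; // placeholder; tune via benchmarks

function chooseRenderer(rowCount: number, override?: Renderer): Renderer {
  if (override) return override; // an explicit choice from the developer always wins
  return rowCount > CANVAS_THRESHOLD ? "canvas" : "svg";
}
```

Both contenders accept the result directly: Vega through the `renderer` option of `new vega.View(...)`, and ECharts through the `renderer` option of `echarts.init(element, theme, { renderer })`.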

What are people's thoughts on creating a high-level API to interface with the charting libraries that is built with the intention of being able to swap out the underlying libraries as Evidence grows? This way we can tinker about and "theoretically" make a bunch of changes to how it performs and the people developing with it won't even know it's happening.

I can talk some more about other ways to have a declarative API for charts, but I'll just dump my opinion that I think Vega + a high-level API provides enough flexibility, adoptability, and room for growth.
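One possible shape for that high-level API, sketched in TypeScript: charts are described once in a small library-agnostic spec, and per-library adapters translate it into each library's configuration. `ChartSpec`, `Backend`, and the adapter names are hypothetical; the output objects follow the ECharts option and Vega-Lite spec formats, though only minimally.

```typescript
type Row = Record<string, string | number | null>;

// Library-agnostic description of a simple chart (hypothetical)
interface ChartSpec {
  type: "line" | "bar";
  data: Row[];
  x: string;
  y: string;
}

type Backend = (spec: ChartSpec) => object;

// Adapter producing an ECharts option (dataset + encode map columns to axes)
const toECharts: Backend = (spec) => ({
  dataset: { source: spec.data },
  xAxis: { type: "category" },
  yAxis: { type: "value" },
  series: [{ type: spec.type, encode: { x: spec.x, y: spec.y } }],
});

// Adapter producing a Vega-Lite spec for the same chart
const toVegaLite: Backend = (spec) => ({
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  data: { values: spec.data },
  mark: spec.type,
  encoding: {
    x: { field: spec.x, type: "ordinal" },
    y: { field: spec.y, type: "quantitative" },
  },
});

// Callers pick a backend by name; their ChartSpec does not change.
const backends: Record<"echarts" | "vegaLite", Backend> = {
  echarts: toECharts,
  vegaLite: toVegaLite,
};
```

Swapping libraries then means changing which adapter is used, while the spec that page authors write stays the same.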

Also, when thinking about developer experience, as a data analyst/scientist, I'd want to feel like I'm not writing front-end code. For many people, being introduced to a whole new ecosystem is daunting and may turn some of them away. I'd suggest against an HTML-like syntax/grammar for declaring charts and lean more towards a function-based syntax, such as line_chart(data, "horizontal", y_axis_max = 100, x_axis_min = 50), kind of like how R Shiny and (to a milder extent) Plotly Dash expose building elements and components. Functions are a more familiar concept than HTML for what I think Evidence's intended audience is.
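TypeScript has no keyword arguments, so the function-style syntax suggested here would most naturally map to an options object. A hypothetical `lineChart` that returns an ECharts-style option might look like this:

```typescript
type Row = Record<string, string | number | null>;

// Hypothetical options; the names mirror the y_axis_max / x_axis_min example
interface LineChartOptions {
  yAxisMax?: number;
  xAxisMin?: number;
}

function lineChart(data: Row[], x: string, y: string, opts: LineChartOptions = {}) {
  return {
    dataset: { source: data },
    // Left undefined, min/max fall back to ECharts' automatic axis range
    xAxis: { type: "value", min: opts.xAxisMin },
    yAxis: { type: "value", max: opts.yAxisMax },
    series: [{ type: "line", encode: { x, y } }],
  };
}
```

A call mirroring the example above would be `lineChart(rows, "month", "sales", { yAxisMax: 100, xAxisMin: 50 })`.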

hughess commented 2 years ago

Thanks @tacastillo! Great to get your thoughts. I've been playing around with a few of the libraries recently and I do like how consistent Vega is in terms of grammar and data inputs for various chart types.

> There was a note mentioned about performance issues when it came to the histogram. The first thing I tell anyone facing performance issues with charts is "aggregate your data better before it hits the UI".

I think this is the issue with our histogram - the aggregations aren't being done efficiently before hitting the UI (though I suspect it is because of the way I wrote them).

Love the idea of exposures, but it's probably something for further in the future.

> The second thing I tell everyone hitting chart performance issues is "Have you thought of a canvas rendering engine instead of HTML/SVG?" It's the second thing because that's a massive hot mess: not only are charts harder to render in Canvas because you have more granular control, but at that point you're basically writing two charting libraries.

We have only tried Canvas rendering a few times, but have stuck with SVG so far because it looked a lot crisper in our tests. Seems like it would be a good benefit for our chart library to give us the option.

> Besides having Evidence or the end user define whether to use canvas or SVG, I've had some fun dynamically deciding behind-the-scenes based on the size of the underlying data of the chart. You can do some benchmarking to figure out how many records are needed to define whether or not to consider SVG vs canvas. This way the end user (developer) will have an optimization baked in without thinking about it.

Dynamically deciding the rendering engine based on the size of the data sounds great - looking forward to that benchmarking and optimization when we have our library all set up.

> What are people's thoughts on creating a high-level API to interface with the charting libraries that is built with the intention of being able to swap out the underlying libraries as Evidence grows? This way we can tinker about and "theoretically" make a bunch of changes to how it performs and the people developing with it won't even know it's happening.

> I can talk some more about other ways to have a declarative API for charts, but I'll just dump my opinion that I think Vega + a high-level API provides enough flexibility, adoptability, and room for growth.

> Also, when thinking about developer experience, as a data analyst/scientist, I'd want to feel like I'm not writing front-end code. For many people, being introduced to a whole new ecosystem is daunting and may turn some of them away. I'd suggest against an HTML-like syntax/grammar for declaring charts and lean more towards a function-based syntax, such as line_chart(data, "horizontal", y_axis_max = 100, x_axis_min = 50), kind of like how R Shiny and (to a milder extent) Plotly Dash expose building elements and components. Functions are a more familiar concept than HTML for what I think Evidence's intended audience is.

I like the idea of a declarative API and being able to swap out the underlying libraries as needed. There may also be scenarios where we need to use multiple libraries (e.g., when we get to more complex viz types like maps), so that flexibility would be useful. Would love to hear any ideas you have about declarative chart APIs!

That's an interesting point about the syntax - I think you're probably right about the intended audience's preference for functions. I'm personally okay with the HTML-like syntax, but interested to hear what people think.

One thing I don't know much about in Vega is its potential to build complex custom viz. Have you used Vega for that? I'm wondering at what level of customization you would need to leave Vega and go to D3/SVG to build from scratch.

hughess commented 2 years ago

Update on Library Contenders

We've spent a lot of time looking into these chart alternatives and weighing the pros and cons. We've narrowed the options down to a few which we are going to test. Below is the summary of our thoughts on each library so far.

While we're taking many of the libraries out of the running, we're still looking into ways to support these libraries for people who want to use them instead of the library we ultimately choose.

Libraries we are removing as contenders:

- Highcharts (license issues)
  - Highcharts has many great features and fantastic support, but after speaking with their sales team, it seems that their license wouldn't work for us. At best, the license would be an admin hassle to manage, but at worst it would pose a legal risk
  - ECharts seems to be the most similar library to Highcharts and they share most (if not all) of the same features
- Observable Plot (low popularity)
  - New project from Mike Bostock based on the grammar of graphics
  - Examples look good, but the project is too early to rely on - limited examples and community support
  - Observable recently added D3 Charts (reusable D3 examples), which is great - but we are not sure of the distinction between that and Plot
- Chart.js (few core options)
  - Big community and popular library, but not enough examples built out yet and they are focused on keeping the core codebase minimal (different use case from what we need)
- AnyChart (low popularity)
  - Examples look great and they have a large selection, but the small community and commercial license won't work for us
- Plotly (low performance)
  - We found a performance comparison done by uPlot (https://github.com/leeoniya/uPlot#performance) which shows slow performance by Plotly
  - Unclear whether Plotly would work easily with Svelte/SvelteKit
- AMCharts (low performance)
  - AMCharts also performed poorly in the benchmarking test from uPlot, and has a commercial license
- FusionCharts (low popularity)
  - Commercial license and small community
- ApexCharts.js (low performance)
  - Small list of pre-built examples, low performance in uPlot tests

Libraries still in the running:

- ECharts
  - Impressive example library
  - Large community
  - Great documentation
  - Looks like the library is moving in a very positive direction for functionality and support
  - Fully custom viz appears to be possible through ECharts' API
- Vega
  - Have heard great feedback from developers who have used Vega
  - Big community and a lot of development happening
  - Consistent grammar and chart structure
  - Docs not as easy to follow as something like ECharts
  - Full level of customization is unclear - need to confirm (hoping to avoid situation where we would need to jump out into D3/SVG to write things from scratch)
- Layer Cake + D3/SVG (our current library)
  - Built for Svelte
  - Many basic problems still to be solved and we would need to write the solutions (reusable interactivity options, custom annotations, special styling situations)

Next steps:

- Test ECharts and Vega in development with real query results in Evidence
- Collect feedback from the community about their experience with these libraries or thoughts based on what they've seen from each - let us know what you think!

tacastillo commented 2 years ago

Haven't used ECharts before, but just a cursory look at the docs has me thinking it looks promising. I like their concept of defining a theme once and reusing it across charts. It also puts the burden of making sure the styles are consistent across rendering engines on them. And oof, that theme builder 👍🏽.

What are your top priorities for charting right now? (ex. switching to a charting library, usability assessments, high-level APIs, benchmarking, etc.)

hughess commented 2 years ago

Within the next 2 weeks, we want to pick one library and write version 1 of the new high-level API & chart templates (replacing our existing API/templates).

Here's what I have in mind:

- Confirm we can use each library:
  - Build a general use component for the charting library where you can drop in your own config
    - e.g., <ECharts {data} {config}/>
  - Test by building charts in an Evidence project with various data types (I'm doing this with ECharts now and will do the same with Vega)
  - Confirm charts can accept tidy and non-tidy data
  - Test out error handling capabilities (how much control do we have over the content and appearance of error messages)
- Confirm performance:
  - Confirm that charts render quickly in development mode and production mode
  - See if one library has significantly better performance than the other
  - Confirm that we'll be able to optimize chart performance in the future
- Pick one library to build our API
- Build the API (maybe this becomes a new issue to work through the structure of the API)
  - Write the first chart type in the new library to replace our existing one (e.g., <LineChart/>)
  - Write a contribution guide based on the first new chart, which can be used for any other example charts from the library we're working with
  - Fill out as many other examples as we can!
  - I think we use the HTML-type syntax for now to keep with Svelte component formatting, but we should think through our options for using function-based syntax eventually as well
- Build Evidence theme/styles to use across all charts
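For the theme/styles step, one approach is to define Evidence's defaults once and deep-merge each chart's own settings on top of them. The sketch below is hypothetical: the theme values are placeholders, not Evidence's actual styles, and `mergeConfig` is an illustrative helper.

```typescript
type Config = { [key: string]: unknown };

// Placeholder defaults shared by every chart (not Evidence's real styles)
const evidenceTheme: Config = {
  textStyle: { fontFamily: "Inter, sans-serif", color: "#333" },
  grid: { left: 40, right: 16 },
};

function isPlainObject(v: unknown): v is Config {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merges `override` onto `base` without mutating either;
// arrays and primitives from `override` replace the base value outright.
function mergeConfig(base: Config, override: Config): Config {
  const out: Config = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const prev = out[key];
    out[key] = isPlainObject(prev) && isPlainObject(value) ? mergeConfig(prev, value) : value;
  }
  return out;
}
```

ECharts also supports this natively: a theme object can be registered once with `echarts.registerTheme(name, theme)` and applied with `echarts.init(element, name)`.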

There are a couple of areas I don't know much about, so if you have any thoughts or want to jump in on these, please feel free!

- Getting Vega to work in an Evidence project (there is a Svelte Vega package, but I haven't been able to get it to work yet)
- Benchmarking/comparing performance of the libraries
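For benchmarking, a small harness that runs a render function several times and reports the median (less sensitive to outliers than a single run) would be a starting point. `medianRuntimeMs` is hypothetical, and `renderFn` stands in for rendering one chart with a given library and dataset size.

```typescript
// Hypothetical micro-benchmark: one untimed warm-up call, then `runs`
// timed calls; returns the median duration in milliseconds.
function medianRuntimeMs(renderFn: () => void, runs = 10): number {
  if (runs < 1) throw new Error("runs must be at least 1");
  renderFn(); // warm-up, to reduce the effect of one-time setup and JIT compilation
  const times: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    renderFn();
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  const mid = Math.floor(runs / 2);
  return runs % 2 === 1 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
}
```

This only measures synchronous work; any rendering a library defers to a later animation frame would not be captured and would need a different measurement.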

hughess commented 2 years ago

So far ECharts is looking promising for usability:

[Screenshot (2021-10-22): test charts built with ECharts, including a stacked bar chart]

Some concerns I have:

- It's not clear yet how to use tidy data format to create multi-series charts (passing a series column to a chart)
- The handling of discrete axes is not ideal (e.g., example charts show year as axis type of category, meaning we would need to do some transformation to the data to ensure that every year is represented on an axis)
- A few formatting concerns - you can see some small gaps between the stacks in the bar chart I show here
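The first two concerns could be handled with a small preprocessing step, sketched below in TypeScript: group tidy rows by their series column into one ECharts line series each, and build the category axis from every year in the range so missing years still appear (as `null`, which ECharts draws as a gap by default). The fixed column names `year`, `series`, and `value` are assumptions, as is the `tidyToSeries` name; years are assumed to be integers.

```typescript
// Hypothetical tidy-to-wide conversion for an ECharts line chart.
type TidyRow = { year: number; series: string; value: number };
type LineSeries = { name: string; type: "line"; data: (number | null)[] };

function tidyToSeries(rows: TidyRow[]) {
  let minYear = Infinity;
  let maxYear = -Infinity;
  for (const r of rows) {
    minYear = Math.min(minYear, r.year);
    maxYear = Math.max(maxYear, r.year);
  }
  // Every year in the range, even years with no rows, so the axis has no holes
  const categories: string[] = [];
  for (let year = minYear; year <= maxYear; year++) categories.push(String(year));
  // One data array per series value, pre-filled with null (= missing point)
  const groups = new Map<string, (number | null)[]>();
  for (const r of rows) {
    let data = groups.get(r.series);
    if (!data) {
      data = categories.map((): number | null => null);
      groups.set(r.series, data);
    }
    data[r.year - minYear] = r.value;
  }
  const series: LineSeries[] = [];
  groups.forEach((data, name) => series.push({ name, type: "line", data }));
  return {
    xAxis: { type: "category", data: categories },
    yAxis: { type: "value" },
    series,
  };
}
```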

hughess commented 2 years ago

Some news to share - we've made the decision to go with ECharts for our chart library! 🎉

Given how important our chart library is for both the developer and reader experience in Evidence, we gave this a lot of thought. Here are the main reasons we really like ECharts:

- Flexible library
  - We can build high level products like chart templates, as well as low level products like custom shapes, using the ECharts API
- Many chart examples
  - ECharts has an impressive collection of example charts that give us a head start for adding new chart types
- Confident in direction and pace of development
  - They are adding new capabilities fairly quickly and are tackling technical problems that would have been time-consuming and challenging for us to address in our own library
- Good support on GitHub
  - We've opened several issues and received prompt responses
  - We will likely need to contribute to the open source code base for some features, but they offer good documentation on creating pull requests
- Strong support for responsiveness, interactivity, accessibility, and mobile
  - We've found their API to be nice to work with for setting up responsiveness and interactivity of charts
- Thorough documentation site with some impressive tools for understanding the content

Stay tuned for our first version of the ECharts-based chart library, which will be coming soon!