evidence-dev / evidence

Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
https://evidence.dev
MIT License

Decide on Chart Architecture #136

Closed: hughess closed this issue 2 years ago

hughess commented 2 years ago

📢 Feedback Needed - help us build an amazing chart library!

We've heard from several community members that they want to see more capabilities built into our chart library. We've also received requests to include support for third-party chart libraries in Evidence (like Vega, Highcharts, Plotly and Chart.js).

Given the importance of charts to the overall experience of using Evidence, we want to do a thorough assessment of our options for building the best chart library you've ever used.

We will post our criteria and thoughts in this issue, and hope to hear feedback about:

hughess commented 2 years ago

Background on Current Architecture

Our current library is custom-built using D3, Layer Cake, tidyJS, and basic SVG elements:

Structure of a Chart

Our chart components all follow the same structure, aiming to have components that are as reusable as possible across chart types. The focus on reusability was inspired by Layer Cake:

Line Chart Example

For example, <LineChart/> reads in the data and column names you provide, checks that they are complete, then builds a chart using the code below:

```svelte
<Chart {data} {x} {y}>
    <XAxis/>
    <YAxis/>
    <Line/>
</Chart>
```

<Chart/> then uses Layer Cake to combine all of these components into the final chart that gets rendered on your page.
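The "checks that they are complete" step could look roughly like the TypeScript sketch below. It assumes a query result arrives as an array of row objects and reads "complete" as "every requested column exists and has no missing values". `checkColumns` and `QueryRow` are hypothetical names, not Evidence's actual code.

```typescript
// Assumed shape of a query result row: column name -> value.
type QueryRow = Record<string, unknown>;

// Hypothetical helper: report requested columns that are absent
// (checked against the first row) or that contain null/undefined values.
function checkColumns(data: QueryRow[], columns: string[]): string[] {
  if (data.length === 0) return ["dataset is empty"];
  const problems: string[] = [];
  for (const col of columns) {
    if (!(col in data[0])) {
      problems.push(`column "${col}" not found`);
      continue;
    }
    const missing = data.filter((row) => row[col] === null || row[col] === undefined).length;
    if (missing > 0) problems.push(`column "${col}" has ${missing} missing value(s)`);
  }
  return problems;
}

const queryRows: QueryRow[] = [
  { month: "2021-01", sales: 120 },
  { month: "2021-02", sales: null },
];

// Flags the null in "sales" and the absent "region" column.
const problems = checkColumns(queryRows, ["month", "sales", "region"]);
```

Returning a list of problems (rather than throwing on the first one) would let a chart component show every issue with its inputs at once.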

hughess commented 2 years ago

What we like about this approach

What we don't like about this approach

hughess commented 2 years ago

With all that in mind, we're now considering a few options - and would love to hear your feedback:

Chart Architecture Options

  1. Stick with current architecture
  2. Switch to a third-party chart library for all charts and chart features
  3. Use a third-party chart library under the hood, but use Layer Cake to overlay custom components on a chart (e.g., custom annotations as described above)

For the most part, the front-end experience of using the charts would stay the same as it is now, with our declarative syntax, but possibly with some changes to enable multi-type charts without a lot of code.

In all of these cases, we may still add support for other third-party libraries if people need them (e.g., adding a <Vega/> component where you could include a Vega config).
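A pass-through component like the `<Vega/>` idea would mostly need to attach query results to a spec the user writes by hand. The sketch below assumes a Vega-Lite spec (a heavily trimmed subset of its schema) and uses a hypothetical helper, `withQueryData`; it is not an existing Evidence API.

```typescript
// Minimal subset of a Vega-Lite spec (the real schema is much larger).
interface VegaLiteSpec {
  $schema?: string;
  mark: string | { type: string };
  encoding?: Record<string, unknown>;
  data?: { values: Record<string, unknown>[] };
}

// Hypothetical helper: attach a query result as inline data and leave the
// rest of the user's config untouched (the input spec is not mutated).
function withQueryData(spec: VegaLiteSpec, rows: Record<string, unknown>[]): VegaLiteSpec {
  return { ...spec, data: { values: rows } };
}

const userSpec: VegaLiteSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  mark: "bar",
  encoding: {
    x: { field: "category", type: "nominal" },
    y: { field: "total", type: "quantitative" },
  },
};

// finalSpec could then be handed to a renderer such as vega-embed.
const finalSpec = withQueryData(userSpec, [
  { category: "A", total: 10 },
  { category: "B", total: 25 },
]);
```

Because the user's spec passes through unchanged apart from `data`, examples copied from the Vega-Lite docs would work with little or no editing.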

hughess commented 2 years ago

If we were to go with a third-party library, here's some criteria we've thought about. Let us know if there's anything you would add/remove/change!

Third-Party Library Criteria

Charts

Formatting

Responsiveness/Sizing

Other Features

Architecture

Support & Maintenance

hughess commented 2 years ago

Contenders for third-party chart library

Helpful comparisons of libraries

https://www.codewall.co.uk/best-javascript-chart-libraries/
https://en.wikipedia.org/wiki/Comparison_of_JavaScript_charting_libraries

yhoiseth commented 2 years ago

Thanks for the write-up. I have some quick comments.

Architecture Options

I think I would prefer a bare-bones implementation of a third-party charting library. That way, I could easily copy-paste examples and ask questions on Stack Overflow. Also, fewer layers between me and the charting library would mean fewer bugs and presumably less work for you. Dope abstractions could be added later.

Criteria

Looking pretty complete. You might want to add mapping and maybe stock charts to your list of required chart types.

Contenders

Personally, I have found Highcharts to work well. I don’t have a lot of experience with the others, though.

hughess commented 2 years ago

Thanks @yhoiseth! I'll add those chart types to the list.

Your comment about the bare-bones implementation makes sense to me. I think we would want to do both: have the option to build a chart from scratch (including copy-pasting examples), while also building the abstracted, simplified versions over time.

I've played around with Highcharts a bit and have been impressed with what it can do.

The biggest unknown with Highcharts is how we would get our default data format into the format Highcharts needs to build a chart. It would definitely be possible - it's just a matter of how much data processing code we'll need to write to handle a query result on its way into the Highcharts configuration. Ideally, you could pass the chart a full query result and tell it which columns you want to use (vs. passing in a separate dataset for each column).
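One way the "full query result plus column names" idea could map onto Highcharts is sketched below. It assumes Highcharts' array-of-`[x, y]`-pairs point format, where a string first value is used as the point's name on a category axis. `toHighchartsSeries` is a hypothetical helper name.

```typescript
// Assumed shape of a query result row.
type ResultRow = Record<string, string | number | null>;

// One Highcharts series. Each point is an [x, y] pair; when x is a string,
// Highcharts treats it as the point's name (useful with a category axis).
interface HighchartsSeries {
  name: string;
  data: [string | number, number | null][];
}

// Hypothetical converter: pass the full query result once, plus the column
// to use for x and one column per series for y.
function toHighchartsSeries(rows: ResultRow[], x: string, ys: string[]): HighchartsSeries[] {
  return ys.map((y) => ({
    name: y,
    data: rows.map((row): [string | number, number | null] => [
      row[x] as string | number,
      row[y] as number | null,
    ]),
  }));
}

const series = toHighchartsSeries(
  [
    { month: "Jan", sales: 120, cost: 80 },
    { month: "Feb", sales: 150, cost: null },
  ],
  "month",
  ["sales", "cost"],
);
```

The result would slot into a config such as `{ xAxis: { type: "category" }, series }`; nulls are kept so Highcharts can render them as gaps.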

Are there any downsides to Highcharts that you've come across?

wylbee commented 2 years ago

Thank you for the write-up. I agree with @yhoiseth's feedback on the architecture options: a bare-bones implementation of a third-party charting library would be extremely helpful.

I've had positive experiences with Vega/Vega-Lite/Altair.

yhoiseth commented 2 years ago

> Are there any downsides to Highcharts that you've come across?

  1. I’ve found the configuration options to be a bit daunting. That’s one reason why the examples and Stack Overflow are so important. (Highcharts employees answer questions on Stack Overflow really quickly.)
  2. A minor annoyance is that the charts come with their own styles — e.g. font-family. I would have preferred if they just picked up on the styling of the page. Hopefully, this wouldn’t be a problem for Evidence as the styles could maybe be overwritten by Evidence.
  3. It’s not entirely free. I’ve found the license options to be very sensible, but there is still some friction.
  4. I had an issue with somewhat advanced interactive behaviour. It was closed as stale. I think that’s an annoying policy in general, but, as I had moved on from the project, I didn’t object.

hughess commented 2 years ago

> Thank you for the write-up. I agree with @yhoiseth's feedback on the architecture options: a bare-bones implementation of a third-party charting library would be extremely helpful.

> I've had positive experiences with Vega/Vega-Lite/Altair.

Thanks @brown5628! Are there any things you really liked or disliked about Vega/Vega-Lite/Altair?

hughess commented 2 years ago
> 1. I’ve found the configuration options to be a bit daunting. That’s one reason why the examples and Stack Overflow are so important. (Highcharts employees answer questions on Stack Overflow really quickly.)

I was playing around with Highcharts a few days ago and opened an issue on GitHub to ask a question - their team responded impressively quickly.

The configuration options are a lot, but they don't seem out of line with the other libraries generally.

> 2. A minor annoyance is that the charts come with their own styles — e.g. font-family. I would have preferred if they just picked up on the styling of the page. Hopefully, this wouldn’t be a problem for Evidence as the styles could maybe be overwritten by Evidence.

I checked this out and confirmed that we can set default styles to match Evidence formatting.
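Setting defaults like this would likely be a single global options object. The sketch assumes the standard Highcharts option keys `colors`, `chart.backgroundColor`, `chart.style` and `title.style`, and assumes the CSS value `inherit` is honoured for the font. The colour and size values are placeholders, not Evidence's actual formatting.

```typescript
// Placeholder values, not Evidence's actual theme. In an app this object would
// be passed once to Highcharts.setOptions() before any chart is created.
const evidenceHighchartsDefaults = {
  colors: ["#236aa4", "#45a1bf", "#a5cdee", "#8dacbf"],
  chart: {
    backgroundColor: "transparent",
    // "inherit" aims to pick up the page's font instead of Highcharts' built-in font stack.
    style: { fontFamily: "inherit" },
  },
  title: { style: { fontSize: "14px", fontWeight: "600" } },
};
```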

> 4. I had an issue with somewhat advanced interactive behaviour. It was closed as stale. I think that’s an annoying policy in general, but, as I had moved on from the project, I didn’t object.

Thanks for sharing that - that's interesting. I agree on the stale policy. As an aside, it seems like the consistent use of jsfiddle is great for sharing issues.

hughess commented 2 years ago

We've put together a comparison table with everything we've discussed above for the various libraries. Still a lot of info to collect and fill in, but if anyone has any info to contribute, please add it to this Google sheet!

Chart Comparison Google Sheet

tacastillo commented 2 years ago

I'm personally a big fan of Vega. Out of the various libraries/grammars I've used, Vega typically fits the bill if you want to create a charting library without a lot of bespoke charts. I love being able to dynamically generate charts because of how clean-cut its grammar is. Its SVG rendering engine is pretty well constructed too. If you ever feel like you need more granular, JS-based control, or the built-in CSS overrides don't fit the bill, the classes and HTML structure of Vega charts are pretty consistent. So it's not that bad to navigate the DOM and make the changes yourself, e.g., interacting with the chart and having it dynamically update another part of the page.

There was a note mentioned about performance issues when it came to the histogram. The first thing I tell anyone facing performance issues with charts is "aggregate your data better before it hits the UI". Evidence unfortunately isn't in control of that because its end users are the ones who are writing the queries for now. Thankfully, Evidence also builds everything at compile time, right? So a potential enhancement in the future is tracking lineage of exposures like dbt does and aggregating the data that goes into the charts at compile time. "If this data is used for a histogram, bucket it, but if it's also used for a scatter plot, throw it raw", as a contrived example.
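The "bucket it before it hits the UI" idea for a histogram can be sketched as a fixed-width binning step, so the chart draws one bar per bucket instead of one mark per raw row. In Evidence this would more naturally be a `GROUP BY` in the SQL query; it is shown in TypeScript for consistency, and `toBuckets` is a hypothetical name.

```typescript
// One histogram bucket.
interface Bucket {
  start: number; // inclusive lower edge
  end: number; // exclusive upper edge (the last bucket also includes the max)
  count: number;
}

// Hypothetical pre-aggregation step: reduce raw values to fixed-width buckets.
function toBuckets(values: number[], bucketCount: number): Bucket[] {
  if (values.length === 0 || bucketCount < 1) return [];
  // A loop instead of Math.min(...values) avoids argument-count limits on large inputs.
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  // Avoid a zero width when every value is identical.
  const width = max > min ? (max - min) / bucketCount : 1;
  const buckets: Bucket[] = Array.from({ length: bucketCount }, (_, i) => ({
    start: min + i * width,
    end: min + (i + 1) * width,
    count: 0,
  }));
  for (const v of values) {
    // Clamp so the maximum value lands in the last bucket.
    const i = Math.min(Math.floor((v - min) / width), bucketCount - 1);
    buckets[i].count++;
  }
  return buckets;
}
```

For example, eleven values from 0 to 10 split into five buckets of width 2, with the value 10 clamped into the last bucket.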

The second thing I tell everyone hitting chart performance issues is "Have you thought of a canvas rendering engine instead of HTML/SVG?" It's the second thing because that's a massive hot mess: not only are charts harder to render in Canvas because you have more granular control, but at that point you're basically writing two charting libraries.

Thankfully, Vega and most declarative grammars/libraries take care of that and have dealt with it for you. For example, Vega's Views can take a param to render in Canvas or SVG (SVG default). You lose the JS and CSS-based control I was mentioning, but having the chart render and not freeze your computer is more important than being able to do some fun interactivity.

Besides having Evidence or the end user define whether to use canvas or SVG, I've had some fun dynamically deciding behind the scenes based on the size of the chart's underlying data. You can do some benchmarking to find the record count at which it makes sense to switch from SVG to canvas. This way the end user (developer) gets an optimization baked in without thinking about it.
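A size-based renderer choice can be very small. Vega's `View` accepts a `renderer` option of `"canvas"` or `"svg"`, so a helper only has to pick one of those strings. The threshold below is a placeholder that benchmarking would replace, and `chooseRenderer` is a hypothetical name.

```typescript
// The two renderer values Vega's View accepts.
type Renderer = "canvas" | "svg";

// Placeholder threshold; the real cut-over point would come from benchmarking.
const SVG_MAX_MARKS = 5000;

// Hypothetical helper: estimate how many marks the chart will draw and switch
// to canvas once that exceeds what SVG (one DOM element per mark) handles well.
function chooseRenderer(rowCount: number, seriesCount = 1, threshold = SVG_MAX_MARKS): Renderer {
  return rowCount * seriesCount > threshold ? "canvas" : "svg";
}
```

Usage would look something like `new View(runtime, { renderer: chooseRenderer(rows.length) })`, so chart authors never set the renderer themselves.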

What are people's thoughts on creating a high-level API to interface with the charting libraries that is built with the intention of being able to swap out the underlying libraries as Evidence grows? This way we can tinker about and "theoretically" make a bunch of changes to how it performs and the people developing with it won't even know it's happening.
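A swappable backend could be as simple as one library-neutral chart description plus an adapter per library. The sketch below assumes that shape. `ChartSpec`, `ChartBackend` and `buildChart` are hypothetical, while the output objects follow the public Vega-Lite and ECharts option formats.

```typescript
// Library-agnostic description of a chart (hypothetical shape).
interface ChartSpec {
  type: "line" | "bar";
  data: Record<string, string | number>[];
  x: string;
  y: string;
}

// Each backend translates the neutral spec into its own config format.
interface ChartBackend {
  name: string;
  toConfig(spec: ChartSpec): object;
}

const vegaLiteBackend: ChartBackend = {
  name: "vega-lite",
  toConfig: (spec) => ({
    data: { values: spec.data },
    mark: spec.type,
    encoding: {
      x: { field: spec.x, type: "nominal" },
      y: { field: spec.y, type: "quantitative" },
    },
  }),
};

const echartsBackend: ChartBackend = {
  name: "echarts",
  toConfig: (spec) => ({
    xAxis: { type: "category", data: spec.data.map((r) => r[spec.x]) },
    yAxis: { type: "value" },
    series: [{ type: spec.type, data: spec.data.map((r) => r[spec.y]) }],
  }),
};

// Swapping libraries means changing this one reference; chart authors only
// ever write ChartSpec objects.
let activeBackend: ChartBackend = echartsBackend;

function buildChart(spec: ChartSpec): object {
  return activeBackend.toConfig(spec);
}
```

The trade-off is that the neutral spec can only express what every backend supports, so library-specific features would need an escape hatch.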

I can talk some more about other ways to have a declarative API for charts, but I'll just dump my opinion that I think Vega + a high-level API provides enough flexibility, adoptability, and room for growth.

Also, when thinking about developer experience: as a data analyst/scientist, I'd want to feel like I'm not writing front-end code. To many people, a whole new ecosystem is daunting and may turn some of them away. I'd suggest against an HTML-like syntax/grammar for declaring charts and lean more towards a function-based syntax, such as line_chart(data, "horizontal", y_axis_max = 100, x_axis_min = 50), kind of like how R Shiny and (to a milder extent) Plotly Dash expose building elements and components. Functions are a more familiar concept than HTML to what I think Evidence's intended audience is.
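A TypeScript analogue of the `line_chart(...)` call might look like the sketch below. It assumes numeric x and y columns and returns an ECharts-style option object, where axis limits map to the axes' `min`/`max` settings. `lineChart` and its parameters are hypothetical.

```typescript
// Assumed shape of a query result row (numeric columns only).
type DataRow = Record<string, number>;

interface AxisOptions {
  x: string;
  y: string;
  yAxisMax?: number;
  xAxisMin?: number;
}

// Hypothetical function-style API mirroring line_chart(data, "horizontal", ...).
// Axis limits apply to the axes as displayed.
function lineChart(data: DataRow[], orientation: "vertical" | "horizontal", opts: AxisOptions) {
  // Horizontal: the measure (y column) runs along the x-axis.
  const points = data.map((r) =>
    orientation === "horizontal" ? [r[opts.y], r[opts.x]] : [r[opts.x], r[opts.y]],
  );
  return {
    xAxis: { type: "value", min: opts.xAxisMin },
    yAxis: { type: "value", max: opts.yAxisMax },
    series: [{ type: "line", data: points }],
  };
}

const option = lineChart(
  [{ day: 60, visits: 40 }, { day: 70, visits: 90 }],
  "vertical",
  { x: "day", y: "visits", yAxisMax: 100, xAxisMin: 50 },
);
```

A single options object stands in for keyword arguments here, since TypeScript has no named parameters.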

hughess commented 2 years ago

Thanks @tacastillo! Great to get your thoughts. I've been playing around with a few of the libraries recently and I do like how consistent Vega is in terms of grammar and data inputs for various chart types.

> There was a note mentioned about performance issues when it came to the histogram. The first thing I tell anyone facing performance issues with charts is "aggregate your data better before it hits the UI".

I think this is the issue with our histogram - the aggregations aren't being done efficiently before hitting the UI (though I suspect it is because of the way I wrote them).

Love the idea of exposures, but it's probably something for further in the future.

> The second thing I tell everyone hitting chart performance issues is "Have you thought of a canvas rendering engine instead of HTML/SVG?" It's the second thing because that's a massive hot mess: not only are charts harder to render in Canvas because you have more granular control, but at that point you're basically writing two charting libraries.

We have only tried Canvas rendering a few times, but have stuck with SVG so far because it looked a lot crisper in our tests. Seems like it would be a good benefit for our chart library to give us the option.

> Besides having Evidence or the end user define whether to use canvas or SVG, I've had some fun dynamically deciding behind the scenes based on the size of the chart's underlying data. You can do some benchmarking to find the record count at which it makes sense to switch from SVG to canvas. This way the end user (developer) gets an optimization baked in without thinking about it.

Dynamically deciding the rendering engine based on the size of the data sounds great - looking forward to that benchmarking and optimization when we have our library all set up.

> What are people's thoughts on creating a high-level API to interface with the charting libraries that is built with the intention of being able to swap out the underlying libraries as Evidence grows? This way we can tinker about and "theoretically" make a bunch of changes to how it performs and the people developing with it won't even know it's happening.

> I can talk some more about other ways to have a declarative API for charts, but I'll just dump my opinion that I think Vega + a high-level API provides enough flexibility, adoptability, and room for growth.

> Also, when thinking about developer experience: as a data analyst/scientist, I'd want to feel like I'm not writing front-end code. To many people, a whole new ecosystem is daunting and may turn some of them away. I'd suggest against an HTML-like syntax/grammar for declaring charts and lean more towards a function-based syntax, such as line_chart(data, "horizontal", y_axis_max = 100, x_axis_min = 50), kind of like how R Shiny and (to a milder extent) Plotly Dash expose building elements and components. Functions are a more familiar concept than HTML to what I think Evidence's intended audience is.

I like the idea of a declarative API and being able to swap out the underlying libraries as needed. There may also be scenarios where we need to use multiple libraries (e.g., when we get to more complex viz types like maps), so that flexibility would be useful. Would love to hear any ideas you have about declarative chart APIs!

That's an interesting point about the syntax - I think you're probably right about the intended audience's preference for functions. I'm personally okay with the HTML-like syntax, but interested to hear what people think.

One thing I don't know much about in Vega is its potential to build complex custom viz. Have you used Vega for that? I'm wondering at what level of customization you would need to leave Vega and go to D3/SVG to build from scratch.

hughess commented 2 years ago

Update on Library Contenders

We've spent a lot of time looking into these chart alternatives and weighing the pros and cons. We've narrowed the options down to a few which we are going to test. Below is the summary of our thoughts on each library so far.

While we're taking many of the libraries out of the running, we're still looking into ways to support these libraries for people who want to use them instead of the library we ultimately choose.

Libraries we are removing as contenders:

Libraries still in the running:

Next steps:

tacastillo commented 2 years ago

Haven't used ECharts before, but a cursory look at the docs has me thinking it looks promising. I like their concept of defining a theme once and reusing it across charts. It also puts the burden of making sure styles are consistent across rendering engines on them. And oof, that theme builder 👍🏽.
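Defining a theme once in ECharts amounts to a plain object of top-level option keys that is registered under a name. The sketch assumes the `color`, `backgroundColor` and `textStyle` keys. The palette and font are placeholders, not Evidence's real styling, and the registration calls are shown as comments because they need the `echarts` package and a DOM element.

```typescript
// Placeholder theme using top-level ECharts option keys; defined once and
// shared by every chart.
const evidenceTheme = {
  color: ["#236aa4", "#45a1bf", "#a5cdee", "#8dacbf"],
  backgroundColor: "transparent",
  textStyle: { fontFamily: "Inter, sans-serif", color: "#333333" },
};

// With the echarts package installed, the theme would be registered once:
//   echarts.registerTheme("evidence", evidenceTheme);
// and each chart would reference it by name, with either renderer:
//   const chart = echarts.init(element, "evidence", { renderer: "svg" });
//   chart.setOption(option);
```

Keeping the theme separate from each chart's option means per-chart configs only describe data and chart type.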

What are your top priorities for charting right now? (e.g., switching to a charting library, usability assessments, high-level APIs, benchmarking, etc.)

hughess commented 2 years ago

Within the next 2 weeks, we want to pick one library and write version 1 of the new high-level API & chart templates (replacing our existing API/templates).

Here's what I have in mind:

There are a couple of areas I don't know much about, so if you have any thoughts or want to jump in on these, please feel free!

hughess commented 2 years ago

So far ECharts is looking promising for usability:

[Screenshot, 2021-10-22]

Some concerns I have:

hughess commented 2 years ago

Some news to share - we've made the decision to go with ECharts for our chart library! 🎉

Given how important our chart library is for both the developer and reader experience in Evidence, we gave this a lot of thought. Here are the main reasons we really like ECharts:

Stay tuned for our first version of the ECharts-based chart library, which will be coming soon!