FormidableLabs / victory

A collection of composable React components for building interactive data visualizations
http://commerce.nearform.com/open-source/victory/

Performance with large data sets #1669

Open dlabrecq opened 4 years ago

dlabrecq commented 4 years ago

Bugs and Questions

The Problem

We want to move from Prometheus to Victory, but Victory charts do not perform well with large data sets.

For our use case, the point of using large data sets is to see whether anything stands out, rather than to look at individual lines in a graph. Values that are either too low or too high are what users tend to look for in a graph that has a lot of series.

Update: Prometheus appears to generate charts using Rickshaw, which is built on D3.

Reproduction

https://codesandbox.io/s/victory-perf-issue-vh3kp

In this example, DATA_SET_COUNT = 150. The chart tooltip really lags when DATA_POINTS_PER_SET = 100. And the browser begins to hang at DATA_POINTS_PER_SET = 300 or more.
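
For reference, a data set of the shape described above could be generated roughly as follows. This is only a sketch: the sandbox's actual generator is not shown in this issue, so the `makeDataSets` helper and the random-walk values are hypothetical. Only the constants `DATA_SET_COUNT` and `DATA_POINTS_PER_SET` come from the report.

```javascript
// Constants from the report; everything else here is an illustrative assumption.
const DATA_SET_COUNT = 150;
const DATA_POINTS_PER_SET = 100;

// Hypothetical helper: builds `setCount` series of `pointsPerSet` {x, y} points.
function makeDataSets(setCount, pointsPerSet) {
  return Array.from({ length: setCount }, () => {
    let y = 0; // each series starts its own random walk at zero
    return Array.from({ length: pointsPerSet }, (_, x) => {
      y += Math.random() - 0.5; // step up or down by at most 0.5
      return { x, y };
    });
  });
}

const dataSets = makeDataSets(DATA_SET_COUNT, DATA_POINTS_PER_SET);
const totalPoints = dataSets.reduce((sum, set) => sum + set.length, 0);
console.log(totalPoints); // 150 * 100 = 15000
```

Each inner array would presumably be passed as the `data` prop of one VictoryLine, which gives 150 lines and 15k points in total.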

Feature Requests

There is an issue (#655) related to Zoom & brush containers from 2017; however, performance issues can be seen with just a bare bones line chart.

Description

Victory is based on SVG and renders individual nodes for each data point. However, Prometheus / Rickshaw charts are also based on SVG and have much better performance with large data sets.

For the Prometheus / Rickshaw chart example below, there are a total of 134 time/data series. Each data series has approximately 1800 data points. Despite there being more than 200k data points, there are no performance issues. The Prometheus / Rickshaw chart renders quickly and the tooltip is very responsive.

Prometheus / Rickshaw chart (200k+ data points): [Image: prometheus-chart]

In comparison, the Victory line chart, from the reproduction codesandbox example, does not perform well with large data sets, despite having far fewer data points than the Prometheus / Rickshaw example.

The chart tooltip really lags when DATA_POINTS_PER_SET = 100 (see below). And the browser begins to hang at DATA_POINTS_PER_SET = 300 or more. The timer count at the top demonstrates how much the browser has slowed.

Victory line chart (15k data points): [Image: victory-chart]

Prometheus / Rickshaw output

I noticed in the debugger that Prometheus / Rickshaw renders path tags without clip-path tags. Could that contribute to the better performance with large data sets?

[Image: Screen Shot 2020-08-18 at 2 27 55 PM]

boygirl commented 4 years ago

@dlabrecq you can test this by changing the groupComponent for VictoryLine so that it renders a plain g rather than the clipped container that is needed for its load animation.

dlabrecq commented 4 years ago

@boygirl Thank you for that tip. I created a new sandbox example with groupComponent={<g />}. Unfortunately, omitting the clip-path did not resolve the issue.

https://codesandbox.io/s/victory-group-28kyx

j-funk commented 4 years ago

I came here to see whether there was any way to work with larger data sets (though I think the sizes mentioned in this issue are more medium than large). We're noticing a huge slowdown in rendering performance at ~5k points in VictoryScatter, which takes 3 seconds to render. So, I'm adding my +1 to this issue.

becca-bailey commented 3 years ago

Hello! I just wanted to provide an update here. We are doing some performance auditing and improvement right now. You can expect to see some small changes soon (like #1901), and some plans for larger performance improvements in subsequent releases. Keep us posted as you observe changes in performance, or if you have any specific recommendations!

Harim-T commented 3 years ago

I'm having performance issues with large data sets too, specifically when rendering stacked VictoryBar components.

Reproduction: codesandbox

becca-bailey commented 3 years ago

Hi @harimtejada - I messed around with your sandbox a bit, and one reason that you might be seeing poor performance in this VictoryStack chart is the way that you are creating a separate VictoryBar instance for each data point. This creates some additional work for the VictoryChart layer, as it needs to iterate through each of those charts in order to calculate the domain, range, and scale for the chart.

I was able to speed it up quite a bit by chunking the data by color, and passing an array of x/y values into each VictoryBar instance rather than just one. Let me know if this helps! https://codesandbox.io/s/victory-chart-test-forked-o7pnx?file=/src/App.js
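
The chunking described above might look something like the sketch below. The forked sandbox's real code is not reproduced here, so the `color` field on each datum and the `chunkByColor` helper are assumptions made purely for illustration.

```javascript
// Hypothetical input shape: one datum per bar, each tagged with a color.
const points = [
  { x: 1, y: 2, color: "tomato" },
  { x: 1, y: 3, color: "gold" },
  { x: 2, y: 1, color: "tomato" },
  { x: 2, y: 4, color: "gold" },
];

// Groups data by color so that each color becomes one series of {x, y} values
// (one VictoryBar per color) instead of one VictoryBar per datum.
function chunkByColor(data) {
  const groups = new Map();
  for (const { color, x, y } of data) {
    if (!groups.has(color)) groups.set(color, []);
    groups.get(color).push({ x, y });
  }
  // Map preserves insertion order, so series come out in first-seen color order.
  return Array.from(groups, ([color, seriesData]) => ({ color, data: seriesData }));
}

const series = chunkByColor(points);
console.log(series.length); // 2 series instead of 4 single-point ones
```

Each entry in `series` could then back a single VictoryBar inside the VictoryStack, with the entry's `data` array as its `data` prop and the entry's `color` applied through its style.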

Harim-T commented 3 years ago

Makes sense, thank you @beccanelson 🙏 it's way better now.

cgallivanpam commented 2 years ago

Hi Guys, I was going to open another Performance issue for Large Data Sets, but figured I would just join in on this thread with quite some detail. @boygirl @beccanelson, please let me know if I should open another issue...

That being said here is a snippet video of what I am facing:

[Video snippet: ezgif-3-4adf79f6121f]

I am trying to use a large data set (>5000), which is split up into multiple subsets (>2000). For each subset, a VictoryLine is rendered. I have looked at other open/closed issues, and it feels to me that the underlying architecture problem is not solved, though there may be workarounds. I would like to continue using Victory charts, as I have other charts with less data elsewhere in my application.

Can anyone tell me what a potential workaround would be?

Here is the code:

const PointClassDataLineChart = () => {
  const VictoryZoomVoronoiContainer = createContainer('zoom', 'voronoi');

  const historianFacts = useSelector((state) => state.facts.historianFacts);

  if (historianFacts.data && historianFacts.data.data) {
    return (
      <View style={styles.container}>
        <VictoryChart
          domainPadding={{ y: 10 }}
          containerComponent={
            <VictoryZoomVoronoiContainer
              labels={({ datum }) => `${datum.x}, ${datum.y}`}
            />
          }
        >
          <VictoryGroup colorScale={['tomato', 'orange', 'gold']}>
            {Object.keys(historianFacts.data.data).map((key) => {
              const coordinates = historianFacts.data.data[key].coordinates;

              return (
                <VictoryLine
                  key={key}
                  data={coordinates}
                  style={{ data: { strokeWidth: 1 } }}
                  interpolation={'catmullRom'}
                />
              );
            })}
          </VictoryGroup>
          <VictoryAxis tickCount={4} />
          <VictoryAxis dependentAxis />
        </VictoryChart>
      </View>
    );
  }
  return <View></View>;
};

Thanks!!

becca-bailey commented 2 years ago

Hi @cgallivanpam, thanks for sharing that example! My current hypothesis is that the work Victory does to traverse its child components to get the domain and range data, particularly when they are nested in a container like VictoryGroup, is causing some issues here. I have opened a related issue for this, and I should have some time to look into it next week.

Eramirez06 commented 1 year ago

Any update on this?

itsmatheusmoura commented 1 year ago

Hello everyone, there is one solution that improves chart performance. Basically, we filter the data points by the current domain so that only the points visible at that moment are rendered, as suggested in the Victory documentation (https://formidable.com/open-source/victory/guides/zoom-on-large-datasets). I use this with React Native, and it greatly improved performance.
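
A rough sketch of that idea, loosely following the approach in the linked guide: keep only the points inside the currently visible x-domain, then thin them out if there are still too many. The function name `getVisibleData`, the `maxPoints` parameter, and the every-k-th-point sampling are illustrative assumptions here, not part of the Victory API.

```javascript
// Returns the points inside `xDomain` ([min, max]), thinned to roughly
// `maxPoints` entries by keeping every k-th point when there are too many.
function getVisibleData(data, xDomain, maxPoints) {
  const [min, max] = xDomain;
  const visible = data.filter((d) => d.x >= min && d.x <= max);
  if (visible.length <= maxPoints) return visible;
  const k = Math.ceil(visible.length / maxPoints); // sampling stride
  return visible.filter((_, i) => i % k === 0);
}

// Hypothetical data: 10,000 points, of which 4,000 fall inside the zoomed domain.
const allPoints = Array.from({ length: 10000 }, (_, x) => ({ x, y: Math.sin(x / 100) }));
const shown = getVisibleData(allPoints, [2000, 5999], 200);
console.log(shown.length); // 200
```

In a chart, the domain would typically come from the zoom container's `onZoomDomainChange` callback, and the returned array would be passed as the series `data`, so the number of rendered nodes stays bounded no matter how large the full data set is.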