data-driven-forms / react-forms

React library for rendering forms.
https://data-driven-forms.org/
Apache License 2.0
303 stars 87 forks source link

DualListTreeSelect performance hit with large data sets #1151

Open dlabrecq opened 3 years ago

dlabrecq commented 3 years ago

Scope:

pf4-component-mapper dual-list-tree-select

Description

Cost management users are encountering latency with our settings page when we have large data sets. For our cost-demo user on stage-beta, we have 40,000 GCP items under one folder.

I tested the page loading time for 1k, 5k, 10k, 20k, and 40k items. For 40k selected items, we're seeing page loads of 55 seconds.

That said, the DDF component appears to be converting the options provided via our schema and applies a default sort. There is also a lot of recursion happening with the selectedOptions function.

See https://github.com/data-driven-forms/react-forms/blob/master/packages/pf4-component-mapper/src/dual-list-tree-select/dual-list-tree-select.js#L10

Note that the non-tree version of dual-list has an isSortable prop, but the tree version does not. If we can omit the default sort (e.g., using isSortable={false}), we may see a significant performance increase.

Ideally, we would pre-sort server-side via the Cost Management settings API.

Schema

See https://console.stage.redhat.com/api/cost-management/v1/settings/

Hyperkid123 commented 3 years ago

@dlabrecq do we have a baseline when using the "stock" PF select?

Hyperkid123 commented 3 years ago

I will check it properly later and create examples in code sandbox however. From past experience with huge data sets in other components (array fields in MUI mapper), usually, the rendering is the "problem" when it comes to first renders. Removing the sort function may and probably will shave a bit of time from the first render, but I would guess that 90% of the time would be spent on rendering, not sorting some array.

To put it simply, there might be just too much stuff for the browser to handle properly. To boost the performance to some reasonable levels, we would probably need a virtualized list in the component.

dlabrecq commented 3 years ago

Like you said, omitting the sort will help. However, there is a lot of recursion happening with the selectedOptions function. I think that's probably the underlying problem.

Looking at the settings API, we're providing a list of 40k initial options via the schema. Then, we're providing a chosen (selected) list of 40k values.

The API could easily generate the left and right options server-side, instead of recursively generating the PF options client-side? Ideally, it would just be a pass through to the PatternFly dual-list component.

Hyperkid123 commented 3 years ago

@dlabrecq yeah I think this is the rendering issue. Using stock PF component in this example: https://uenic.csb.app/

The initial render time is kind of difficult to judge because of the environment (I will do it locally at some point), but when you try to select any option, you see a lag of 1-3s before the option style changes. And there are only 10k items. (In the meantime, moving all items to right/left will give you probably a close comparison to the initial render time. It just kills the page.)

The API could generate the left and right options server-side, instead of recursively generating the lists client-side?

I am definitely all for offloading the work to server, but we will still have an issue when actually interacting with the PF component.

However, there is a lot of recursion happening with the selectedOptions function

We will debug that and see if we can make the function faster

dlabrecq commented 3 years ago

Yes, I agree that PF also has performance issues. The PF team is looking to improve the component on their end as well.

However, the Cost Management settings page currently renders 40k items in the available list in 600ms, so it appears to be doable. It's only when we add those same items to DDF's chosen list, when the page loads in 55 seconds. That's when I see all the recursion with the selectedOptions function.

The lag when selecting an option is definitely a PF issue.

Thanks for looking into it.

Hyperkid123 commented 3 years ago

@dlabrecq I was doing a little bit of research about the issue. Here are my findings.

Testing env:

Initial render profile summary

Screenshot from 2021-10-20 11-57-18

The initial load takes around 32s with the input data

Screenshot from 2021-10-20 11-58-26

Convert options performance

The convert options function pointed in the issue description is not the problem here.

Screenshot from 2021-10-20 12-01-18

Here is a total execution time comparison. The highlighted segment is the convert function execution portion.

Screenshot from 2021-10-20 12-30-34

Most expensive function calls

The most resource-heavy function operations are react/react-dom routines

Screenshot from 2021-10-20 12-04-15

These are direct DOM manipulation functions that are directly tied to a component implementation.

Profiling with sort and three props disabled and input data in sorted order

To dismiss any potential profiling mistakes, I have also done a run with the sort option disabled. The convertOptions function is not called in this scenario. the results are a bit better but the relative resource allocation is basically the same.

Screenshot from 2021-10-20 12-08-01

You can see the sorted list of activities that references the same functions

Screenshot from 2021-10-20 12-08-52

And the convertOptions is not present in the list Screenshot from 2021-10-20 12-09-05

Conclusion

The convert options are not responsible for the performance degradation. In fact, the impact is negligible when compared to the rest of the factors.

In my opinion, an implementation change of the dual list selector component to use the virtual list is the only option to fix the issue.

@dlabrecq In case that I have missed something, I would ask you to provide me with the exact dataset you were testing so I can run the profiling again. You can share it in gist snippet or send me a private message if you don't want the data to be public.

EDIT: If you want I will provide the profiling data so you can open and inspect them in your browser.

dlabrecq commented 3 years ago

We've mainly been testing this page: https://console.stage.redhat.com/beta/settings/applications/cost-management

With the DDF schema from: https://console.stage.redhat.com/api/cost-management/v1/settings/

I can forward the user/pwd offline