cylc / cylc-ui

Web app for monitoring and controlling Cylc workflows
https://cylc.github.io
GNU General Public License v3.0

Performance placeholder issue #222

Closed kinow closed 3 years ago

kinow commented 5 years ago

A placeholder for performance issues. Set to milestone 0.1, but it will probably be moved to 1.0 I think.

Ever since the first prototypes that could display an end-to-end (e2e) system, we have had issues running the workflow named complex, an example suite from Cylc 7.

That workflow does not execute any command locally, simply sleeps, but it spawns heaps of tasks, with dependencies/requirements galore.

Another issue that we may have is with the progress percent calculation in the Tree View and Graph View.

There are other problems too, so feel free to chime in and register any issue or progress here, or link related issues.

Outstanding issues:

Good resources

kinow commented 5 years ago

I will start with a quick experiment I did today. I finished implementing the progress for the tasks in the Tree view. But the issue I had, I think, is not related to this new feature.

The complex suite was running fine, but the browser seemed to have a memory leak. I used Ubuntu LTS on an i7 with 16 GB RAM and a 200 GB SSD, running Firefox with about 10 tabs, WebStorm, npm, and Cylc.

Using Chrome and its great memory tool (love Firefox, but their memory tool is a bit behind...), I took a few snapshots, as in the screenshot below (memory sizes below the snapshot name).

image

To populate the DOM, Vue.js uses a virtual DOM and creates VNodes to attach to it. Here's the progression of the VNodes:

(Note: shallow size always <= retained size... shallow is simply the size for the object instances, retained is that shallow size plus whatever other objects are required/retained with that object)

Snapshot 5 is very strange. I had already stopped the suite, but the memory had not gone down.

Then I took yet another snapshot, about 25 minutes later, while writing this paragraph.

This means there are objects that are not being collected. Possibly attached to the DOM? This needs to be fixed whether or not we want to support the complex workflow, because even a very simple suite running for a long time could leak memory like this and eventually cause problems, such as the tab crashing or becoming extremely slow.
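One common way objects become uncollectable in a single-page app is a long-lived object (an event bus, a store subscription, a timer) that still references things the UI has already discarded. The sketch below is not taken from cylc-ui and is not a diagnosis of this particular leak; `bus`, `mountNode`, and the payload are hypothetical names, used only to show the pattern with Node's built-in EventEmitter.

```javascript
const { EventEmitter } = require('events');

// Hypothetical long-lived event bus: stands in for anything that outlives a component.
const bus = new EventEmitter();

// Hypothetical "component": subscribes on mount and returns a cleanup function.
function mountNode(id) {
  const payload = new Array(1000).fill(id); // data captured by the handler's closure
  const handler = () => payload.length;
  bus.on('update', handler);
  // Until this is called, the bus keeps `handler` (and therefore `payload`) reachable.
  return () => bus.off('update', handler);
}

// Leaky usage: the nodes are "removed" from the UI but never unsubscribed.
for (let i = 0; i < 5; i++) mountNode(i);
const leaked = bus.listenerCount('update');

// Clean usage: every mount is paired with its cleanup.
const cleanups = [];
for (let i = 0; i < 5; i++) cleanups.push(mountNode(i));
cleanups.forEach((off) => off());
const afterCleanup = bus.listenerCount('update'); // only the leaked handlers remain

console.log({ leaked, afterCleanup });
```

In a heap snapshot, the leaked handlers would show up as objects retained by the bus even though nothing on screen uses them any more.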

kinow commented 5 years ago

Just to record the sizes of the top objects in the heap, snapshots are in order from 1 to 6.

image

image

image

image

image

image

PS: a snapshot taken right after opening the tab, with just the dashboard displayed (i.e. no query running, as the dashboard is not wired to the WorkflowService yet, which was good for this quick test), reports 25.4 MB, as in the screenshot below (I had to enlarge the objects area to be able to display the VNodes).

image

kinow commented 5 years ago

And for the record, when Vue devtools crashes on Ubuntu with XFCE, it displays a pop-up notification on the desktop (i.e. an operating system notification) that disappears after a few seconds, but you will also see this later in Vue devtools:

Screenshot_2019-09-13_19-09-35

kinow commented 5 years ago

Useful links:

hjoliver commented 5 years ago

Another issue that we may have is with the progress percent calculation in the Tree View and Graph View.

I don't think so - that's a trivial bit of maths for a few nodes, it is surely nothing compared to all the computation going on to form the tree view (say) and display the UI. (But we can see, of course).

kinow commented 5 years ago

Added that as it was raised as an issue in Riot. But I agree. I also don't expect any major impact on adding the progress (unless we have some bug in the code).

kinow commented 5 years ago

Learned how to look at the memory heap as I used to do in Java. Easier to spot the memory leak this way 👀

image

i.e. the sawtooth (handsaw) pattern is not as uniform as it should be (for workflow five, which keeps a fairly constant number of nodes in the UI)
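As a rough way to put a number on "not uniform": in a healthy sawtooth, the low points after each garbage collection stay roughly flat, while a leak makes them climb. The sketch below is only a heuristic under that assumption; the function names, the 10% tolerance, and the sample values (imagined as MB readings) are all made up for illustration.

```javascript
// Return the local minima (troughs) of a series of heap-size samples.
function troughs(samples) {
  const out = [];
  for (let i = 1; i < samples.length - 1; i++) {
    if (samples[i] < samples[i - 1] && samples[i] <= samples[i + 1]) {
      out.push(samples[i]);
    }
  }
  return out;
}

// Heuristic: suspect a leak if the last trough is more than `tolerance`
// (as a fraction) above the first trough.
function troughsAreRising(samples, tolerance = 0.1) {
  const t = troughs(samples);
  if (t.length < 2) return false;
  return t[t.length - 1] > t[0] * (1 + tolerance);
}

// Made-up samples: a flat sawtooth and one whose baseline keeps climbing.
const healthy = [50, 80, 52, 81, 51, 79, 50, 82, 51, 80];
const leaking = [50, 80, 60, 95, 72, 110, 85, 125, 99, 140];

console.log(troughsAreRising(healthy), troughsAreRising(leaking));
```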

kinow commented 4 years ago

From Riot chat with @hjoliver (web gui room), quoting:

In the complex suite definition, replace this:

[scheduling]
    [[queues]]
        [[[ENSEMBLELIMIT]]]
            limit = 3
            members = ENSEMBLE_FORECAST_SHRT
        [[[ENSEMBLEOBSLIMIT]]]
            limit = 20
            members = ENSEMBLE_OBS

with this:

[scheduling]
    [[queues]]
        [[[default]]]
            limit = 5

(Much of the reason for the very high machine load when running the complex suite was that too many task jobs were running at once: the original queues only limit a subset of the tasks, namely members of the two families listed above, and one of the limits, 20, is high for locally hosted jobs.)
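To make the effect of that change concrete, here is a toy model of queue limits. It is not Cylc code: `maxRunning` is a hypothetical helper, the numbers of ready tasks are invented, and it assumes that tasks outside the named queues fall into a default queue with no limit unless one is set.

```javascript
// Toy model: upper bound on jobs running at once, given per-queue limits
// and the number of ready tasks in each queue.
function maxRunning(queueLimits, readyByQueue) {
  let total = 0;
  for (const [queue, ready] of Object.entries(readyByQueue)) {
    const limit = queueLimits[queue] || 0; // assumption: missing or 0 means "no limit"
    total += limit > 0 ? Math.min(ready, limit) : ready;
  }
  return total;
}

// Original queues: only the two families are limited (ready counts are invented).
const before = maxRunning(
  { ENSEMBLELIMIT: 3, ENSEMBLEOBSLIMIT: 20 },
  { ENSEMBLELIMIT: 10, ENSEMBLEOBSLIMIT: 30, default: 60 }
);

// Replacement: every task sits in the default queue, limited to 5.
const after = maxRunning({ default: 5 }, { default: 100 });

console.log({ before, after });
```

With these invented numbers, the original configuration allows 83 jobs at once (3 + 20 + 60 unlimited), while the single default queue caps it at 5.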

kinow commented 4 years ago

These past two/three days I've been spending some time running more diagnostic tools, and learning more about Vue reactivity and performance in components.

I've got a branch with some good results. Before:

image

After:

image

Functional components

While trying to add slots to the Tree component, I added a note to read more about Vue functional components. Previously one had to use JSX or a render function for functional components, but I think since 2.5 we are able to use <template functional> and create a single-file component almost the same as before.

The difference is that, instead of having a Vue instance for each component (with monitored data, methods, event handlers, etc.), the component itself is essentially a function. A good example of a component that could be functional is the Job.

By modifying it a bit, I managed to use only props. Functional components support props (but not data). That results in fewer objects created, fewer observers, and less memory and load time. Comparing the two images above, you can see that there are no longer any Job (or Task) components. They have been evaluated and rendered, adding objects to the virtual DOM (which is later translated to the DOM).
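For reference, a props-only functional component in Vue 2 options style looks roughly like the sketch below. This is not the actual cylc-ui Job component: the `status` prop and the class names are hypothetical, and `h` is a tiny stand-in for Vue's createElement so that the snippet runs without Vue installed.

```javascript
// Minimal stand-in for Vue's createElement, returning a plain object instead of a VNode.
function h(tag, data, children) {
  return { tag, data, children };
}

// Hypothetical functional Job component: no instance, no data, no lifecycle hooks.
const Job = {
  functional: true,
  props: {
    status: { type: String, required: true },
  },
  // Vue 2 calls render(createElement, context) for functional components;
  // props arrive on context.props because there is no `this`.
  render(createElement, context) {
    const { status } = context.props;
    return createElement('span', { class: ['job', `job--${status}`] }, [status]);
  },
};

// Calling render directly shows that it is just a function of its props.
const vnode = Job.render(h, { props: { status: 'succeeded' } });
console.log(vnode);
```

Because the output depends only on the props, Vue has nothing to observe or keep alive for each Job, which is where the savings come from.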

These functional Job components sit under the TreeItem. Once a TreeItem is updated, because its data changed, it destroys its child components and their DOM values, then recreates them. The only difference now is that there is no Job component whose lifecycle Vue has to manage.

TreeItem cannot be functional

My first test was to make the TreeItem functional, and it gave excellent results! It was the fastest I had seen the tree view load and render. Even the complex workflow, which used to use a lot of memory (sometimes near 1 GB, but normally above 400 MB), was now well under 300 MB with ~1200 tasks, until it hit the part where it has more tasks (2500 to 3000) and went up to 400, then 600 MB.

Capture

However, there is one issue with the TreeItem. It wouldn't collapse or expand. Took me a while to realize why that was happening.

The expand/collapse is a state of the TreeItem component. But a functional component is stateless.

I tried moving the state to the node (from the created workflowTree), but even that didn't work: the node is an object from the Vuex store, so updating it doesn't really re-trigger the functional component. The only way to re-render it is for the parent component to be updated.

But for the Tree component, that would mean updating every TreeItem recursively too.
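The expand/collapse problem can be sketched without Vue at all. In the snippet below, `renderTreeItem` and `StatefulTreeItem` are hypothetical names: the first plays the role of a functional component, the second a regular component instance that owns its `expanded` flag.

```javascript
// Functional flavour: the output depends only on the props passed in.
// There is nowhere to remember "expanded" between calls.
function renderTreeItem(props) {
  return props.expanded
    ? `- ${props.name} [${props.children.join(', ')}]`
    : `+ ${props.name}`;
}

// Stateful flavour: the instance owns `expanded`, so it can toggle itself.
class StatefulTreeItem {
  constructor(name, children) {
    this.name = name;
    this.children = children;
    this.expanded = false;
  }

  toggle() {
    this.expanded = !this.expanded;
  }

  render() {
    return renderTreeItem({
      name: this.name,
      children: this.children,
      expanded: this.expanded,
    });
  }
}

const item = new StatefulTreeItem('20190913T00', ['foo', 'bar']);
const collapsed = item.render();
item.toggle();
const expanded = item.render();

// The functional version only changes when its caller passes different props,
// i.e. when the parent re-renders it.
const sameProps = { name: '20190913T00', children: ['foo', 'bar'], expanded: false };
const functionalTwice = [renderTreeItem(sameProps), renderTreeItem(sameProps)];

console.log(collapsed, expanded, functionalTwice);
```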

Conclusion

For now, functional components should reduce memory use and load time a bit for the application. I will prepare some PRs after doing a better before/after comparison for just the Job/Task components.

Other things that I found out while doing this experiment this week:

There was one interesting issue in the Vue repository where a user had a problem similar to ours: trying to render tables with thousands of elements. There were several comments, but this one by the Vue creator/maintainer could perhaps be applied to the complex workflow:

Rethink your UI design. Do you really need 18,000 cells all rendered at the same time? Does it even make sense? How about pagination?
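As a rough illustration of the pagination idea in that comment (not something implemented in cylc-ui), the sketch below slices a large list so that only one page of items would be handed to the renderer. `paginate`, the page size, and the task names are all hypothetical.

```javascript
// Return one page of items plus enough metadata to draw pager controls.
// Pages are 1-based; out-of-range page numbers are clamped.
function paginate(items, page, pageSize) {
  const totalPages = Math.max(1, Math.ceil(items.length / pageSize));
  const current = Math.min(Math.max(1, page), totalPages);
  const start = (current - 1) * pageSize;
  return {
    page: current,
    totalPages,
    items: items.slice(start, start + pageSize),
  };
}

// 3000 made-up task names, roughly the size mentioned for the complex workflow.
const tasks = Array.from({ length: 3000 }, (_, i) => `task_${i}`);

const first = paginate(tasks, 1, 100);
const last = paginate(tasks, 99, 100); // page 99 does not exist, so it is clamped

console.log(first.items.length, first.totalPages, last.page, last.items[0]);
```

Only the 100 items of the current page would need VNodes, regardless of how many tasks the workflow has.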

Using the tooltip of the GScan, I noticed at one point I had some tens of tasks in other states (running/waiting/whatever) and over 1500 tasks/jobs in the successful state, spread over families, cycle points, etc.

Perhaps we could later modify the tree when we have such a large data set? Maybe things like:

Anyway, now at least I can create a few issues under this placeholder that may help the performance of the Tree component, and that may be helpful for other components like the Graph too 👍

kinow commented 4 years ago

Wiki with notes for troubleshooting performance: https://github.com/cylc/cylc-ui/wiki

kinow commented 3 years ago

I think this can be closed now. Our UI performance appears to be OK. Will work on new issues as required 👍