The Profiler component has indeed proven successful and useful.
It is a much easier way of extracting the information we are interested in and should allow us to perform useful comparisons between branches in the future. Although the method I lay out below is manual, it has plenty of scope to be automated.
We know from our previous measurements that the `master` branch is currently quite good at rendering many Readbacks which are all looking at a single PV, and that without throttling, performance takes a big hit when looking at many different PVs.
To verify the usefulness of the Profiler component, I performed the same experiments again, measuring three statistics available through the `onRender` callback:

- `actualDuration`
- `baseDuration`
- reconciliation time (aggregated as described below)
I initially set up the Profiler component around the top-level DynamicPage widget, which holds the performance page and all of the Readbacks when they are shown. For our `master` branch, a simple use of the `onRender` callback to provide comma-separated output sufficed. I tested the performance page with various numbers of Readbacks, all looking at `sim://ramp#00`.
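For reference, a minimal sketch of this kind of setup is shown below. The wrapper component and its props are illustrative rather than the exact code on the branch; only the Profiler API itself is standard React, and the callback simply logs the values it receives as comma-separated output.

```tsx
// Illustrative sketch of wrapping a top-level widget in a React Profiler.
// The wrapper component is an assumption; the Profiler API is standard React.
import React, { Profiler, ProfilerOnRenderCallback, ReactNode } from "react";

const onRender: ProfilerOnRenderCallback = (
  id,             // the "id" prop of the Profiler tree that committed
  phase,          // "mount" or "update"
  actualDuration, // time spent rendering the committed update (ms)
  baseDuration,   // estimated worst-case render time for the subtree (ms)
  startTime,      // when React began rendering this update
  commitTime      // when React committed this update
): void => {
  // Comma-separated output is easy to paste into a spreadsheet
  console.log(
    [id, phase, actualDuration, baseDuration, startTime, commitTime].join(",")
  );
};

// In our tests the wrapped child was the top-level DynamicPage widget
export const ProfiledPage = (props: { children: ReactNode }): JSX.Element => (
  <Profiler id="DynamicPage" onRender={onRender}>
    {props.children}
  </Profiler>
);
```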
Once I began testing Readbacks looking at different PVs, though, a measurement issue arose. From previous investigations, it was seen that without throttling many reconciliations were being performed for each update. To the developer these reconciliations are obviously related, but the Profiler component does not have this knowledge. Therefore, all timings within a window of half the simulation trigger period were aggregated into a single reconciliation time. This is not ideal, as it does not account for the gaps in time between reconciliations, which could be a cause of visible delay for the user. Nevertheless, it seems to work reasonably well, and a more sophisticated system could be used in the future.
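Something along the lines of the following sketch captures the aggregation idea; the window length and data shape are assumptions for illustration, not the exact code used.

```ts
// Sketch only: commits whose start times fall within half a simulation
// period of the current window are summed into a single reconciliation
// time. Gaps between commits inside a window are not counted, matching
// the limitation described above.
interface Commit {
  startTime: number;      // ms, from the Profiler onRender callback
  actualDuration: number; // ms
}

const SIMULATION_PERIOD_MS = 1000;          // assumed update period
const WINDOW_MS = SIMULATION_PERIOD_MS / 2; // half the simulation trigger

export function aggregateCommits(commits: Commit[]): number[] {
  const reconciliationTimes: number[] = [];
  let windowStart = -Infinity;
  for (const commit of commits) {
    if (commit.startTime - windowStart > WINDOW_MS) {
      // This commit starts a new aggregation window
      windowStart = commit.startTime;
      reconciliationTimes.push(0);
    }
    reconciliationTimes[reconciliationTimes.length - 1] += commit.actualDuration;
  }
  return reconciliationTimes;
}
```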
Again, various numbers of Readbacks were tested, now each looking at its own PV (`sim://ramp#00`, `sim://ramp#01`, etc.). I then merged the performance branch with our simple throttling implementation and repeated the experiment.
Here are the graphs showing how `actualDuration`, `baseDuration` and reconciliation time changed with the number of widgets:
Interestingly, the actual duration does not seem to change much between the three tests. I understand this to mean that once React has performed its reconciliation, there is not a significant difference between the three in the time taken to write this to the real DOM. This would make sense as they are painting the same number of widgets and making very similar changes to the DOM.
The base duration shows a clear increase for the test without throttling, showing several orders of magnitude change by 100 widgets. This represents the worst-case render time; without throttling, React would potentially have to check the entire subtree for each new value update. So this is expected and useful information. As expected, the master and throttled versions are very similar as they only perform the reconciliation once.
Reconciliation time similarly shows the expected difference between the tests, although it is perhaps more representative of the delays which would be seen by the user than base duration. We can see clearly here that even our simple throttling tool performs well compared to the case where all readbacks are looking at the same PV.
For the throttled case, renders are sometimes split across multiple reconciliations, depending on how quickly the store was updated and how the updates interacted with the throttling periods.
These times were aggregated in the Profiler function but are clearly visible to the user; the gaps between reconciliations are not properly accounted for in these measurements. This is an effect of how we have set up the tests, throttling and simulation. For a more direct comparison between the `master` and throttled branches, we should try to trigger all simulation updates at once, rather than spreading them across many function calls. Then we would expect to see very similar performance metrics between `master` and throttled, both from the Profiler and in Chrome DevTools.
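As an illustration of what triggering the updates together could look like (the publish function and data shape here are hypothetical, not the project's real interface), the simulator would collect all of the new values for a tick and publish them in a single update rather than one per PV:

```ts
// Hypothetical sketch: publish all simulated PV values for a tick in one
// store update, so a single reconciliation covers the whole tick.
interface PvUpdate {
  pvName: string; // e.g. "sim://ramp#01"
  value: number;
}

type Publish = (updates: PvUpdate[]) => void;

export function simulateTick(
  publish: Publish,
  pvNames: string[],
  tick: number
): void {
  // Build every value for this tick first...
  const updates = pvNames.map(pvName => ({ pvName, value: tick % 100 }));
  // ...then publish them together instead of once per PV.
  publish(updates);
}
```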
This represents a big step in the right direction with regards to making it easier to measure the impact of our changes on performance. We should attempt to automate this further and investigate how we can integrate this with our CI pipeline, to provide some simple statistics at the least.
Further work could be done to see how different widgets, e.g. ProgressBar, compare with the results we have shown here.
@willrogers I might move this to the performance Wiki or make a new one as I think it would be useful to have as reference.
Let me know if you have any questions or suggestions.
I will try to tidy up the performance branch a bit as I think it could be merged. It would also be interesting to compare the `master` branch with the changes you have made on `rules` using this methodology.
Test setups can be found on the following branches:

- `measurement`
- `measure-no-throttle`
- `measure-throttle`
The JSON pages were generated using `wdm_performance_test`, which can be found on our internal GitLab.
Also applied this to the `rules` branch on `measure-rules`. This makes me confident we could merge the base measurement branch to `master` and have it work well with future development and changes.
I will open a PR and put more info there.
Raw data:
Based on performing the same experiment as above, comparing the `master` branch and the `rules` branch (#147), the changes do not have a significant impact on performance for the case where Readbacks are all looking at the same PV.
| No. of Readbacks | Master | Rules |
| --- | --- | --- |
| 1 | 1.3 | 2.2 |
| 5 | 2.2 | 4.8 |
| 25 | 8.2 | 10.2 |
| 50 | 13.5 | 14.1 |
| 100 | 18.5 | 19.0 |
Averages taken over 10 readings, as shown in the graph above.
This suggests some minor overhead has been added, but not a considerable amount and certainly nothing which changes the computational complexity.
Made a production build of the `measurement` branch using `npm run build -- --profile` to retain profiler output.
Following results found for average of 10 readings:
| No. of Readbacks | Master | Production |
| --- | --- | --- |
| 1 | 1.3 | 1.14 |
| 5 | 2.2 | 2.83 |
| 25 | 8.2 | 8.48 |
| 50 | 13.5 | 13.3 |
| 100 | 18.5 | 13.2 |
| 200 | N/A | 21.4 |
| 300 | N/A | 26.8* |
*These readings were also made at 10Hz
This demonstrates that the production build does provide a significant speedup, particularly with larger numbers of widgets.
Note that the `--profile` option does add some overhead. With a production build without the profile flag, I was able to see 300 Readbacks updating at 10 Hz without any noticeable lag. This is a really big improvement from where we were a few weeks ago and should not be overlooked. We should aim to at least match this level of performance moving forwards. As always, further tests are required for widgets looking at different PVs etc. Note that we are now also updating the tooltip and providing foreground colour based on the alarm signals! (More than DBWR :+1:)
Closing as not current, but a useful reference.
We have made some good headway regarding performance and measuring this at different levels.
However, the current methodology is manual and time consuming. It would be great to have an automated measurement tool of some kind. Ideally this would integrate with our CI platform to make it obvious to us when we have introduced significant performance slowdown into the application.
As a first step it would be useful to have a tool which measures renders at various levels and outputs this data in a useful format, to speed up the process of measuring multiple renders and extracting the render time, reconciliation time etc.
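One possible shape for that first step, sketched here with an assumed file name and column layout matching the comma-separated logging described in the comments above, is a small script that averages the recorded durations:

```ts
// Rough sketch of a post-processing step: read comma-separated Profiler
// output and report average render times. The file name and column
// order are assumptions for illustration.
import { readFileSync } from "fs";

// Assumed columns: id,phase,actualDuration,baseDuration,startTime,commitTime
const rows = readFileSync("profiler-output.csv", "utf8")
  .trim()
  .split("\n")
  .map(line => line.split(","));

const mean = (values: number[]): number =>
  values.reduce((total, v) => total + v, 0) / values.length;

const actualDurations = rows.map(row => parseFloat(row[2]));
const baseDurations = rows.map(row => parseFloat(row[3]));

console.log(`Mean actualDuration: ${mean(actualDurations).toFixed(2)} ms`);
console.log(`Mean baseDuration: ${mean(baseDurations).toFixed(2)} ms`);
```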
The React Profiler looks like a useful tool, and this article describes its use and benefits, as well as linking to a live project on GitHub which uses this profiling mechanism.