ControlSystemStudio / cs-studio

Control System Studio is an Eclipse-based collections of tools to monitor and operate large scale control systems, such as the ones in the accelerator community.
https://controlsystemstudio.org/
Eclipse Public License 1.0
113 stars 96 forks source link

PVManager: max update rate has large impact on CPU usage #84

Open xihui opened 11 years ago

xihui commented 11 years ago

My test case is based on the BOY performance test opi and softIOC. It has 300 PVs update at 10 hz and 700 PVs update at 1Hz. The OPI has 1000 text updates widgets.

Here is my test result (The CPU usage is a personal round average value using Windows Task Manager):

Max Update Rate PVManager CPU (with CSS Minimized) Utility PV CPU ( with CSS Minimized) maxRate(ofMillis(10)) 42%(35%) 18% (3%) maxRate(ofMillis(50)) 25%(10%) 17% (3%) maxRate(ofMillis(100)) 22%(8%) 17% (3%) maxRate(ofMillis(500)) 8% (4%) 6% (2%)

In my opinion, if the max update rate is faster than PV's update rate, it should not use too much extra CPU.

Here is a screenshot of JProfiler when it is using PVManager:

image

kasemir commented 11 years ago

Can you tell if you'er actually receiving notifications, so the maxRate() acts like a minRate()? Or is the CPU usage somewhere inside PVManager, not resulting in events that you receive?

Could it be that the channel names are somehow handled as formulas, and formulas will always update at the "max" rate, regardless of value changes triggered by the underlying PVs?

On Wed, Jul 10, 2013 at 11:12 AM, Xihui Chen notifications@github.comwrote:

My test case is based on the BOY performance test opi and softIOC. It has 300 PVs update at 10 hz and 700 PVs update at 1Hz. The OPI has 1000 text updates widgets.

Here is my test result (The CPU usage is a personal round average value using Windows Task Manager):

Max Update Rate PVManager CPU (with CSS Minimized) Utility PV CPU ( with CSS Minimized) maxRate(ofMillis(10)) 42%(35%) 18% (3%) maxRate(ofMillis(50)) 25%(10%) 17% (3%) maxRate(ofMillis(100)) 22%(8%) 17% (3%) maxRate(ofMillis(500)) 8% (4%) 6% (2%)

In my opinion, if the max update rate is faster than PV's update rate, it should not use too much extra CPU.

Here is a screenshot of JProfiler when it is using PVManager:

[image: image]https://f.cloud.github.com/assets/1466487/775540/142ec89a-e973-11e2-9a14-c6eb166fc682.png

— Reply to this email directly or view it on GitHubhttps://github.com/ControlSystemStudio/cs-studio/issues/84 .

gcarcassi commented 11 years ago

If you dig in past e-mails, we already discussed this. I already know what’s going on. It can be optimized. It’s not a trivial optimization. And right now it’s not so bad that is not usable. The focus right now is fix things that are broken.

http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast

Gabriele

xihui commented 11 years ago

Can you tell if you'er actually receiving notifications, so the maxRate() acts like a minRate()? Or is the CPU usage somewhere inside PVManager, not resulting in events that you receive?

No, the PVReaderListener doesn't receive notification at 100Hz. It is only notified on value change.

Could it be that the channel names are somehow handled as formulas, and formulas will always update at the "max" rate, regardless of value changes triggered by the underlying PVs?

No, it is not a formula. The expression is PVManager.read(channel(name, VType.class, VType.class)).maxRate(ofMillis(maxUpdateRate)); Even it is a formula such as ='pvk469'+'pvk122', it only updates on either one of the PVs' value change.

Yes, this topic has been discussed before. I list it here again for tracking purpose.

Thanks, Xihui

kasemir commented 11 years ago

Another example for the side-effect of maxRate():

In this example, user can select a row in a VTable widget, and other widgets then display =columnOf('selection-pv', "column"). Initially, the user has not selected anything, so the 'selection-pv' is undefined/null or 0, depending on how the initialization of local PVs is configured. The formula is now evaluated at the maxRate(), so the console logs over and over "Can't find a match ...". Better would be to only evaluate the formula when inputs change.

screen shot 2013-10-18 at 12 02 04 pm

berryma4 commented 11 years ago

Moving to 3.2.12

berryma4 commented 10 years ago

Considering the issues are not seen with pvmanager alone, but are with this implementation. This seems to be a problem that doesn't qualify as a "hotfix". Is this keeping you from deploying 3.2.x? If not, I say we focus on solving the issue on 3.3.x.

kasemir commented 10 years ago

Hi Eric:

I don't understand your comment "not seen with pvmanager alone, but are with this implementation."

It's a fundamental problem in PVManager, seen in every tool that uses it. In the bug report, we list BOY scripts with any type of input, probe that uses =columnOf(table-widget-selection-pv, ..). But anything that uses the PVManager suffers from the fact that PVManager will internally run some threads at the 'max' update rate, even if the PVs don't change at all, and the way it uses locking results in CPU load. For formulas, it actually triggers a formula update even though the formula inputs nor result did change.

We can move that ticket to any milestone. My main point is that we should not forget it, and not create new tickets which basically describe the same issue.

Thanks, Kay

gcarcassi commented 10 years ago

This is what Eric means: cs-studio-load A boy Screen with about 100 fields, each with one rule. Load is 33% pvmanager-200-load This is pvmanager by itself, scanning 200 pvs. Load is 2.3%. Since the background scanning is present in both cases, saying that pvamanger is the cause of the 33% load does not make sense. pvmanager-2000-load This is pvmanager with 2000 pvs. Load is 18%. So, yes in theory it can be a problem. But we are NOT hitting that. If you do profiling with BOY, you see that time is spent in the Executor implementation, well BEFORE anything in pvmanager is being hit. And it's lock contention. So: yes implementing passive scanning would alleviate the symptom (less lock contention triggered) but not the problem (lock contention). In effect: you'd just be hiding it.

The exceptions is another problem, that has nothing to do with passive/active scanning. Taking all exceptions and sticking them in the log is not a good idea because in most cases these do not represent a real problem. They can represent normal conditions such as: timeout, requested type and found type do not match, divide by 0 in a formula, etc... Also: avoiding duplicate exceptions is something that is not solved by passive scanning. You are right that, if you got a divide by 0, probably getting all updates over and over is not that useful. But that can happen in a formula when things are changing anyway. In both cases (load from BOY and multiple exception) the proper solution is not implementing passive scanning.

I'd suggest you break this ticket into 3:

kasemir commented 10 years ago

Breaking this into separate issues sounds great.

The 'passive scanning' seems to be a somewhat obvious: If PVManager is connected to several PVs which don't change, there should be insignificant CPU load, and formulas that have those PVs as input won't send updates.

To me, that fixes this specific exception duplication. "=columnOf(…)" would send one error for the initially undefined input PV, instead of duplicating that at maxRate.

For high load in BOY, I am not sure if it can be separated. Tests of PVManager on its own have low CPU load. BOY with utility.pv layer has low CPU load. Only BOY with PVManager has the high CPU load. You may be correct that org.csstudio.simplepv.pvmanager.PVManagerPV could be updated to elevate the problem.

So how about Ticket: Feature request for 'passive scanning' Ticket: Review org.csstudio.simplepv.pvmanager.PVManagerPV if it can be more efficient ?

kasemir commented 10 years ago

diirt/diirt#2 would provide the Passive Scanning