eclipse-archived / smarthome

Eclipse SmartHomeβ„’ project
https://www.eclipse.org/smarthome/
Eclipse Public License 2.0
861 stars 784 forks source link

Improve BaseThingHandler.isLinked(..) performance #6416

Closed wborn closed 5 years ago

wborn commented 5 years ago

ThingHandler implementations often only update the state of linked channels for performance reasons. When an update is received they iterate over over all Thing channels, check if the channel isLinked(..) and then call updateState(..).

The isLinked(..) operation is expensive because it results in a call to ItemChannelLinkRegistry.getLinks(..) which will always iterate over all channel links:

https://github.com/eclipse/smarthome/blob/dff198a6a11f882f684778c0d34ad75c9d2bf4fd/bundles/core/org.eclipse.smarthome.core.thing/src/main/java/org/eclipse/smarthome/core/thing/link/AbstractLinkRegistry.java#L78-L86

So if you have 500 linked items and 1 Nest Thermostat (25 channels), one Nest update would cause the NestThermostatHandler to compare 25*500=12500 link UIDs each update.

To workaround this issue, bindings cache the linked channels in their ThingHandlers (https://github.com/openhab/openhab2-addons/pull/3695, https://github.com/openhab/openhab2-addons/pull/4159).

It would simplify ThingHandler code if these optimizations were not necessary. Also all other bindings would benefit from the increased performance.

cweitkamp commented 5 years ago

Where do you think is the right place for keeping track of the linked channels? Inside the BaseThingHandler (like implemented in the two bindings) or inside the Thing itself?

wborn commented 5 years ago

Things are currently unaware of their links so I think it is better to keep it that way. The BaseThingHandler could be an option.

A more generic approach could be to introduce some caching in the AbstractRegistry since all elements implement Identifiable:

public abstract class AbstractRegistry<E extends Identifiable<K>, K, P extends Provider<E>>
        implements ProviderChangeListener<E>, Registry<E, K> {

Though that may cause memory issues if there are a lot elements. Also I can imagine one day the optimization could depend on the StorageService being used, e.g. using some (database) index.

maggu2810 commented 5 years ago

ThingHandler implementations often only update the state of linked channels for performance reasons. When an update is received they iterate over over all Thing channels, check if the channel isLinked(..) and then call updateState(..).

Is this something that is suggested on some places? Should a thing handler access the link registry? Shouldn't the thing handler update the state regardless if it is linked all the time?

If there is some bottleneck if a non linked (to an item) channel is updated, perhaps this should be fixed in the framework.

ssalonen commented 5 years ago

Good work @wborn identifying the root cause for slow performance with isLinked

I have also identified that BaseThingHandler.updateState is suffering from serious performance issues as well. Turns out it also boils down to the similar component (ItemChannelLinkRegistry).

Decided to mention this issue here as well since it touches the same software component and perhaps the solutions to both performance issues are correlated as well.

Please let me know if you would like me to create new issue about this.

Would different kind of data structure serve the usage patterns better?

Analysis of updateState

BaseThingHandler.updateState

calls

ThingHandlerCallback anonymous class implementation in ThingManagerImpl

        @Override
        public void stateUpdated(ChannelUID channelUID, State state) {
            communicationManager.stateUpdated(channelUID, state);
        }

calls CommunicationManager.stateUpdated

    public void stateUpdated(ChannelUID channelUID, State state) {
        final Thing thing = getThing(channelUID.getThingUID());

        handleCallFromHandler(channelUID, thing, profile -> {
            if (profile instanceof StateProfile) {
                ((StateProfile) profile).onStateUpdateFromHandler(state);
            }
        });
    }

calls CommunicationManager.handleCallFromHandler

    void handleCallFromHandler(ChannelUID channelUID, @Nullable Thing thing, Consumer<Profile> action) {
        itemChannelLinkRegistry.stream().filter(link -> {
            // all links for the channel
            return link.getLinkedUID().equals(channelUID);
        }).forEach(link -> {
            Item item = getItem(link.getItemName());
            if (item != null) {
                Profile profile = getProfile(link, item, thing);
                action.accept(profile);
            }
        });
    }

which results in iterating over all channel links.

wborn commented 5 years ago

Is this something that is suggested on some places? ... If there is some bottleneck if a non linked (to an item) channel is updated, perhaps this should be fixed in the framework.

It's not suggested but it is used in a lot of bindings. Probably to workaround the issue outlined by @ssalonen (:+1:). After analyzing performance with JProfiler I also saw that state updates are relatively expensive.

Shouldn't the thing handler update the state regardless if it is linked all the time?

I'd be all for that but it currently may result in performance issues when ThingHandlers are processing a lot of updates.

Should a thing handler access the link registry?

The isLinked(..) methods are also used to determine what information should be polled from devices. E.g. to reduce network traffic on Zigbee/Z-Wave/WiFi networks. As suggested the BaseThingHandler could also keep track of this information itself based on the channelLinked, channelUnlinked calls. That way it wouldn't have to access the link registry.

Hilbrand commented 5 years ago

Here are some thoughts: In AbstractRegistry the elementMap is created as: Map<Provider<E>, Collection<E>>. However the value of the map is initialized with a Set: CopyOnWriteArraySet. This means it could also have been a Set instead of a Collection. This is important because that means the values are unique. And this means instead of a simple map it could be implemented in some kind of linked map that also keeps the relation between E and it's provider. While technically E doesn't have to be related to only 1 provider, lots the code seems to assume it does (I'm not too familiar with the code to know if it's actually does have to be related to to 1 provider). For example it's iterating through the providers and then per provider through it's Collection of E's and simply return the first and only that match. Since it looks like a lot of use cases use the getAll() (and stream() (the one on the Registry interface)) to determine if E is there somewhere that could then be replaced by a simple get in a map instead of iterating through all the values. Does that make sense?

kaikreuzer commented 5 years ago

Thanks guys, it seems you discovered a great potential to make the core framework perform much faster 😎!

Where do you think is the right place for keeping track of the linked channels? Inside the BaseThingHandler (like implemented in the two bindings) or inside the Thing itself?

@cweitkamp You already did a lot of the refactoring to get code away from the BaseThingHandler and into the callback, so I am surprised that you do not suggest the ThingHandlerCallback for being the right place!

For the ThingHandlerCallback.isChannelLinked() method, we could probably introduce a small cache as a simple measure - we only need to remember booleans, so the memory overhead should be acceptable.

But @ssalonen has a good point, the underlying issue is that the ItemChannelLinkRegistry is slow and it is used in other places as well, so we should see how we can optimise it.

@Hilbrand Sounds interesting and it using bidrectional map could indeed speed up the get() implementation. But afaics, this would not help us for the ItemChannelLinkRegistry as we are filtering all links for some of their properties (i.e. which item they are linked to), so we do not have any other choice then browsing through all of them.

I think what would need to be done is to do a performant implementation of the getBoundChannelsand getLinkedItemNames methods of ItemChannelLinkRegistry (either by internally improving the architecture or by adding some caching here). When those are in place, most code (e.g. the one in the CommunicationManager) can be changed to use these methods and should not have any performance issues anymore.

maggu2810 commented 5 years ago

I did not test it yet, just tried to formalize them thoughts. Can you have a look at: https://github.com/eclipse/smarthome/compare/master...maggu2810:link-registry?expand=1

Hilbrand commented 5 years ago

@kaikreuzer It doesn't help directly with the linked except for if the getAll() method could be made more efficient. But with the changes suggested by @maggu2810 that method is not used anymore in those cases. In general the getAll() seems very inefficient. Both when used to find an element and in memory usage (creating a new list every call). I was working on some improvements to the AbstractRegistry to prepare the code to have such a bidirectional map (no map yet though). As a first step I've internalized fields to make it possible to use a different implementation. This could be a step stone to change usages of getAll: See https://github.com/eclipse/smarthome/compare/master...Hilbrand:abstract_reg?expand=1 Let me know if this is the right step and i'll continue work on it.

maggu2810 commented 5 years ago

@Hilbrand @wborn @ssalonen I added a PR to make the comments more visible. Please have a look at #6427

maggu2810 commented 5 years ago
wborn commented 5 years ago

I'll close this issue now that all improvements have been merged and released as ESH stable. Thanks for the quick and thorough resolution everyone!

ssalonen commented 5 years ago

Tried out today with openHAB-2.4.0 Snapshot

openhab> bundle:list -s |grep -i SmartHome
 98 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.config.core
 99 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.config.discovery
100 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.config.discovery.mdns
101 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.config.dispatch
102 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.config.xml
103 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core
104 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.audio
105 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.binding.xml
106 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.id
107 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.persistence
108 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.scheduler
109 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.semantics
110 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.thing
111 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.thing.xml
112 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.transform
113 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.core.voice
114 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.console
115 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.console.karaf
116 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.http
117 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.javasound
118 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.monitor
119 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.net
120 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.rest
121 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.rest.core
122 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.rest.mdns
123 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.rest.optimize
124 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.rest.sitemap
125 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.rest.sse
126 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.rest.voice
127 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.transport.mdns
128 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.webaudio
129 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.core
130 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.item
131 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.item.ide
132 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.item.runtime
133 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.lsp
134 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.persistence
135 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.persistence.ide
136 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.persistence.runtime
137 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.rule
138 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.rule.ide
139 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.rule.runtime
140 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.script
141 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.script.ide
142 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.script.runtime
143 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.sitemap
144 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.sitemap.ide
145 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.sitemap.runtime
146 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.thing
147 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.thing.ide
148 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.model.thing.runtime
149 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.storage.json
150 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.ui
151 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.ui.icon
152 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.ui.iconset.classic
214 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.transform.exec
215 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.transform.javascript
216 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.transform.jsonpath
217 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.transform.map
218 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.transform.regex
219 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.transform.scale
220 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.transform.xpath
221 β”‚ Active   β”‚  75 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.transform.xslt
222 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.ui.basic
223 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.ui.classic
224 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.ui.paper
236 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.config.discovery.usbserial
237 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.config.discovery.usbserial.linuxsysfs
238 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.config.serial
239 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.transport.serial
240 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.transport.serial.rxtx
241 β”‚ Active   β”‚  80 β”‚ 0.10.0.201811171951    β”‚ org.eclipse.smarthome.io.transport.serial.rxtx.rfc2217

I am still seeing high CPU usage with updateState and updateStatus... I wonder if the fix is present with openHAB 2.4.0-SNAPSHOT?

With Raspberry PI 3 Model B, I have the following result from profiling CPU usage

image

maggu2810 commented 5 years ago

I wonder if the fix is present with openHAB 2.4.0-SNAPSHOT?

There is a Wiki entry that contains the information how to check. See: https://github.com/eclipse/smarthome/wiki/ESH-used-in-openHAB

@kaikreuzer If "0.10.0.201811171951" is equal to "#316 (Nov 17, 2018 2:50:16 PM)" should we add a comment about the timezone differences (add / sub 5 hours)?

maggu2810 commented 5 years ago

@ssalonen

The update state of the base thing handler looks like:

    protected void updateState(ChannelUID channelUID, State state) {
        synchronized (this) {
            if (this.callback != null) {
                this.callback.stateUpdated(channelUID, state);
            } else {
                logger.warn(
                        "Handler {} of thing {} tried updating channel {} although the handler was already disposed.",
                        this.getClass().getSimpleName(), channelUID.getThingUID(), channelUID.getId());
            }
        }
    }

If the state is spend in updateState of the BaseThingHandler and not in the anonymous ThingHandlerCallback implementation of the ThingManagerImpl, I assume the "self time" you identify is caused because of the synchronized statement. The method can continue only if it can fetch the synchronization lock and if it waits for it, some other logic hold its.

You could try to identify "who" is holding the locks but I don't think we could remove the synchronization without running into some other problems (without having a further look if there are some improvements possible -- perhaps we could implement some further improvements).

Hilbrand commented 5 years ago

Interestingly the synchronized part seem to be introduced in 929002ed7ce44b215e6edfdeaf1ab64bb978f4b9. It seems to be intended to handle nulls on the callback, not necessarily to make the whole method synchronized. I wonder what the effect would be if all this synchronizing is simply replaced by usage of something like Optional.ifPresent? But that doesn't help explaining why it's locked :slightly_smiling_face: Without some context that would indeed be hard to figure out. I would suspect it to be a something happening in some bindings. So what kind of bindings were used in this profiling?

maggu2810 commented 5 years ago

Optional is immutable, so the value of an optional can be accessed thread safe. But if the optional is a member and changed without synchronization / volatile keyword the access to the member (of type optional not the member of the optional) is not synchronized.

If the only intention has been to handle the "current" callback with respect to null correcty, the callback itself could be marked volatile or we could use an AtomicReference.

Hilbrand commented 5 years ago

I was thinking about something like the following:

protected void updateState(ChannelUID channelUID, State state) {
        this.callback.ifPresentOrElse(c -> c.stateUpdated(channelUID, state)
                            () -> logger.warn(
                        "Handler {} of thing {} tried updating channel {} although the handler was already disposed.",
                        this.getClass().getSimpleName(), channelUID.getThingUID(), channelUID.getId());
            );
    }

But that only works from Java 9 :disappointed: The else isn't possible in the current Optimal in one call, so that means it doesn't add value to use Optional then.

So I guess it would mean to always just assign the callback to a local variable and then the synchronized can be removed. Not sure if this would require an AtomicReference. It's mainly just assigned without conditions. But volatile might be better added I presume.

ssalonen commented 5 years ago

The profiling was based on

Default filters were used with jprofiler. It seems that they might affect how the aggregates are done...

Configuration

The things point to modbus server on another machine on LAN (tykkikone.localdomain port 5502). For that purpose I was using diagslave on a fedora pc (https://www.modbusdriver.com/diagslave.html), diagslave -p 5502

I expect you would get similar results with a much more simple binding, with a thing updating linked channels very frequently.

Hilbrand commented 5 years ago

Given that:

  1. The waiting time does indeed come from blocking on blocking synchronized code.
  2. The methods usage by how the a binding would implement the usage.
  3. Triggering such a issue would require multiple threads within a single binding.

Therefor I don't expect it to be a problem with any binding, but very much with specific bindings. For example you mention the Modbus binding. I'm not familiar with this binding, but it contains several several threadings, and theoretically could contain such blocking effects.

I think these kind of blocking issues can be caused when a binding is triggered by status change of the binding and that triggers notification change update in the binding itself (this is within a synchronized part) and then the binding start up a new thread to do something that also calls a synchronized method in the base class. That method has to wait till the notification is done. These are tricky things to find. I once had programmed a deadlock in my binding in a similar pattern.

ssalonen commented 5 years ago

Very relevant observatorions and thanks for correcting my sloppy statement. I am not even sure about the first assumption, I did get different profiling results by adjusting jprofiler filters.

I will report if I get the time to reproduce this with more simple setup.