eclipse-archived / smarthome

Eclipse SmartHome™ project
https://www.eclipse.org/smarthome/
Eclipse Public License 2.0
861 stars 787 forks source link

openHAB startup loop #4726

Open mstormi opened 6 years ago

mstormi commented 6 years ago

I just upgraded from OH snapshot 1119 to latest 1132. On starting up, OH kept initializing over and over. It kept processing the .rules, .items and .persist files again and again and again. I've uploaded my full openhab.log as created during that startup. It's probably best to check that right away. I didn't notice anything special in there but then again, I've no idea what to look for. There's a large number of log entries regarding rules called "Init virtual Items". There's one such rule in most rules files and all of these trigger on "System started" but they don't do anything fancy except to initialize a couple of items and variables.

I stopped that initialization cycle, deleted cache and tmp dir and moved all of my rules files away, then restarted. After a few minutes OH has successfully started/initialized and I moved the rules files back in, now its working.

I have been playing that move-away-back-in-rules game on every startup in order to get OH up and running for many weeks and since many snapshot generations now. Not sure if ESH #1896 is related, but I'm anxiously waiting for someone to take on that one.

openhab.log

kaikreuzer commented 6 years ago

You seem to have addons in your addons folder - there seems to be a problem with this in openHAB at the moment as you can see on these reports.

So to rule out any such openHAB/Karaf-specific side effects, I would suggest to remove those to simplify the analysis.

So could you give a short and precise description of the issue, please? Is it "openHAB keeps executing startup rules in a loop"? What is the minimal setup to make this happen?

mstormi commented 6 years ago

Well cannot easily play around too much, it's my house :-) and I need those addons for it to work, too (yes, 3 addons. It's jars no kars). Unfortunately, I cannot reliably reproduce the issue. On next restart, startup worked out fine. Didn't even have to move the rules away. I'll try removing addons, rules and items one after the other if I happen to run into that loop situation again to find out what's the step to break the loop. But for now, all I have is this log.

I think the startup rules being executed over and over is just a symptom and consequence of the rules files being loaded, the interesting question is why are the rules files loaded again and again. I renamed the issue.

mstormi commented 6 years ago

It happened again when upgrading to #1133. I don't think it's related to this felix addon issue. There's no hint why this behavior is supposed to be related to addons. I had temporarily removed them, too, but that made no difference.

I believe in some sort of circular or way-too-deep-nested reference or dependency of files on each other. Please note that for each items file to be loaded, there's lines 'Refreshing resource xxx.rules' for EVERY rules file. Is that normal ?? Rules hadn't been loaded at that stage yet. Can't believe it is. Also note the various 'Removing file xxx from repo' entries. All rules files get loaded (again?) after items are processed.

I believe this issue just hasn't surfaced elsewhere just because I have quite many and large .items files. People to just use the UI probably won't see this issue because all their things/items are loaded in one go from jsondb. Note that while I likely have more files than others and coincidence seems to play its part (the order of files being loaded/processed), the only reliable way to make startup succeed for me was and is to remove all rules files and to move them back after startup has succeeded. That's why I strongly suggest you have a look at old ESH #1896. It's about enforcing a startup order of files so rules could be forced to get loaded last. Manually ensuring that has always solved (or avoided) this startup problem of mine in the past. And it makes sense to implement that load order anyway, doesn't it?

Here's my latest openhab.log. I've greatly reduced the number of 'Init' rules (trigger on 'System started') BEFORE this upgrade - just two to remain - but it seems that has not made a difference.

openhab.log

maggu2810 commented 6 years ago

Have you already tried to use one item file only? So merging all your item files to a big one? As you should not loose any functionality can you give it a try?

Well cannot easily play around too much, it's my house

I understand that playing around with your house setup in "productive" is a bad thing. Can you create a setup, so we can reproduce this behavior in the Eclipse IDE without any openHAB specific stuff?

mstormi commented 6 years ago

Didn't try so far, I tried right now to combine all items into one file, plus upgraded to # 1140. Haven't verified everything yet but I'd call it a partial success. Seems that has broken the load loop, it has come to an end after about 30 minutes. Still, that's about twice as much as startup takes when I move rules out first and move them back when load drops from 100% (= when initial processing of config has finished) which usually takes 6-7 minutes.

Can you create a setup, so we can reproduce this behavior in the Eclipse IDE without any openHAB specific stuff?

How am I supposed to do that ? I don't know what item or rule or combination thereof is triggering the faulty behavior. All I can think of that either a) i provide you with my complete openHAB config and you check if you can reproduce the behavior in IDE/on your systems or b) we generate a number of artificial items files and a number of rules files with the rules referencing the items. I could do that (although you can probably better and easily do yourself), but that's a trial-and-error approach then, it's like looking for the needle in the haystack, it's likely we're wasting our workforce there.

mstormi commented 6 years ago

After latest upgrade, I ended up with my jsondb shrunk to zero. I had to manually add all things (zwave controller) or confirm them in PaperUI Inbox. Now while that's another issue we would need to look after, this is interesting: On confirming one item (FWIW: a hue bridge ), I noticed in log that all rules were processed as below. So again my question as put up above already: is that normal ? Is it intentional to check through every rules file whenever just even a single item is changed ? I feel there's too many dependencies built-in between items and rules, and these ultimately result in my startup loop.

2017-12-17 10:34:54.528 [DEBUG] [el.core.internal.ModelRepositoryImpl] - Refreshing resource 'Szenen.rules' 2017-12-17 10:34:55.019 [DEBUG] [el.core.internal.ModelRepositoryImpl] - Refreshing resource 'Test.rules' 2017-12-17 10:34:55.234 [DEBUG] [el.core.internal.ModelRepositoryImpl] - Refreshing resource 'Treppe.rules' 2017-12-17 10:34:55.446 [DEBUG] [el.core.internal.ModelRepositoryImpl] - Refreshing resource 'WC.rules'

maggu2810 commented 6 years ago

@mstormi I am not using the DSL files myself, but doesn't it depend if rules are not proceeded because items are missing, so they are retriggered if a new item is added? Can you have a look at the resp. sources?

mstormi commented 6 years ago

Hmm yes you're likely right. As I noticed after writing my comment, my jsondb was completely empty (whyever - that's a different issue). But once we 'zoom out' a bit, it's all about this triangle, i.e. the dependencies of each other and load/processing order of things, items and rules. For most users, things are from jsondb, some define them in DSL files and UI. items, many define them in DSL files, some in UI, some both. rules are always DSL. Seems if we're lucky to incidently have Karaf process them in the 'right' order (things (jsondb + DSL) - items (jsondb + DSL) - rules), everything's fine and startup proceeds as fast as files can be processed. No avoidable delays due to re-processing. But if Karaf decides to have a different order, this can result in large delays / lots of re-processing or even lockup.

Allow me to once more point you at ESH#1896 which is about enforcing a deterministic processing order. Wouldn't it make sense, too, to finally implement that one ? My guess is that once implemented, we won't see my issue any more. And it's probably even easier and way more promising to 'just' implement that instead of trying to nail down the exact circumstances of this startup sequence failure by means of 'trial and error'.

maggu2810 commented 6 years ago

Wouldn't it make sense, too, to finally implement that one ?

Sure, I assume we would all be happy if this will be implemented.

mstormi commented 6 years ago

Ist Simon Kaufmann nicht dein Kollege und derjenige der das könnte ? Oder auf wen warten wir da alle ?

maggu2810 commented 6 years ago

Hm, I have to read the whole topic again, but at the moment I didn't know why sjka should be responsible to implement that feature. I assume everyone could came up with a PR that solves the topic.

mstormi commented 6 years ago

Well for sure he's not responsible but he's one of just very few people to have the required insight and to be capable of doing it.

triller-telekom commented 6 years ago

As @maggu2810 wrote in https://github.com/eclipse/smarthome/issues/4726#issuecomment-352244038 this issue is about trying to execute rules which depend on items which are not initialized yet.

I am pretty sure this will be covered by #1896. So let's make this issue depend on #1896 and close it one this has been implemented.