eclipse-archived / smarthome

Eclipse SmartHome™ project
https://www.eclipse.org/smarthome/
Eclipse Public License 2.0
865 stars 782 forks source link

JsonDB: Things and Links got missing after Hard Reset #3016

Closed philomatic closed 7 years ago

philomatic commented 7 years ago

In rare cases after a power failure / hard reset, all Things / Links (Items are declared by file) are missing. Inbox is filled with these missing Things (Z-Wave).

This happens to us rarely but repeatedly since the change to JsonDB. So, I guess this only happens when there are changes written to the DB during the power failure. However, following http://docs.openhab.org/administration/jsondb.html, in case of corrupt files (that "should" be the case) the last backup file should be loaded. But instead, all Json files are recreated new with empty content.

Unfortunately, I can't seem to force this error by let the DB write new entries and disconnecting power. One noticeable ERROR in the logs regarding persistence when shutting down OH2 correctly:

2017-02-01 18:32:11.437 [ERROR] [.eclipse.smarthome.model.persistence] - [org.eclipse.smarthome.model.persistence.manager(95)] The unsetModelRepository method has thrown an exception java.lang.NullPointerException

(see log attached) openhab.log

I remember a discussion for a similar problem with the hard reset of OH2 VMs, however I can't find it again :/

Best regards

maggu2810 commented 7 years ago

If you read articles like e.g. https://danluu.com/file-consistency/, https://lwn.net/Articles/457667/ and so on, you will see that reliable data storage is a very difficult topic. Database systems does a lot of stuff to ensure that data should be stored correctly.

So perhaps you could change your startup script to backup the userdata directory used by JsonDB before your SmartHome system is started, so you can identify how the files look like before they get recreated with empty content.

philomatic commented 7 years ago

Hi maggu,

yes, I'm totally with you. A crash-proof DB is a very difficult topic, there are whole OSs which deal with this.

And don't get me wrong here: I'm not saying that esh should be totally crash-safe, especially not in terms of data reliability for recent values. I think this would be over-engineered. Nor I'm saying that JsonDB is more problematic or worse than the old MapDB implementation (in fact, I prefer Json).

However, since we stated in the above mentioned discussion (which I unfortunately can't seem to find; @kaikreuzer can you remember this?) that this shouldn't happen AND we have a backup folder which should get used, this is a bug for me.

I'll modify our startup script and repost when I have more information on this.

kaikreuzer commented 7 years ago

If I read this https://github.com/eclipse/smarthome/pull/1868#issuecomment-233144376 correctly, an automatic integrity check and reverting to a backup is NOT yet implemented; so it is up to the user to notice that the db is broken and to replace if by one of the backups. I'd agree that it would be nice to have this as an automatic mechanism (with some red warnings in the log) in place. @cdjackson What do you think?

maggu2810 commented 7 years ago

@philomatic I didn't get you wrong (I think so). If ESH provide a storage implementation IMHO it should be as good / reliable as possible (with respect to the effort). Its not about blaming an implementation and the work someone has been done (in this case, I assume we are all happy about the JsonDB at all). We all want a good and working solution. Didn't we? So, if there is anything to improve, feel free to create issues and (perhaps better) create a PR.

cdjackson commented 7 years ago

If I read this #1868 (comment) correctly, an automatic integrity check and reverting to a backup is NOT yet implemented

I'm pretty sure it is implemented. I guess the commits were squashed so we don't see changes after this comment, but it retains 5 backups, and during loading of the json files, if there's an error, it will try the backups. Maybe this is broken of course, but I'm reasonably sure it was tested.

with some red warnings in the log

Logged messages should be -:

logger.info("Json storage file at '{}' was not read.", file.getAbsolutePath());
logger.info("Json storage file at '{}' is used (backup {}).", backupFile.getAbsolutePath(), cnt);

In my system at least, changes to the json files happen very irregularly - ie only when you make changes to the thing/item/link/... configuration. For me, that's not often, so the window for corruption due to writing at the same time as a power failure should be really small.

Of course, just saying the window of opportunity is low isn't all that nice if it's happening (even if it is surprising) - I agree that the system should be robust so if people have ideas for improvement I'm happy to look at it.

It's just not clear to me how we could improve to 100% reliability - even if the backup system works, it still means that some changes to configuration are lost if you have to revert to the previous file. My assessment is that this is better than having to start again (obviously!) but if power cuts are a regular issue then personally I would also suggest to solve this.

philomatic commented 7 years ago

In my system at least, changes to the json files happen very irregularly - ie only when you make changes to the thing/item/link/... configuration. For me, that's not often, so the window for corruption due to writing at the same time as a power failure should be really small.

Hmm... interesting, actually I also do not change the configuration on a regular basis, but sometimes I have 5 different backups files created for e.g. Thing.json within 10 seconds from first to last. If I understood correctly (as the backup files are only created when written changes occur), this means configuration changed in about 10 seconds 5 times. Ill check the differences between them.

It's just not clear to me how we could improve to 100% reliability - even if the backup system works, it still means that some changes to configuration are lost if you have to revert to the previous file.

Not a problem at least for me, I also think a power cut during a "real" configuration change (and therefore a writing to DB) is very unlikely and can't be further tuned. Like said, I don't think 100% reliability is achievable (and reasonable wrt the effort).

Best regards

cdjackson commented 7 years ago

sometimes I have 5 different backups files created for e.g. Thing.json within 10 seconds from first to lats

It might be interesting to 'diff' them to see what changes?

JsonDB defers writes for something like 1/2 second, so any repeat writes in that time will only result in a single write to disk. Maybe during startup if there's a lot of bindings / things starting up and updating configuration, it might result in a lot of updates in a relatively short period, but I wouldn't think that it should be something that is happening all the time.

philomatic commented 7 years ago

So,

I've finally checked this:

The JsonStorage only uses the old backup files if the current files are not readable / inavlid. However, if no current files are existing, new .json files are created.

In the case of a power failure at writing time, I guess that the linux fs (a RPi 3 in my case) deletes invalid / corrupted files after power return / reboot. At least this is the observed symptom on my PIs, however I didn't look into this further as I don't have a Linux Machine with a ESH IDE or s.th. else.

I've created a pull request:

https://github.com/eclipse/smarthome/pull/3407

Imho, the JsonDB should try to return backup files in case of missing .json databases?

maggu2810 commented 7 years ago

Imho, the JsonDB should try to return backup files in case of missing .json databases?

Hm, if there is a valid database backup but no -- also not an empty -- database, I assume it should be okay, to use the backup.

maggu2810 commented 7 years ago

I close this WRT #3407, #3429, #3439 If there is still something missing or problems exist, please create a new issue.