XSCE / xsce

xsce code base

on power failure mongodb is corrupt #879

Open tim-moody opened 7 years ago

tim-moody commented 7 years ago

On an RPi 3, after losing power, this fails both when run via the console and with ./runansible:

TASK [mongodb : enable services] *** failed: [127.0.0.1] (item={u'name': u'mongodb'}) => {"failed": true, "item": {"name": "mongodb"}, "msg": "Unable to restart service mongodb: Job for mongodb.service failed. See 'systemctl status mongodb.service' and 'journalctl -xn' for details.\n"}

tim-moody commented 7 years ago

After running mongod --dbpath /library/dbdata/mongodb --repair the problem remained, though perhaps only because I ran it as root.

Running rm -rf /library/dbdata/mongodb allowed the service to start, but of course all data in it would have been lost.
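If the repair was run as root, one plausible cause is that mongod recreated data files owned by root, which a service running as an unprivileged user then cannot open. A minimal sketch for checking that, assuming the service runs as a user named "mongodb" (the Debian/Raspbian package default, not confirmed in this thread); the helper names are hypothetical:

```python
import os
import pwd


def uid_for(user):
    """Look up the numeric uid of a system user (e.g. the assumed 'mongodb')."""
    return pwd.getpwnam(user).pw_uid


def paths_not_owned_by(dbpath, uid):
    """Return every path under dbpath (including dbpath itself) whose owner is not uid."""
    mismatched = []
    if os.lstat(dbpath).st_uid != uid:
        mismatched.append(dbpath)
    for root, dirs, files in os.walk(dbpath):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat so symlinks are judged by their own owner, not their target's
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return sorted(mismatched)
```

For example, paths_not_owned_by("/library/dbdata/mongodb", uid_for("mongodb")) returning a non-empty list would suggest running the repair again as the service user, or fixing ownership with something like chown -R mongodb:mongodb on the data directory (again assuming that user and group name).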

holta commented 7 years ago

Update from the Apr 13 call (minutes at http://tinyurl.com/iiabminutes): @floydianslips faced the exact same issue in Raspbian Lite today, just as I faced it in Raspbian Pixel earlier.

@floydianslips had not run http://box/sugarizer and likely did one "hard reset" by accident. @holta observed the system had frozen solid twice during the prior week (but it has been behaving well for days on end now...).

"[What] is destroying MongoDB regularly? e.g. on Holt’s RPi3-Raspbian-128GB-install-test, ./runansible fails after selecting “Check to Enable WordPress”, despite the machine being powered off properly (though it froze 2 times in the 10 days since it was built on a brand-new SD card). Clarification: this has nothing to do with WordPress #902, but the corrupt MongoDB prevented Ansible from completing.
i. https://forums.meteor.com/t/why-mongodb-is-unreliable/5370
ii. Workaround so far: “rm -rf /library/dbdata/mongodb”, which presumably blows away Sugarizer history :( [but at least ./runansible works after this!]
iii. @georgejhunt: turn on journaling in future to aid repair of MongoDB?"

holta commented 7 years ago

Unscientific speculation below, in case @llaske has a moment to examine the bug reports above:

"Something doesn't quite add up, as MongoDB wasn't jamming up Internet-in-a-Box/XSCE's ansible runs earlier this year in 2017, despite inevitable/intermittent power failures. And yet now it's happening frequently. Is there any possibility that Sugarizer 0.8's MongoDB is more fragile than Sugarizer 0.7's, or that a new version of MongoDB is much more frail?"

llaske commented 7 years ago

There was no update to the MongoDB part between Sugarizer 0.7 and Sugarizer 0.8, so I don't think the update could be the cause of the issue. I'm not a MongoDB expert, but maybe stopping it abruptly could cause it to fail on the next restart. I guess a workaround could be to launch a preventive repair command at each server start?
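A minimal sketch of that preventive repair, assuming the older MMAPv1 behaviour where a non-empty mongod.lock file in the data directory signals an unclean shutdown, and assuming the service user is "mongodb". The function names are hypothetical; the command runs as the service user via runuser so repaired files are not left owned by root. Note that the thread reports repair alone did not fix this particular corruption.

```python
import os
import subprocess

LOCK_NAME = "mongod.lock"  # lock file mongod keeps in its dbpath


def unclean_shutdown(dbpath):
    """True if a non-empty lock file suggests mongod did not shut down cleanly."""
    lock = os.path.join(dbpath, LOCK_NAME)
    return os.path.exists(lock) and os.path.getsize(lock) > 0


def repair_command(dbpath, user):
    """Build the repair command, run as the (assumed) service user rather than root."""
    return ["runuser", "-u", user, "--", "mongod", "--dbpath", dbpath, "--repair"]


def preventive_repair(dbpath, user="mongodb", run=subprocess.run):
    """Run a repair only after an apparent unclean shutdown; return whether one ran.

    Intended to run before the mongodb service starts; `run` is injectable for testing.
    """
    if not unclean_shutdown(dbpath):
        return False
    run(repair_command(dbpath, user), check=True)
    return True
```

Gating on the lock file, rather than repairing unconditionally, avoids a full repair on every boot; with journaling enabled, mongod is expected to replay its journal on startup instead, so a blanket repair would not be appropriate there.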

holta commented 7 years ago

Sadly, a MongoDB repair does not work: @tim-moody and I have both tried it.

Thanks @llaske for checking in! Regrettably we'll have to write this up as a Known Issue; please, please help us monitor the situation in 2017, in case progress appears later!

llaske commented 7 years ago

:-( Another workaround is to not start MongoDB and not start the sugarizer.js Node.js script. Without its backend, Sugarizer will work in a "limited/degraded mode". In this mode, activities can be launched and will run as usual, but neither presence (multiple users playing activities together) nor collaboration (shared journal) will be available. This is probably acceptable for deployments where using Sugarizer is not the main objective. However, I can't guarantee that this limited mode will continue to work in future versions.

holta commented 7 years ago

Excellent advice, @llaske!

@tim-moody has done almost all the research here, but I'll ask @georgejhunt & @floydianslips to look into this too (as offline world / developing world stability is critical, when disconnected for ~2 years at a time...).

tim-moody commented 7 years ago

Has putting MongoDB in journaling mode, as George suggested, been considered?
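As I understand it, older MongoDB releases leave journaling off by default on 32-bit builds such as Raspbian's, and it can be enabled with journal = true in the ini-style config file (or the --journal flag). A minimal sketch that rewrites such a config, assuming the old key = value format and a file like /etc/mongodb.conf (both assumptions about the installed package); enable_journaling is a hypothetical helper:

```python
import re

# Matches active (uncommented) "journal = ..." or "nojournal = ..." settings.
_JOURNAL_SETTING = re.compile(r"^\s*(no)?journal\s*=")


def enable_journaling(conf_text):
    """Drop any existing journal/nojournal setting and append journal = true."""
    kept = [line for line in conf_text.splitlines()
            if not _JOURNAL_SETTING.match(line)]
    kept.append("journal = true")
    return "\n".join(kept) + "\n"
```

Removing a conflicting nojournal line, rather than only appending, keeps the result unambiguous, and the function is idempotent, so it is safe to apply on every ./runansible run.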

On Sat, Apr 15, 2017 at 5:07 AM, A Holt notifications@github.com wrote:
