biokoda / actordb

ActorDB distributed SQL database
Mozilla Public License 2.0
1.9k stars 71 forks source link

[RESOLVED] First run of container always exits with status 135 when local data volume resides within Dropbox folder hierarchy #31

Closed ianmjones closed 8 years ago

ianmjones commented 8 years ago

Hi,

I've built a Docker image that seems to work pretty well, except that it always exits with status 135 on its very first run.

I've raised an issue for my image's repo which has a few more details.

https://github.com/bytepixie/actordb-for-docker/issues/1

Do you have any idea what exit status 135 might be related to?

It's strange that as long as the data files are available in /var/lib/actordb (created on first run) then the crash does not happen.

SergejJurecko commented 8 years ago

Is there anything in the log folder?

ianmjones commented 8 years ago

@SergejJurecko, it creates the 4 log files (console.log, crash.log, error.log, sasl-error.log) on each node, but they are all empty. On the successful second run console.log will have some entries in one of the nodes...

2016-09-27 07:05:35.375 [info] <0.41.0> Undefined Undefined alarm_handler: {set,{{disk_almost_full,"/etc/actordb"},[]}}
2016-09-27 07:05:35.376 [info] <0.41.0> Undefined Undefined alarm_handler: {set,{system_memory_high_watermark,[]}}
2016-09-27 07:05:35.490 [info] <0.7.0> Undefined Undefined Application lager started on node 'node3@actordb-server-3'
2016-09-27 07:05:35.490 [info] <0.7.0> Undefined Undefined Application crypto started on node 'node3@actordb-server-3'
2016-09-27 07:05:35.578 [info] <0.7.0> Undefined Undefined Application asn1 started on node 'node3@actordb-server-3'
2016-09-27 07:05:35.584 [info] <0.7.0> Undefined Undefined Application distreg started on node 'node3@actordb-server-3'
2016-09-27 07:05:35.842 [info] <0.7.0> Undefined Undefined Application bkdcore started on node 'node3@actordb-server-3'
2016-09-27 07:05:35.879 [info] <0.132.0> actordb_local 370 Starting local
2016-09-27 07:05:36.070 [info] <0.141.0> actordb_catchup 157 Catchup: []
2016-09-27 07:05:36.110 [info] <0.144.0> Undefined Undefined thrift service listening on port 33306
2016-09-27 07:05:36.110 [info] <0.7.0> Undefined Undefined Application actordb_core started on node 'node3@actordb-server-3'
2016-09-27 07:05:36.110 [info] <0.7.0> Undefined Undefined Application adbt started on node 'node3@actordb-server-3'

Are there any other logs I should check after the crash, if I can capture them?

SergejJurecko commented 8 years ago

In app.config you can find this in the lager configuration for file console.log: {level, info} Change it to {level,debug}

But considering there are no logs, I assume it stops before it gets to any point that would log so it probably wont help.

Maybe it can not get networking or some file path that should exist does not.

ianmjones commented 8 years ago

Thanks for that @SergejJurecko, console.log now has some info, hopefully you'll be able to see something weird in it!

I've attached a zipped up folder which contains console.log files for 3 nodes all brought up at the same time, each with {level,debug} set.

The logs for nodes 1 and 2 are nearly identical, node 3's looks a little different.

debug_console_logs.zip

I've pushed a new branch to my repo for debugging this issue.

https://github.com/bytepixie/actordb-for-docker/tree/exit-status-135-debug-%231

Once checked out (assuming you have Docker for Mac/Windows/Linux installed) anyone can debug the issue.

up.sh will bring up the 3 node cluster. status.sh will show the container's statuses. logs.sh will show the Docker Compose runtime logs (std output, not the console.log etc that are in node*/logs/). down.sh will bring the containers back down.

The first time you use up.sh you'll notice that status.sh shows that all 3 containers have exited with status 135. Sometimes they will be "Up" for a second or two before showing "Exit 135".

The log files can be found in each of the node*/logs/ folders.

If you then use down.sh to remove the containers, and then up.sh again, all the containers will come up and be ready to use.

$ docker-compose exec actordb-server-1 actordb_console -f /etc/actordb/init.example.sql
Config updated.
Config updated.
Schema updated.

If you want to re-try the experiment, then after bringing the containers down again with down.sh, you can use cleanup.sh to remove the local data and logs. That way the next up.sh will show the problem again.

I'm happy to keep on debugging this, including updating the Docker image if need be.

SergejJurecko commented 8 years ago

I can not reproduce the issue running on macOS (with all software up-to-date). Running ./up.sh always keeps nodes up...

ianmjones commented 8 years ago

Ooh, that's excellent news!

Maybe it's a bug in Docker for Mac on Mac OS X 10.11.6 (I haven't upgraded to 10.12 yet). I'll spin up some other hosts and install Docker to give it a spin, and ask some friends to see if they can reproduce.

Will report back.

ianmjones commented 8 years ago

Thanks for taking the time to try and reproduce the problem @SergejJurecko, very much appreciated.

Experiments on different versions of macOS and Linux didn't suffer from the "Exit 135" problem, so it looks like it might be an issue either with Docker on Mac OS X 10.11.6, or just my MacBook Pro.

Sorry for the noise!

ianmjones commented 8 years ago

Finally found the cause!

It seems to be that if you run the container with the /var/lib/actordb volume mounted somewhere within a Dropbox folder, it'll "Exit 135" when creating the lmdb and lmdb-lock files.

Not something my container or ActorDB can possibly ever address.

Just don't mount the container's /var/lib/actordb volume into a Dropbox folder, even during development!

SergejJurecko commented 8 years ago

Thank you for the effort and creating the docker image.

ianmjones commented 8 years ago

Thank you for the effort and creating the docker image.

My pleasure, thanks for creating and open sourcing such a great project.