blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0

Multiple docker instances lock the sparql endpoints #171

Open teledyn opened 3 years ago

teledyn commented 3 years ago

My apologies if this isn't the right forum for advice. I'm seeking guidance on debugging a deployment that runs Blazegraph in Docker on AWS Batch instances, so it's a bit out of scope; I'm mostly hoping for pointers on how to diagnose where it may be failing.

I have a Docker container for a Python process that uses Blazegraph. The (Ubuntu 20.04) container starts my Python process, which then launches blazegraph.jar using subprocess.Popen and terminates Blazegraph in a 'finally' clause.
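For readers who want to picture the launch pattern described above, here is a minimal sketch, not the poster's actual code. The helper names `blazegraph_command` and `managed_process`, the jar path, the default port, and the optional log file are all assumptions made for illustration.

```python
import subprocess
import sys
from contextlib import contextmanager


def blazegraph_command(jar_path="blazegraph.jar", port=9999):
    """Build a java command line; jar_path and port are illustrative defaults."""
    return ["java", "-server", f"-Djetty.port={port}", "-jar", jar_path]


@contextmanager
def managed_process(command, log_file=None, stop_timeout=10.0):
    """Start a child process and always stop it when the block exits.

    If log_file (an open file object) is given, the child's stdout and
    stderr are written there; otherwise they are inherited from the parent.
    """
    proc = subprocess.Popen(
        command,
        stdout=log_file,
        stderr=subprocess.STDOUT if log_file is not None else None,
    )
    try:
        yield proc
    finally:
        if proc.poll() is None:      # child is still running
            proc.terminate()         # ask it to stop
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()          # force it if it ignores the request
                proc.wait()


if __name__ == "__main__":
    # Demonstration with a stand-in child so the sketch runs without Java.
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    with managed_process(stand_in) as child:
        print("child running:", child.poll() is None)
    print("child stopped:", child.poll() is not None)
```

In a real deployment the stand-in command would be replaced by something like `blazegraph_command("blazegraph.jar", 9999)`, with the processing done inside the `with` block.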

By itself, this works very well, including when deployed to AWS Batch with only one container running per instance. But when two or more such containers are deployed on the same EC2 instance, an initial ASK query confirms the SPARQL endpoint is running, yet during processing all Blazegraph instances stop answering, with nothing printed to stdout/stderr.
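The kind of ASK check mentioned above can be sketched as follows, with a timeout so that a hung endpoint is reported rather than blocking the caller. This is an illustrative sketch only: the function names are hypothetical, and the endpoint URL in the usage section is an assumed default that may differ from the actual deployment.

```python
import json
import urllib.parse
import urllib.request


def build_ask_url(endpoint):
    """Append a trivial ASK query to a SPARQL endpoint URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})


def endpoint_alive(endpoint, timeout=5.0):
    """Return True if the endpoint answers an ASK query within `timeout` seconds.

    The boolean result of the ASK itself is ignored (an empty store answers
    false); only whether a well-formed answer arrives at all is checked.
    """
    request = urllib.request.Request(
        build_ask_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "boolean" in json.load(response)
    except (OSError, ValueError):
        # OSError covers refused connections, HTTP errors and timeouts;
        # ValueError covers a body that is not valid JSON.
        return False


if __name__ == "__main__":
    # Assumed endpoint URL; adjust to the port and path actually in use.
    url = "http://localhost:9999/blazegraph/sparql"
    print("endpoint alive:", endpoint_alive(url, timeout=2.0))
```

Calling `endpoint_alive` periodically during processing, and logging the time of the first failure, would narrow down when the instances stop answering.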

My understanding of Docker is that localhost ports are isolated to each container unless the container publishes them, and none are published. Just in case, I tried assigning random numbers to jetty.port, but the behaviour is unchanged. If I deploy these containers one at a time, the process runs to completion without incident, but when AWS puts more than one on a single EC2 instance, they all lock up.

Is there anything about the default settings of Blazegraph that should prevent two Docker containers from co-existing? Is there something about my Popen strategy that might be inherently unstable? Is it possible to get more verbose output to stdout/stderr?

teledyn commented 3 years ago

AWS responded to explain that their Batch network always runs in host mode, and that it is not currently possible to change this, so two containers serving the same port will collide. Since mine locked up despite Jetty being set to a random port in each container, does this mean there is some other (undocumented) port service that is causing the lock-up? Or could there be some other crosstalk issue due to host mode?
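Because containers in host mode share the host's ports, one way to reduce the chance of a collision is to ask the operating system for a free port instead of choosing a random number. The sketch below shows that idea under stated assumptions: the function names are hypothetical, and there is an unavoidable race, since another process could claim the port between the check and the moment Jetty binds it. It does not address any other port Blazegraph might open.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def port_is_free(port, host="127.0.0.1"):
    """Return True if the given TCP port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def jetty_port_flag(port):
    """Format the jetty.port system property for the java command line."""
    return f"-Djetty.port={port}"


if __name__ == "__main__":
    port = find_free_port()
    print("chosen port:", port)
    print("flag:", jetty_port_flag(port))
```

Logging the chosen port in each container would also make it possible to confirm afterwards whether two instances ever ended up on the same one.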