dbbaskette / gpdb-docker

Greenplum Database Docker image

Question about 'privileged' mode #5

Open kevinmtrowbridge opened 7 years ago

kevinmtrowbridge commented 7 years ago

Hi @dbbaskette -- you may remember that last year I also tried "dockerizing" GPDB (https://github.com/kevinmtrowbridge/greenplumdb_singlenode_docker). You seemed to do it better, and you work for Pivotal, so we switched our automated testing to use images generated with your Dockerfile. Thank you! I work on Alpine Chorus -- not sure if you are familiar with this product? We inherited Chorus from Pivotal.

We have a lot of automated testing that uses GPDB, and having it available in a Docker container has made our lives much easier.


Now -- the purpose of this issue: it's been my experience that GPDB needs to be run in Docker's "privileged" mode. (Travis CI is the only one of the "CI as a service" vendors that allows you to run containers in privileged mode and that's how I have been running our tests for the past year.)

We recently hired a "packaging engineer," and I am working with him to set up a new CI pipeline based on GitLab, Docker, and Rancher. Very exciting stuff!

Prompted by his questioning, I thought it would be good to re-examine the need to run GPDB in privileged mode. (It's not that big of a deal, but each time I want to run our tests in a new environment, I have to figure out how to enable this, and in some places it's impossible. So essentially this acts as friction against the promise of ultimate "portability," which is what Docker is all about.)

I'm not very familiar with GPDB, and not even really that familiar with Docker -- is it the need to modify core Linux kernel settings (this sort of stuff: https://github.com/kevinmtrowbridge/gpdb-docker/blob/master/configs/sysctl.conf.add) that necessitates privileged mode?

Is my experience the same as yours? (First question: I'm not crazy, right?) Second question: do you have any insight into whether or not the requirement for privileged mode can be removed? What needs to be done to make this happen?
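
One way to narrow down the sysctl part of the question is to read the values the container already sees, since reading from /proc/sys does not need privileged mode. The sketch below is only that: `check_min` and `SYSCTL_ROOT` are hypothetical names, and the two settings and thresholds at the bottom are placeholder examples that have not been checked against sysctl.conf.add.

```shell
#!/bin/sh
# Report whether a kernel setting already meets a minimum, using only reads.
# SYSCTL_ROOT is a hypothetical override for dry runs; the default is the real tree.
SYSCTL_ROOT="${SYSCTL_ROOT:-/proc/sys}"

check_min() {
    name="$1"
    minimum="$2"
    # Map a dotted name to a path, e.g. kernel.shmmax -> $SYSCTL_ROOT/kernel/shmmax
    path="$SYSCTL_ROOT/$(printf '%s' "$name" | tr '.' '/')"
    if [ ! -r "$path" ]; then
        echo "$name: cannot read $path"
        return 2
    fi
    # Only the first field is compared; multi-valued settings need more care.
    current="$(awk 'NR == 1 { print $1 }' "$path")"
    # awk compares as floating point, which avoids shell integer overflow
    # on very large values.
    if awk -v c="$current" -v m="$minimum" 'BEGIN { exit !(c + 0 >= m + 0) }'; then
        echo "$name: $current meets minimum $minimum"
        return 0
    fi
    echo "$name: $current is below minimum $minimum"
    return 1
}

# Placeholder names and numbers; substitute the ones from sysctl.conf.add.
check_min kernel.shmmax 500000000 || true
check_min kernel.shmmni 4096 || true
```

If every setting already meets its minimum inside an unprivileged container, the sysctl changes are probably not what forces privileged mode, though that conclusion is an inference rather than something confirmed here.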


Details:

Here's an example of running a GPDB image built from your repo, NOT in privileged mode:

✗ docker run --security-opt no-new-privileges -it kevinmtrowbridge/gpdb-docker:squashed
Starting sshd:                                             [  OK  ]
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Starting gpstart with args: -a
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Gathering information and validating the environment...
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.7.1 build 1'
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-postmaster.pid file exists on Master, checking if recovery startup required
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Commencing recovery startup checks
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Have lock file /tmp/.s.PGSQL.5432 but no process running on port 5432
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-No Master instance process, entering recovery startup mode
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Clearing Master instance lock files
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Clearing Master instance pid file
20170210:20:37:46:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Starting Master instance in admin mode
20170210:20:37:52:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20170210:20:37:52:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Obtaining Segment details from master...
20170210:20:37:52:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Setting new master era
20170210:20:37:52:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Commencing forced instance shutdown
20170210:20:37:53:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Starting Master instance in admin mode
20170210:20:37:54:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20170210:20:37:54:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Obtaining Segment details from master...
20170210:20:37:54:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Setting new master era
20170210:20:37:54:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Master Started...
20170210:20:37:54:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Shutting down master
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Ping to host: '72ba20be3774' FAILED
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Ping to host: '72ba20be3774' FAILED
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Ping to host: '72ba20be3774' FAILED
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Skipping startup of segdb on 72ba20be3774 directory /gpdata/segments/gpseg0 Ping Failed <<<<<<
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Skipping startup of segdb on 72ba20be3774 directory /gpdata/segments/gpseg1 Ping Failed <<<<<<
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[ERROR]:-No segment started for content: 0.
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-dumping success segments: []
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-----------------------------------------------------
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-DBID:2  FAILED  host:'72ba20be3774' datadir:'/gpdata/segments/gpseg0' with reason:'Failed to Ping on host: 72ba20be3774'
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-DBID:3  FAILED  host:'72ba20be3774' datadir:'/gpdata/segments/gpseg1' with reason:'Failed to Ping on host: 72ba20be3774'
20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-----------------------------------------------------

20170210:20:37:55:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-----------------------------------------------------
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-   Successful segment starts                                            = 0
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Failed segment starts                                                = 2   <<<<<<<<
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-----------------------------------------------------
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Successfully started 0 of 2 segment instances <<<<<<<<
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-----------------------------------------------------
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Segment instance startup failures reported
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Failed start 2 of 2 segment instances <<<<<<<<
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[WARNING]:-Review /home/gpadmin/gpAdminLogs/gpstart_20170210.log
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-----------------------------------------------------
20170210:20:37:56:000050 gpstart:d8a9b560f3c6:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...
. 
20170210:20:37:57:000050 gpstart:d8a9b560f3c6:gpadmin-[ERROR]:-gpstart error: Do not have enough valid segments to start the array.
psql: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

... and here, in privileged mode:

➜  chorus git:(gitlab-tests) ✗ docker run --privileged -it kevinmtrowbridge/gpdb-docker:squashed
Starting sshd:                                             [  OK  ]
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Starting gpstart with args: -a
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Gathering information and validating the environment...
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.7.1 build 1'
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[WARNING]:-postmaster.pid file exists on Master, checking if recovery startup required
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Commencing recovery startup checks
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Have lock file /tmp/.s.PGSQL.5432 but no process running on port 5432
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-No Master instance process, entering recovery startup mode
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Clearing Master instance lock files
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Clearing Master instance pid file
20170210:20:38:22:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Starting Master instance in admin mode
20170210:20:38:27:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20170210:20:38:27:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Obtaining Segment details from master...
20170210:20:38:27:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Setting new master era
20170210:20:38:27:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Commencing forced instance shutdown
20170210:20:38:28:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Starting Master instance in admin mode
20170210:20:38:29:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20170210:20:38:29:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Obtaining Segment details from master...
20170210:20:38:29:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Setting new master era
20170210:20:38:29:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Master Started...
20170210:20:38:29:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Shutting down master
20170210:20:38:31:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
..... 
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Process results...
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-----------------------------------------------------
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-   Successful segment starts                                            = 2
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-----------------------------------------------------
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances 
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-----------------------------------------------------
20170210:20:38:36:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Starting Master instance 72ba20be3774 directory /gpdata/master/gpseg-1 
20170210:20:38:37:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Command pg_ctl reports Master 72ba20be3774 instance active
20170210:20:38:37:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-No standby master configured.  skipping...
20170210:20:38:37:000050 gpstart:e907365f2ca9:gpadmin-[INFO]:-Database successfully started
ALTER ROLE

Thanks for your work! It's nice to have at least one other person in the world who shares your problems. :)

dbbaskette commented 7 years ago

Glad someone is using it. I haven't done much with this in a while... some other folks at Pivotal picked up the ball and ran with it. On Docker Hub you can now see other gpdb images that they are building. Anyway, I played around with this, and the main culprit seems to be that on startup the master pings the segments to see if they are there. In Docker, since ping needs root access to open a raw socket, that fails. I tested this by adding a /bin/sh to the startup so that after the failure you get a shell; then su - gpadmin and ping localhost. That fails. On a whim, I took the output of a ping localhost run as root, stuck it in a text file, and then replaced ping with a script that cats that file, giving gpadmin access to both. Now when I run ping... I get a fake response. I then ran gpstart and presto... it works. The sysctl settings might be an issue as well, but if the host-level settings are "good enough" it will still run even without changing them.
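
A rough sketch of that ping replacement, for anyone who wants to try it. Everything specific here is an assumption: the temp directory, the file names, and the canned text (a placeholder mimicking typical ping output, not the file actually captured). It also puts the stub first on PATH instead of overwriting the real binary, which is a gentler variation on what is described above, and it assumes the caller only needs ping to print a reply and exit 0.

```shell
#!/bin/sh
# Stub-ping sketch: a fake `ping` that prints canned output and succeeds.
# Paths and canned text are illustrative, not taken from the image.
set -eu

STUB_DIR="$(mktemp -d)"              # hypothetical home for the stub
CANNED="$STUB_DIR/ping_output.txt"   # hypothetical canned-output file

# Placeholder standing in for output captured from a real `ping localhost`
# run as root.
cat > "$CANNED" <<'EOF'
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.030 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
EOF

# The stub ignores its arguments, prints the canned text, and exits 0.
cat > "$STUB_DIR/ping" <<EOF
#!/bin/sh
cat "$CANNED"
exit 0
EOF

# Let non-root users (such as gpadmin) run the stub and read the file.
chmod 755 "$STUB_DIR" "$STUB_DIR/ping"
chmod 644 "$CANNED"

# Put the stub ahead of the real ping for this shell and its children.
PATH="$STUB_DIR:$PATH"
export PATH

ping -c 1 localhost
```

Note that this makes every ping "succeed," including pings to hosts that really are down, so it only makes sense for a single-node test container.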