TritonDataCenter / pg_prefaulter

Faults pages into PostgreSQL shared_buffers or filesystem caches in advance of WAL apply
Apache License 2.0
56 stars 13 forks source link

prefaulter must start up even if database is not available #18

Closed jclulow closed 7 years ago

jclulow commented 7 years ago

It would appear that the prefaulter does not start up correctly if the database is not available initially. When run as an SMF service, it fails fast this way a few times and then the service goes into maintenance:

...
unable to start agent: unable to initialize db connection pool: unable to create a new DB connection pool: dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory
[ Sep 13 03:18:10 Stopping because all processes in service exited. ]
[ Sep 13 03:18:10 Executing stop method (:kill). ]
[ Sep 13 03:18:10 Executing start method ("/opt/pg_prefaulter/pg_prefaulter run --config=/opt/pg_prefaulter/pg_prefaulter.toml &"). ]
[ Sep 13 03:18:10 Method "start" exited with status 0. ]
{"time":"2017-09-13T03:18:10.84142588Z","level":"debug","config-file":"/opt/pg_prefaulter/pg_prefaulter.toml"}
{"time":"2017-09-13T03:18:10.841509168Z","level":"debug","message":"args: []"}
{"time":"2017-09-13T03:18:10.841778917Z","level":"debug","message":"starting gops(1) agent"}
{"time":"2017-09-13T03:18:10.841794234Z","level":"debug","postgresql.pgdata":"/manatee/pg/data","postgresql.host":"/tmp","postgresql.port":5432,"postgresql.user":"postgres","postgresql.xlog.mode":"pg","postgresql.xlog.pg_xlogdump-path":"/opt/postgresql/current/bin/pg_xlogdump","postgresql.poll-interval":1000,"message":"flags"}
{"time":"2017-09-13T03:18:10.842199269Z","level":"info","message":"Starting pg_prefaulter"}
{"time":"2017-09-13T03:18:10.842205727Z","level":"info","message":"Stopped pg_prefaulter"}
Error: unable to start agent: unable to initialize db connection pool: unable to create a new DB connection pool: dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory
Usage:
  pg_prefaulter run [flags]

Flags:
  -h, --help                   help for run
  -m, --mode string            Mode of operation of the database: "auto", "primary", "follower" (default "auto")
  -N, --num-io-threads uint    Number of IO threads to spawn for IOs
  -i, --poll-interval string   Interval to poll the database for state change (default "1s")
  -n, --wal-readahead uint     Number of WAL entries to perform read-ahead into (default 4)
  -X, --xlog-mode string       pg_xlogdump(1) variant: "xlog" or "pg" (default "pg")
  -x, --xlogdump-bin string    Path to pg_xlogdump(1) (default "/usr/local/bin/pg_xlogdump")

Global Flags:
  -a, --circonus-api-key string                         Circonus API token
      --circonus-api-url string                         Circonus API URL (default "https://api.circonus.com/v2")
      --circonus-broker-id string                       Circonus Broker ID
      --circonus-broker-max-response-time string        Circonus Broker Max Response Time (default "500ms")
      --circonus-broker-select-tag string               Circonus Broker Select Tag
      --circonus-check-display-name string              Circonus Check Display Name (default "pg_prefaulter")
      --circonus-check-force-metric-activation string   Circonus Check Force Metric Activation (default "false")
      --circonus-check-id string                        Circonus Check ID
      --circonus-check-instance-id string               Circonus Check Instance ID (default "4042552b-5fe0-45c4-9df4-f21eca88464d:pg_prefaulter")
      --circonus-check-max-url-age string               Circonus Check Max URL Age (default "5m")
      --circonus-check-search-tag string                Circonus Check Search Tag (default "app:pg_prefaulter,host:4042552b-5fe0-45c4-9df4-f21eca88464d")
      --circonus-check-secret string                    Circonus Check Secret
      --circonus-check-tags string                      Circonus Check Tags (default "app:pg_prefaulter")
      --circonus-check-target-host string               Circonus Check Target Host (default "4042552b-5fe0-45c4-9df4-f21eca88464d")
      --circonus-debug                                  Enable Circonus Debug
      --circonus-enable-metrics                         Enable Circonus metrics (default true)
      --circonus-submission-url string                  Circonus Check Submission URL
      --config string                                   config file (default "pg_prefaulter.toml")
  -d, --database string                                 postgres (default "postgres")
      --enable-agent                                    Enable the gops(1) agent interface (default true)
  -H, --hostname string                                 Hostname to connect to PostgreSQL (default "/tmp")
  -l, --log-level string                                Log level (default "INFO")
  -D, --pgdata string                                   Path to PGDATA (default "pgdata")
  -p, --port uint                                       Hostname to connect to PostgreSQL (default 5432)
  -U, --username string                                 Username to connect to PostgreSQL (default "postgres")

unable to start agent: unable to initialize db connection pool: unable to create a new DB connection pool: dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory
[ Sep 13 03:18:10 Stopping because all processes in service exited. ]
[ Sep 13 03:18:10 Executing stop method (:kill). ]
[ Sep 13 03:18:10 Restarting too quickly, changing state to maintenance. ]

Once the database is running, I can clear the maintenance status and the service appears to start correctly:

[ Sep 13 03:20:11 Leaving maintenance because clear requested. ]
[ Sep 13 03:20:11 Enabled. ]
[ Sep 13 03:20:11 Executing start method ("/opt/pg_prefaulter/pg_prefaulter run --config=/opt/pg_prefaulter/pg_prefaulter.toml &"). ]
[ Sep 13 03:20:11 Method "start" exited with status 0. ]
{"time":"2017-09-13T03:20:11.206137022Z","level":"debug","config-file":"/opt/pg_prefaulter/pg_prefaulter.toml"}
{"time":"2017-09-13T03:20:11.20630263Z","level":"debug","message":"args: []"}
{"time":"2017-09-13T03:20:11.20648666Z","level":"debug","message":"starting gops(1) agent"}
{"time":"2017-09-13T03:20:11.20670297Z","level":"debug","postgresql.pgdata":"/manatee/pg/data","postgresql.host":"/tmp","postgresql.port":5432,"postgresql.user":"postgres","postgresql.xlog.mode":"pg","postgresql.xlog.pg_xlogdump-path":"/opt/postgresql/current/bin/pg_xlogdump","postgresql.poll-interval":1000,"message":"flags"}
{"time":"2017-09-13T03:20:11.207273131Z","level":"info","message":"Starting pg_prefaulter"}
{"time":"2017-09-13T03:20:11.226491329Z","level":"debug","backend-pid":63146,"version":"PostgreSQL 9.6.4 on i386-pc-solaris2.11, compiled by gcc (GCC) 4.9.3, 64-bit","message":"established DB connection"}
{"time":"2017-09-13T03:20:11.226914389Z","level":"debug","rlimit-nofile":65536,"filehandle-cache-size":65486,"filehandle-cache-ttl":3600000,"message":"filehandle cache initialized"}
{"time":"2017-09-13T03:20:11.243983658Z","level":"info","io-worker-threads":1000,"message":"started IO worker threads"}
{"time":"2017-09-13T03:20:11.244136082Z","level":"debug","message":"Starting wait"}
{"time":"2017-09-13T03:20:11.244148015Z","level":"debug","wal-worker-thread-id":2,"message":"starting WAL worker thread"}
{"time":"2017-09-13T03:20:11.244151999Z","level":"debug","wal-worker-thread-id":1,"message":"starting WAL worker thread"}
{"time":"2017-09-13T03:20:11.244147788Z","level":"debug","wal-worker-thread-id":3,"message":"starting WAL worker thread"}
{"time":"2017-09-13T03:20:11.244208579Z","level":"debug","wal-worker-thread-id":0,"message":"starting WAL worker thread"}
{"time":"2017-09-13T03:20:11.244208705Z","level":"debug","message":"Starting pg_prefaulter agent"}
{"time":"2017-09-13T03:21:11.227143486Z","level":"debug","hit":23176,"miss":646,"lookup":23822,"hit-rate":0.9728822097221056,"message":"filehandle-stats"}
{"time":"2017-09-13T03:21:11.24432601Z","level":"debug","hit":1,"miss":2,"lookup":3,"hit-rate":0.3333333333333333,"message":"walcache-stats"}
{"time":"2017-09-13T03:21:11.244413444Z","level":"debug","hit":8178,"miss":41757,"lookup":49935,"hit-rate":0.16377290477620907,"message":"iocache-stats"}
{"time":"2017-09-13T03:22:11.228855455Z","level":"debug","hit":113623,"miss":734,"lookup":114357,"hit-rate":0.9935815035371687,"message":"filehandle-stats"}
{"time":"2017-09-13T03:22:11.244911682Z","level":"debug","hit":3,"miss":4,"lookup":7,"hit-rate":0.42857142857142855,"message":"walcache-stats"}
{"time":"2017-09-13T03:22:11.244947642Z","level":"debug","hit":57646,"miss":191404,"lookup":249050,"hit-rate":0.23146356153382855,"message":"iocache-stats"}
{"time":"2017-09-13T03:23:11.229500586Z","level":"debug","hit":199725,"miss":754,"lookup":200479,"hit-rate":0.9962390075768535,"message":"filehandle-stats"}
{"time":"2017-09-13T03:23:11.245253655Z","level":"debug","hit":6,"miss":8,"lookup":14,"hit-rate":0.42857142857142855,"message":"walcache-stats"}
{"time":"2017-09-13T03:23:11.246306122Z","level":"debug","hit":128811,"miss":331991,"lookup":460802,"hit-rate":0.279536547150403,"message":"iocache-stats"}
{"time":"2017-09-13T03:24:11.230050485Z","level":"debug","hit":240793,"miss":759,"lookup":241552,"hit-rate":0.9968578194343247,"message":"filehandle-stats"}
{"time":"2017-09-13T03:24:11.245531156Z","level":"debug","hit":8,"miss":10,"lookup":18,"hit-rate":0.4444444444444444,"message":"walcache-stats"}
{"time":"2017-09-13T03:24:11.246573786Z","level":"debug","hit":165312,"miss":400260,"lookup":565572,"hit-rate":0.29229169760879253,"message":"iocache-stats"}
{"time":"2017-09-13T03:25:11.230626347Z","level":"debug","hit":282746,"miss":766,"lookup":283512,"hit-rate":0.9972981743277181,"message":"filehandle-stats"}
{"time":"2017-09-13T03:25:11.24583358Z","level":"debug","hit":10,"miss":12,"lookup":22,"hit-rate":0.45454545454545453,"message":"walcache-stats"}
{"time":"2017-09-13T03:25:11.246880287Z","level":"debug","hit":203032,"miss":468442,"lookup":671474,"hit-rate":0.3023676270414044,"message":"iocache-stats"}
{"time":"2017-09-13T03:26:11.231247847Z","level":"debug","hit":325980,"miss":773,"lookup":326753,"hit-rate":0.9976342986904482,"message":"filehandle-stats"}
{"time":"2017-09-13T03:26:11.2461834Z","level":"debug","hit":12,"miss":14,"lookup":26,"hit-rate":0.46153846153846156,"message":"walcache-stats"}
{"time":"2017-09-13T03:26:11.247218878Z","level":"debug","hit":241664,"miss":538432,"lookup":780096,"hit-rate":0.3097875133316925,"message":"iocache-stats"}
{"time":"2017-09-13T03:27:11.231782623Z","level":"debug","hit":367433,"miss":778,"lookup":368211,"hit-rate":0.9978870810486379,"message":"filehandle-stats"}
{"time":"2017-09-13T03:27:11.246419942Z","level":"debug","hit":14,"miss":16,"lookup":30,"hit-rate":0.4666666666666667,"message":"walcache-stats"}
{"time":"2017-09-13T03:27:11.247501626Z","level":"debug","hit":278505,"miss":607739,"lookup":886244,"hit-rate":0.3142531853530179,"message":"iocache-stats"}
{"time":"2017-09-13T03:28:11.232316668Z","level":"debug","hit":430939,"miss":780,"lookup":431719,"hit-rate":0.9981932692329964,"message":"filehandle-stats"}
{"time":"2017-09-13T03:28:11.246682366Z","level":"debug","hit":18,"miss":20,"lookup":38,"hit-rate":0.47368421052631576,"message":"walcache-stats"}
{"time":"2017-09-13T03:28:11.247703691Z","level":"debug","hit":356427,"miss":744053,"lookup":1100480,"hit-rate":0.32388321459726666,"message":"iocache-stats"}
...