arangodb-helper / arangodb

ArangoDB Starter - starts ArangoDB clusters & single servers with ease.
Apache License 2.0

Upgrade from 3.6.5 to 3.7.11 fails. #288

Open: BaconFries opened this issue 3 years ago

BaconFries commented 3 years ago

I have a few clusters configured the same way, and two of them upgraded successfully. On another one, I get the following error when running the arangodb upgrade command:

#2021-05-17T17:35:45Z |FATA| Failed to start database automatic upgrade component=arangodb error="Get http://10.0.18.21:8538/version: dial tcp 10.0.18.21:8538: connect: connection refused"

I'm not sure where it is getting port 8538 from: no service listens on that port on any host. I have also tried upgrading manually, without success. What can I do to get this working?
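The failing request in the error is a GET on the starter's /version path at 10.0.18.21:8538. A minimal sketch to confirm which starter endpoints actually answer, assuming curl is installed on the node; 8528 is the starter port used in the upgrade command, 8538 is the port from the error, and the CURL variable is only a hook for testing.

```shell
#!/usr/bin/env bash
# Probe the starter HTTP API (/version) on the cluster nodes from this issue.
# CURL can be overridden (e.g. for testing); it defaults to the real curl.
CURL="${CURL:-curl}"

# probe_starter HOST PORT: report whether a starter answers on HOST:PORT.
probe_starter() {
  local host="$1" port="$2"
  # -f: treat HTTP errors as failures, -sS: quiet except errors, -m 5: 5 second timeout
  if "$CURL" -fsS -m 5 "http://${host}:${port}/version" >/dev/null 2>&1; then
    echo "${host}:${port} answers"
  else
    echo "${host}:${port} no answer"
  fi
}

# probe_all: try both ports on all three nodes.
probe_all() {
  local host port
  for host in 10.0.18.20 10.0.18.21 10.0.18.22; do
    for port in 8528 8538; do
      probe_starter "$host" "$port"
    done
  done
}
```

Calling probe_all from any node lists which of the six endpoints respond, which makes it easy to confirm that nothing is serving 10.0.18.21:8538.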

Steps used to upgrade:

#Add 'KillMode=process' to the [Service] section of the systemd unit on all nodes
vim /etc/systemd/system/arangodb.service
systemctl daemon-reload

#enable maintenance mode on one node
curl http://localhost:8529/_admin/cluster/maintenance -XPUT -d'"on"'

#restart nodes one at a time
service arangodb restart
service arangodb status

#install RPM on all nodes
yum -y install arangodb3-3.7.11-1.0.x86_64

#kill the starter on all nodes and wait for it to come back up
ps -C arangodb -fww
kill -9 <pid-of-starter>

#upgrade schema on one node
arangodb upgrade --starter.endpoint=http://10.0.18.20:8528

#remove 'KillMode=process' from the [Service] section of the systemd unit on all nodes
vim /etc/systemd/system/arangodb.service
systemctl daemon-reload

#restart nodes one at a time
systemctl restart arangodb

#disable maintenance mode on one node
curl http://localhost:8529/_admin/cluster/maintenance -XPUT -d'"off"'
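The procedure above adds and later removes KillMode=process by editing the unit file with vim. A sketch of the same toggle using a systemd drop-in, so the original unit stays untouched; the file name killmode-process.conf is hypothetical, the directory follows systemd's <unit>.d/ convention for /etc/systemd/system/arangodb.service, and DROPIN_DIR is only a hook for testing.

```shell
#!/usr/bin/env bash
# Toggle KillMode=process for the starter unit via a drop-in file.
# systemd merges <unit>.d/*.conf over the main unit file after a daemon-reload.
DROPIN_DIR="${DROPIN_DIR:-/etc/systemd/system/arangodb.service.d}"

# Create the drop-in (the file name is a hypothetical choice).
enable_killmode_process() {
  mkdir -p "$DROPIN_DIR" || return 1
  printf '[Service]\nKillMode=process\n' > "$DROPIN_DIR/killmode-process.conf"
}

# Remove the drop-in so the unit falls back to its own KillMode setting.
disable_killmode_process() {
  rm -f "$DROPIN_DIR/killmode-process.conf"
}
```

Run enable_killmode_process followed by systemctl daemon-reload before killing the starter, and disable_killmode_process followed by systemctl daemon-reload afterwards. In between, systemctl show -p KillMode arangodb should print KillMode=process; once the drop-in is gone it should print KillMode=control-group, the systemd default, since the unit itself does not set KillMode.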

Additional Info

#db1
#running processes
root     10421  0.0  0.0 114736 14856 ?        Sl   May17   1:36 /usr/bin/arangodb --starter.data-dir=/var/lib/arangodb34-cluster
root     10437  0.4  1.5 719312 252744 ?       Sl   May17  16:27  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/agent8531/arangod.conf --database.directory /var/lib/arangodb34-cluster/agent8531/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/agent8531/apps --log.file /var/lib/arangodb34-cluster/agent8531/arangod.log --log.force-direct false --javascript.copy-installation true --agency.activate true --agency.my-address tcp://10.0.18.20:8531 --agency.size 3 --agency.supervision true --foxx.queues false --server.statistics false --agency.endpoint tcp://10.0.18.21:8531 --agency.endpoint tcp://10.0.18.22:8531
root     10576  0.4 10.5 3740700 1682128 ?     Sl   May17  19:27  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/dbserver8530/arangod.conf --database.directory /var/lib/arangodb34-cluster/dbserver8530/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/dbserver8530/apps --log.file /var/lib/arangodb34-cluster/dbserver8530/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://10.0.18.20:8530 --cluster.my-role PRIMARY --foxx.queues false --server.statistics true --cluster.agency-endpoint tcp://10.0.18.20:8531 --cluster.agency-endpoint tcp://10.0.18.21:8531 --cluster.agency-endpoint tcp://10.0.18.22:8531
root     10657  0.4  0.7 805796 116588 ?       Sl   May17  20:24  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/coordinator8529/arangod.conf --database.directory /var/lib/arangodb34-cluster/coordinator8529/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/coordinator8529/apps --log.file /var/lib/arangodb34-cluster/coordinator8529/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://10.0.18.20:8529 --cluster.my-role COORDINATOR --foxx.queues true --server.statistics true --cluster.agency-endpoint tcp://10.0.18.20:8531 --cluster.agency-endpoint tcp://10.0.18.21:8531 --cluster.agency-endpoint tcp://10.0.18.22:8531

#systemd service script
[Unit]
  Description=Run the ArangoDB Starter
  After=sysinit.target sockets.target timers.target paths.target slices.target network.target syslog.target
[Service]
  LimitNOFILE=1048576
  Type=forking
  User=root
  Group=root
  TimeoutSec=5min
  Restart=always
  RestartSec=20
  ExecStart=/usr/bin/arangodb start \
     --starter.data-dir=/var/lib/arangodb34-cluster/  \
     --starter.join=10.0.18.20

  ExecStop=/usr/bin/arangodb stop
[Install]
  WantedBy=multi-user.target

#log
2021-05-17T17:15:52Z [7566] INFO [144fe] using storage engine 'rocksdb'
2021-05-17T17:15:52Z [7566] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2021-05-17T17:15:52Z [7566] WARNING [ef6ca] Database version check failed for '_system': downgrade needed
2021-05-17T17:15:52Z [7566] FATAL [290c2] Database version check failed: downgrade needed
2021-05-17T17:46:57Z [9830] INFO [43396] {authentication} Jwt secret not specified, generating...
2021-05-17T17:46:57Z [9830] INFO [144fe] using storage engine 'rocksdb'
2021-05-17T17:46:57Z [9830] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2021-05-17T17:46:57Z [9830] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2021-05-17T17:46:57Z [9830] WARNING [ef6ca] Database version check failed for '_system': downgrade needed
2021-05-17T17:46:57Z [9830] FATAL [290c2] Database version check failed: downgrade needed
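Comparing the process listings of the three nodes by eye is tedious. The sketch below reduces each arangod command line to its role and advertised address; it only reads the flags visible in these listings (--cluster.my-role, --cluster.my-address, --agency.my-address) and labels lines without --cluster.my-role as AGENT, which holds for this deployment.

```shell
#!/usr/bin/env bash
# summarize_servers: read arangod command lines on stdin, print "ROLE ADDRESS" per server.
# Lines without an advertised address (e.g. the starter itself) are skipped.
summarize_servers() {
  awk '{
    role = "AGENT"; addr = ""
    for (i = 1; i < NF; i++) {
      if ($i == "--cluster.my-role") role = $(i + 1)
      if ($i == "--cluster.my-address" || $i == "--agency.my-address") addr = $(i + 1)
    }
    if (addr != "") print role, addr
  }'
}
```

For example, ps -C arangod -o args= | summarize_servers on each node shows that all servers advertise ports 8529 to 8531, none of them 8538.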

#db2
#running processes
root     20328  0.0  0.0 114736 13820 ?        Sl   May17   0:56 /usr/bin/arangodb --starter.data-dir=/var/lib/arangodb34-cluster --starter.join=10.0.18.20:8528
root     20342  3.7  1.8 890320 289776 ?       Sl   May17 154:26  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/agent8531/arangod.conf --database.directory /var/lib/arangodb34-cluster/agent8531/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/agent8531/apps --log.file /var/lib/arangodb34-cluster/agent8531/arangod.log --log.force-direct false --javascript.copy-installation true --agency.activate true --agency.my-address tcp://10.0.18.21:8531 --agency.size 3 --agency.supervision true --foxx.queues false --server.statistics false --agency.endpoint tcp://10.0.18.20:8531 --agency.endpoint tcp://10.0.18.22:8531
root     20483  0.6 15.2 3907612 2433596 ?     Sl   May17  25:28  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/dbserver8530/arangod.conf --database.directory /var/lib/arangodb34-cluster/dbserver8530/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/dbserver8530/apps --log.file /var/lib/arangodb34-cluster/dbserver8530/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://10.0.18.21:8530 --cluster.my-role PRIMARY --foxx.queues false --server.statistics true --cluster.agency-endpoint tcp://10.0.18.20:8531 --cluster.agency-endpoint tcp://10.0.18.21:8531 --cluster.agency-endpoint tcp://10.0.18.22:8531
root     20622  0.4  0.9 781948 149896 ?       Sl   May17  18:39  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/coordinator8529/arangod.conf --database.directory /var/lib/arangodb34-cluster/coordinator8529/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/coordinator8529/apps --log.file /var/lib/arangodb34-cluster/coordinator8529/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://10.0.18.21:8529 --cluster.my-role COORDINATOR --foxx.queues true --server.statistics true --cluster.agency-endpoint tcp://10.0.18.20:8531 --cluster.agency-endpoint tcp://10.0.18.21:8531 --cluster.agency-endpoint tcp://10.0.18.22:8531

#systemd service script
[Unit]
  Description=Run the ArangoDB Starter
  After=sysinit.target sockets.target timers.target paths.target slices.target network.target syslog.target
[Service]
  LimitNOFILE=1048576
  Type=forking
  User=root
  Group=root
  TimeoutSec=5min
  Restart=always
  RestartSec=20
  ExecStart=/usr/bin/arangodb start \
     --starter.data-dir=/var/lib/arangodb34-cluster/  \
     --starter.join=10.0.18.20

  ExecStop=/usr/bin/arangodb stop
[Install]
  WantedBy=multi-user.target

#log
2021-05-17T17:15:52Z [19478] INFO [144fe] using storage engine 'rocksdb'
2021-05-17T17:15:52Z [19478] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2021-05-17T17:15:53Z [19478] WARNING [ef6ca] Database version check failed for '_system': downgrade needed
2021-05-17T17:15:53Z [19478] FATAL [290c2] Database version check failed: downgrade needed
2021-05-17T17:46:57Z [20122] INFO [43396] {authentication} Jwt secret not specified, generating...
2021-05-17T17:46:57Z [20122] INFO [144fe] using storage engine 'rocksdb'
2021-05-17T17:46:57Z [20122] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2021-05-17T17:46:57Z [20122] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2021-05-17T17:46:57Z [20122] WARNING [ef6ca] Database version check failed for '_system': downgrade needed
2021-05-17T17:46:57Z [20122] FATAL [290c2] Database version check failed: downgrade needed
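All three logs report a file-descriptor limit of 8192, while each unit file sets LimitNOFILE=1048576. This is a side observation and does not explain the failed upgrade by itself, but it hints that the arangod runs which wrote these lines may not have been started under this unit (an assumption, not established in the thread). A minimal sketch to read the limits actually in effect for a running process, assuming a Linux /proc filesystem; PROC_ROOT is only a hook for testing.

```shell
#!/usr/bin/env bash
# open_files_limit PID: print the soft and hard "Max open files" limits from /proc/PID/limits.
open_files_limit() {
  local pid="$1"
  # In /proc/PID/limits the line reads: Max open files <soft> <hard> files
  awk '/^Max open files/ { print "soft=" $4 " hard=" $5 }' "${PROC_ROOT:-/proc}/${pid}/limits"
}
```

Usage: for pid in $(pgrep -x arangod); do echo "$pid $(open_files_limit "$pid")"; done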

#db3
#running processes
root     12912  0.0  0.0 114736 13364 ?        Sl   May17   0:59 /usr/bin/arangodb --starter.data-dir=/var/lib/arangodb34-cluster --starter.join=10.0.18.20:8528
root     12927  0.3  1.5 714192 244988 ?       Sl   May17  15:12  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/agent8531/arangod.conf --database.directory /var/lib/arangodb34-cluster/agent8531/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/agent8531/apps --log.file /var/lib/arangodb34-cluster/agent8531/arangod.log --log.force-direct false --javascript.copy-installation true --agency.activate true --agency.my-address tcp://10.0.18.22:8531 --agency.size 3 --agency.supervision true --foxx.queues false --server.statistics false --agency.endpoint tcp://10.0.18.20:8531 --agency.endpoint tcp://10.0.18.21:8531
root     13066  0.5 11.7 3285020 1868972 ?     Sl   May17  21:54  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/dbserver8530/arangod.conf --database.directory /var/lib/arangodb34-cluster/dbserver8530/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/dbserver8530/apps --log.file /var/lib/arangodb34-cluster/dbserver8530/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://10.0.18.22:8530 --cluster.my-role PRIMARY --foxx.queues false --server.statistics true --cluster.agency-endpoint tcp://10.0.18.20:8531 --cluster.agency-endpoint tcp://10.0.18.21:8531 --cluster.agency-endpoint tcp://10.0.18.22:8531
root     13206  0.5  1.0 914732 165792 ?       Sl   May17  21:11  \_ /usr/sbin/arangod -c /var/lib/arangodb34-cluster/coordinator8529/arangod.conf --database.directory /var/lib/arangodb34-cluster/coordinator8529/data --javascript.startup-directory /usr/share/arangodb3/js --javascript.app-path /var/lib/arangodb34-cluster/coordinator8529/apps --log.file /var/lib/arangodb34-cluster/coordinator8529/arangod.log --log.force-direct false --javascript.copy-installation true --cluster.my-address tcp://10.0.18.22:8529 --cluster.my-role COORDINATOR --foxx.queues true --server.statistics true --cluster.agency-endpoint tcp://10.0.18.20:8531 --cluster.agency-endpoint tcp://10.0.18.21:8531 --cluster.agency-endpoint tcp://10.0.18.22:8531

#systemd service script
[Unit]
  Description=Run the ArangoDB Starter
  After=sysinit.target sockets.target timers.target paths.target slices.target network.target syslog.target
[Service]
  LimitNOFILE=1048576
  Type=forking
  User=root
  Group=root
  TimeoutSec=5min
  Restart=always
  RestartSec=20
  ExecStart=/usr/bin/arangodb start \
     --starter.data-dir=/var/lib/arangodb34-cluster/  \
     --starter.join=10.0.18.20

  ExecStop=/usr/bin/arangodb stop
[Install]
  WantedBy=multi-user.target

#log
2021-05-17T17:15:52Z [8302] INFO [144fe] using storage engine 'rocksdb'
2021-05-17T17:15:52Z [8302] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2021-05-17T17:15:53Z [8302] WARNING [ef6ca] Database version check failed for '_system': downgrade needed
2021-05-17T17:15:53Z [8302] FATAL [290c2] Database version check failed: downgrade needed
2021-05-17T17:46:57Z [8973] INFO [43396] {authentication} Jwt secret not specified, generating...
2021-05-17T17:46:57Z [8973] INFO [144fe] using storage engine 'rocksdb'
2021-05-17T17:46:57Z [8973] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2021-05-17T17:46:57Z [8973] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2021-05-17T17:46:57Z [8973] WARNING [ef6ca] Database version check failed for '_system': downgrade needed
2021-05-17T17:46:57Z [8973] FATAL [290c2] Database version check failed: downgrade needed
2021-05-17T19:51:49Z [11303] INFO [43396] {authentication} Jwt secret not specified, generating...
2021-05-17T19:51:49Z [11303] INFO [144fe] using storage engine 'rocksdb'
2021-05-17T19:51:49Z [11303] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2021-05-17T19:51:49Z [11303] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2021-05-17T20:06:34Z [12739] INFO [43396] {authentication} Jwt secret not specified, generating...
2021-05-17T20:06:34Z [12739] INFO [144fe] using storage engine 'rocksdb'
2021-05-17T20:06:34Z [12739] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 8192, soft limit is 8192
2021-05-17T20:06:34Z [12739] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2021-05-17T20:06:34Z [12739] WARNING [ef6ca] Database version check failed for '_system': downgrade needed
2021-05-17T20:06:34Z [12739] FATAL [290c2] Database version check failed: downgrade needed
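The repeated "Database version check failed: downgrade needed" lines mean that the arangod binary running the check was older than the database files it opened. A quick first check is that every node really has the same package installed. A sketch, assuming the RPM package name arangodb3 used in the install step; RPM is only a hook for testing.

```shell
#!/usr/bin/env bash
# Compare the installed arangodb3 package version against an expected version.
RPM="${RPM:-rpm}"

# installed_version: print the VERSION field of the installed arangodb3 package.
installed_version() {
  "$RPM" -q --qf '%{VERSION}\n' arangodb3
}

# require_version EXPECTED: succeed only if the installed version equals EXPECTED.
require_version() {
  local expected="$1" actual
  actual="$(installed_version)" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "installed arangodb3 is $actual, expected $expected" >&2
    return 1
  fi
}
```

Running require_version 3.7.11 on each node after the RPM step reports any node that still has a different package installed.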
ajanikow commented 3 years ago

Hello!

Can you share the dbserver logs from the upgrade operation? It is hard to tell why it says the database needs a downgrade (it looks as if the binary was switched back to the old one after the upgrade).
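According to the process listings, each server writes to <data-dir>/<server>/arangod.log (for example /var/lib/arangodb34-cluster/dbserver8530/arangod.log). A sketch that gathers the warnings, errors, and fatal messages from all server logs on a node in one pass, each prefixed with its file name; the default data directory is the one used in this issue.

```shell
#!/usr/bin/env bash
# collect_server_errors [DATA_DIR]: print WARNING/ERROR/FATAL lines of every
# <DATA_DIR>/*/arangod.log, prefixed with the log file name.
collect_server_errors() {
  local data_dir="${1:-/var/lib/arangodb34-cluster}" log
  for log in "$data_dir"/*/arangod.log; do
    [ -f "$log" ] || continue
    grep -HE ' (WARNING|ERROR|FATAL) ' "$log"
  done
  # grep exits 1 when a file has no matches; that is not an error here.
  return 0
}
```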

BaconFries commented 3 years ago

The logs under Additional Info are all the logs that were generated.