ReinerNippes / nextcloud_on_docker

Run Nextcloud in Docker Container on various Linux Hosts
MIT License

Postgres Upgrade Error #66

Closed: kylesf closed this issue 3 years ago

kylesf commented 3 years ago

Hi Reiner,

In the past day something auto-updated and my Nextcloud stopped being accessible. Looking into it, the problem seemed to have something to do with certdumper. Regardless, it had been a while since I synced to the current version of this repository. After doing so I get the error "nextcloud, The data directory was initialized by PostgreSQL version 10, which is not compatible with this version 11.8.". I see the version was updated in the group vars. Upgrading from v10 to v11 seems convoluted, to say the least.

Did I miss something?

Trying to roll back the docker image to v10.13 causes more problems. I have a year of restic backups if that's of any help.

How do you recommend I proceed?

ReinerNippes commented 3 years ago

Hi,

sorry for the trouble. I should have mentioned that an update of that playbook would not result in an update of the installation.

Nevertheless, I would recommend restoring your system to Postgres 10 by editing this line:

https://github.com/ReinerNippes/nextcloud_on_docker/blob/61db7393477478ef59d16f7941c01db59277730a/group_vars/all.yml#L18

or have a look at https://github.com/tianon/docker-postgres-upgrade/ to update your database files.
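
As a rough sketch of that second option: the tianon/postgres-upgrade images run pg_upgrade between an old and a new data directory. The host paths used below are placeholders (this thread does not say where the playbook keeps the database files), the database container has to be stopped first, and it is worth keeping a copy of the old data directory. Clusters whose superuser is not `postgres` may need extra settings, so check the image's README before running anything. The function only prints the command so it can be reviewed.

```shell
#!/bin/sh
# Print a pg_upgrade invocation using the tianon/postgres-upgrade image.
# $1 = host dir holding the existing PostgreSQL 10 data (placeholder path)
# $2 = empty host dir that will receive the PostgreSQL 11 data (placeholder path)
pg_upgrade_cmd() {
    old_data="$1"
    new_data="$2"
    printf 'docker run --rm -v %s:/var/lib/postgresql/10/data -v %s:/var/lib/postgresql/11/data tianon/postgres-upgrade:10-to-11\n' \
        "$old_data" "$new_data"
}

# Example with hypothetical paths; adjust to your installation.
pg_upgrade_cmd /opt/nextcloud/database /opt/nextcloud/database-pg11
```

After a successful run, the database container would need to be pointed at the new data directory.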

Please also note that the playbook now sets up traefik v2.2. Its cert file format isn't compatible with v1.7, so you have to delete /opt/nextcloud/traefik/acme.json and rerun the playbook, or touch/chmod/chown the file.
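
For the touch/chmod/chown route, a minimal sketch might look like this. Traefik refuses an ACME storage file with permissions looser than 600. The owner is an assumption (whatever user the traefik container runs as, often root), so the chown is left as a comment.

```shell
#!/bin/sh
# Recreate an empty ACME storage file for traefik v2.
# $1 = path to acme.json, e.g. /opt/nextcloud/traefik/acme.json
reset_acme() {
    acme_file="$1"
    rm -f "$acme_file"        # drop the old v1.7-format file
    touch "$acme_file"        # create an empty file for v2 to fill
    chmod 600 "$acme_file"    # traefik rejects looser permissions
    # chown root:root "$acme_file"   # assumption: traefik runs as root
}
```

Calling `reset_acme /opt/nextcloud/traefik/acme.json` and then restarting traefik should make it request fresh certificates.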

kylesf commented 3 years ago

Upon moving the database back to Postgres10, I get the following docker log errors:

    Connection matched pg_hba.conf line 95: "host all all all md5"
2020-07-18 17:13:48.069 UTC [7181] FATAL:  password authentication failed for user "oc_admin"
2020-07-18 17:13:48.069 UTC [7181] DETAIL:  Role "oc_admin" does not exist.
    Connection matched pg_hba.conf line 95: "host all all all md5"
2020-07-18 17:13:57.484 UTC [7192] FATAL:  role "postgres" does not exist
2020-07-18 17:14:07.884 UTC [7199] FATAL:  role "postgres" does not exist
2020-07-18 17:14:17.654 UTC [7200] FATAL:  password authentication failed for user "oc_admin"
2020-07-18 17:14:17.654 UTC [7200] DETAIL:  Role "oc_admin" does not exist.
    Connection matched pg_hba.conf line 95: "host all all all md5"
2020-07-18 17:14:17.657 UTC [7201] FATAL:  password authentication failed for user "oc_admin"
2020-07-18 17:14:17.657 UTC [7201] DETAIL:  Role "oc_admin" does not exist.
    Connection matched pg_hba.conf line 95: "host all all all md5"
2020-07-18 17:14:18.353 UTC [7208] FATAL:  role "postgres" does not exist
2020-07-18 17:14:19.608 UTC [7209] FATAL:  password authentication failed for user "oc_admin"
2020-07-18 17:14:19.608 UTC [7209] DETAIL:  Role "oc_admin" does not exist.
    Connection matched pg_hba.conf line 95: "host all all all md5"
2020-07-18 17:14:19.611 UTC [7210] FATAL:  password authentication failed for user "oc_admin"
2020-07-18 17:14:19.611 UTC [7210] DETAIL:  Role "oc_admin" does not exist.
    Connection matched pg_hba.conf line 95: "host all all all md5"
2020-07-18 17:14:28.762 UTC [7217] FATAL:  role "postgres" does not exist

I'm trying to tread lightly, as I do not want to make the situation worse. I'm not sure what would have changed the database.

All the networking issues are resolved. I can make successful connections to portainer.

Do you think the best route is to upgrade to Postgres 11, or might there be a database issue, in which case the best course of action would be to restore something from restic?

ReinerNippes commented 3 years ago

I just ran a cycle pg10 -> pg11 -> pg10 on a test machine, and starting pg11 didn't break the database in my case. Of course the database was empty: I only set up Nextcloud but didn't upload anything.

In your case the role oc_admin is missing, so I guess you have a broken database.

Do you have a test machine on which to try restoring from the restic backup?

kylesf commented 3 years ago

Since the restic backup is on an externally mounted drive, should I just attempt to restore from restic onto a clean install? As long as I make sure I launch with 10-alpine and ensure that acme.json is taken care of, there should be nothing else preventing a new initialization, right?

ReinerNippes commented 3 years ago

I think it's only necessary to restore the database. So start a Postgres 10 container and import the database dump.

kylesf commented 3 years ago

Getting back to this. I thought it would be best to use the database spun up by the playbook, so I went to restore the database but got an error that the nextcloud db is missing.

kyle@alpha:~$ docker exec nextcloud-db psql -U nextcloud -l
                                  List of databases
   Name    |   Owner   | Encoding |  Collate   |   Ctype    |    Access privileges    
-----------+-----------+----------+------------+------------+-------------------------
 default   | nextcloud | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres  | nextcloud | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | nextcloud | UTF8     | en_US.utf8 | en_US.utf8 | =c/nextcloud           +
           |           |          |            |            | nextcloud=CTc/nextcloud
 template1 | nextcloud | UTF8     | en_US.utf8 | en_US.utf8 | =c/nextcloud           +
           |           |          |            |            | nextcloud=CTc/nextcloud
(4 rows)

kyle@alpha:~$ docker exec nextcloud-db psql -U nextcloud --set ON_ERROR_STOP=on -f /var/lib/postgresql/data/db_dump_pgsql_nextcloud.sql
psql: FATAL:  database "nextcloud" does not exist

Do you recommend I create the DB and import? Should I look for DB creation errors in the current playbook script? (Although when I last ran it, the final report showed no errors.)
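
One likely cause, offered as a sketch rather than a confirmed diagnosis: when no database is given, psql connects to a database named after the user, so `-U nextcloud` looks for a database called `nextcloud`, which the listing above shows is absent. Assuming the dump is a plain SQL file that does not create the database itself, one option is to create the database while connected to the `postgres` database and then import with an explicit `-d`. The helper below only prints the commands for review.

```shell
#!/bin/sh
# Print the commands to create an empty database and import a plain-SQL dump.
# $1 = container name, $2 = role, $3 = database, $4 = dump path inside the container
restore_cmds() {
    container="$1"; role="$2"; db="$3"; dump="$4"
    # Connect to the "postgres" database, which the listing shows exists.
    printf 'docker exec %s psql -U %s -d postgres -c "CREATE DATABASE %s OWNER %s"\n' \
        "$container" "$role" "$db" "$role"
    # Import into the new database, stopping at the first error.
    printf 'docker exec %s psql -U %s -d %s --set ON_ERROR_STOP=on -f %s\n' \
        "$container" "$role" "$db" "$dump"
}

restore_cmds nextcloud-db nextcloud nextcloud /var/lib/postgresql/data/db_dump_pgsql_nextcloud.sql
```

If the dump already contains its own CREATE DATABASE statement, the first command would be unnecessary and the import should target the `postgres` database instead.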

ReinerNippes commented 3 years ago

The playbook would create an empty nextcloud db and initialize it with the nextcloud occ command so you could log into an empty Nextcloud. To restore, you have to restore the data files and the config.php, and import your database backup dump.

kylesf commented 3 years ago

I went the route of trying the steps from an older issue:

A "bare metal" restore would be:

    install the OS (No change)
    run the playbook (changed /opt/nextcloud -> /opt/nextcloud1)
    stop nginx, php-fpm, redis and database server container (Done but no php-fpm container )
    delete /opt/nextcloud/* (Done)
    restore all directories and files from restic repo to /opt/nextcloud (changed /opt/nextcloud1 -> /opt/nextcloud)
    start the database container (Done)
    drop the nextcloud database created during playbook run (Done with docker exec nextcloud-db psql -U default -c 'DROP DATABASE nextcloud')

Then

restore the database dump in /opt/nextcloud/databasedump

kyle@alpha:/opt/nextcloud$ docker exec nextcloud-db psql -U nextcloud --set ON_ERROR_STOP=on -f /var/lib/postgresql/data/db_dump_pgsql_nextcloud.sql
psql: FATAL:  database "nextcloud" does not exist

I am having an issue with properly dropping the database and importing the new one. Advice?

The other steps I have not gotten to yet, but they are self-explanatory:

    if the server name changed you have to edit config/config.php
    start redis, php-fpm, nginx container.
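
The file-level part of the procedure above might look like the following dry-run sketch. The restic repository path and every container name other than nextcloud-db are placeholders, not values from this thread, and the restic command assumes the snapshot was taken of the absolute path of the install directory. The function prints the commands instead of running them.

```shell
#!/bin/sh
# Print the file-restore part of a "bare metal" restore for review.
# $1 = restic repository (placeholder), $2 = install dir,
# remaining args = containers to stop before restoring
restore_plan() {
    repo="$1"; target="$2"; shift 2
    printf 'docker stop %s\n' "$*"
    # Restore only the install dir from the latest snapshot to its original path.
    printf 'restic -r %s restore latest --target / --include %s\n' "$repo" "$target"
    # Bring the database back first so the dump can be imported.
    printf 'docker start nextcloud-db\n'
}

# Example with hypothetical repository path and container names.
restore_plan /mnt/backup/restic-repo /opt/nextcloud nextcloud-nginx nextcloud-redis nextcloud-db
```

The database import and the final container start would follow once the printed commands have been checked and run.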