🐛 [Immich] Unable to connect to the database

lachlanalston commented 1 year ago

Description

Any help would be greatly appreciated

Installed Immich addon using the default settings

The SQL server can't connect during install:

psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
    Is the server running locally and accepting connections on that socket?

Then, later on, this loops:

[Nest] 727  - 09/20/2023, 1:40:12 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (1)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16

See full logs attached

Reproduction steps

1. Install Immich
2. Click Start
3. Go to logs
4. See error

Addon Logs

Starting...
/etc/cont-init.d/00-banner.sh: executing
-----------------------------------------------------------
 Add-on: Immich
 Self-hosted photo and video backup solution directly from your mobile phone
-----------------------------------------------------------
 Add-on version: 1.78.1
 You are running the latest version of this add-on.
 System: Home Assistant OS 10.5  (aarch64 / yellow)
 Home Assistant Core: 2023.9.2
 Home Assistant Supervisor: 2023.09.2
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums
-----------------------------------------------------------
 Provided by: https://github.com/alexbelgium/hassio-addons 
-----------------------------------------------------------
 Defining permissions for main user : 
User UID: 1000
User GID : 1000
-----------------------------------------------------------
/etc/cont-init.d/00-global_var.sh: executing
DB_DATABASE_NAME='immich'
DB_HOSTNAME='homeassistant.local'
DB_PASSWORD=******
DB_PORT='5432'
DB_USERNAME='postgres'
JWT_SECRET='REMOVED'
PGID='1000'
PUID='1000'
TYPESENSE_ENABLED='false'
TZ='Europe/Paris'
data_location='/share/immich'
/etc/cont-init.d/00-local_mounts.sh: executing
/etc/cont-init.d/00-smb_mounts.sh: executing
/etc/cont-init.d/01-custom_script.sh: executing
[13:39:50] INFO: Execute /config/addons_autoscripts/immich.sh if existing
[13:39:50] INFO: ... no script found
/etc/cont-init.d/20-folders.sh: executing
[13:39:51] INFO: Setting data location
... check /share/immich folder exists
... setting permissions
... correcting official script
/etc/cont-init.d/99-database.sh: executing
[21:39:51] WARNING: Your previous database was exported to /share/postgresql_immich.tar.gz
[21:39:51] INFO: Defining database
[21:39:51] INFO: -----------------
[21:39:51] INFO: Connecting to external postgresql
[21:39:51] INFO: 
psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
    Is the server running locally and accepting connections on that socket?
Error : /etc/cont-init.d/99-database.sh exiting 2
/etc/cont-init.d/99-deprecation: executing
╔════════════════════════════════════════════════════╗
╠════════════════════════════════════════════════════╣
║                                                    ║
║             This image is deprecated.              ║
║      We will not offer support for this image      ║
║            and it will not be updated.             ║
║                                                    ║
╠════════════════════════════════════════════════════╣
╚════════════════════════════════════════════════════╝
Due to versioning issues, the jammy branch is deprecated.
══════════════════════════════════════════════════════
/etc/cont-init.d/99-run.sh: executing
[13:39:53] INFO: Setting variables
DB_DATABASE_NAME='immich'
DB_HOSTNAME='homeassistant.local'
DB_PASSWORD=******
DB_PORT='5432'
DB_USERNAME='postgres'

JWT_SECRET='REMOVED'
PGID='1000'
PUID='1000'
TYPESENSE_ENABLED='false'
TZ='Europe/Paris'
data_location='/share/immich'
[13:39:56] INFO: Defining database
[13:39:56] INFO: -----------------
[13:39:56] INFO: Using internal postgresql
[13:39:56] INFO: 
 * Starting PostgreSQL 14 database server
   ...done.
ERROR:  role "root" already exists
ERROR:  database "immich" already exists
ERROR:  role "immich" already exists
GRANT
[13:40:00] INFO: Starting redis

Starting the upstream container

555:C 20 Sep 2023 13:40:00.661 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
555:C 20 Sep 2023 13:40:00.661 # Redis version=6.0.16, bits=64, commit=00000000, modified=0, pid=555, just started
555:C 20 Sep 2023 13:40:00.661 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
555:M 20 Sep 2023 13:40:00.669 * Running mode=standalone, port=6379.
555:M 20 Sep 2023 13:40:00.670 # Server initialized
555:M 20 Sep 2023 13:40:00.670 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
555:M 20 Sep 2023 13:40:00.680 * Ready to accept connections
[migrations] started
[migrations] no migrations found
╔═══════════════════════════════╗
       __  _____ _____       __
      / / |_   _/ ____|     / /
     / /    | || |  __     / /
    / /     | || | |_ |   / /
   / /     _| || |__| |  / /
  /_/     |_____\_____| /_/
  Baseimage from linuxserver.io
╠═══════════════════════════════╣
  To support this applications developer(s) visit:
  Immich: https://immich.app/docs/overview/support-the-project
╠═══════════════════════════════╣
  User/Group ID:
  User UID: 1000
  User GID: 1000
╚═══════════════════════════════╝
[custom-init] No custom files found, skipping...
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [NestFactory] Starting Nest application...
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized +172ms
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [InstanceLoader] BullModule dependencies initialized +0ms
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [InstanceLoader] ConfigHostModule dependencies initialized +3ms
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [InstanceLoader] DiscoveryModule dependencies initialized +1ms
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [InstanceLoader] ConfigModule dependencies initialized +23ms
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [InstanceLoader] ScheduleModule dependencies initialized +0ms
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [InstanceLoader] BullModule dependencies initialized +1ms
[Nest] 727  - 09/20/2023, 1:40:12 PM     LOG [InstanceLoader] BullModule dependencies initialized +1ms
[Nest] 727  - 09/20/2023, 1:40:12 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (1)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
It is highly recommended to use a minimum Redis version of 6.2.0
           Current: 6.0.16
[Nest] 727  - 09/20/2023, 1:40:15 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (2)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[Nest] 727  - 09/20/2023, 1:40:18 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (3)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[Nest] 727  - 09/20/2023, 1:40:21 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (4)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[Nest] 727  - 09/20/2023, 1:40:24 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (5)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[Nest] 727  - 09/20/2023, 1:40:27 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (6)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[Nest] 727  - 09/20/2023, 1:40:30 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (7)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[Nest] 727  - 09/20/2023, 1:40:33 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (8)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[Nest] 727  - 09/20/2023, 1:40:36 PM   ERROR [TypeOrmModule] Unable to connect to the database. Retrying (9)...
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)
[Nest] 727  - 09/20/2023, 1:40:36 PM   ERROR [ExceptionHandler] connect ECONNREFUSED 172.30.32.1:5432
Error: connect ECONNREFUSED 172.30.32.1:5432
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16)

mikedrawback commented 1 year ago

I had this same issue too, thanks for reporting

alexbelgium commented 1 year ago

The image is deprecated, which means we need to switch to a new one that doesn't have embedded postgres, so the database will be lost... The new postgres add-on must also be downloaded and used for Immich to work. This is quite inconvenient, but it is how the image my add-on is based on is now defined.
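For anyone switching to an external postgres, a quick way to tell "nothing is listening" (the ECONNREFUSED seen in the logs above) from "host not reachable" is a plain TCP probe. The sketch below is only an illustration and not part of the add-on; the function name `probe_tcp` and the 3-second timeout are assumptions made for this example.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP endpoint as 'open', 'refused', or 'unreachable'.

    'refused' corresponds to ECONNREFUSED: the host answered, but no
    process is listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers DNS failures, timeouts, and routing errors.
        return "unreachable"
```

For example, `probe_tcp("homeassistant.local", 5432)` uses the DB_HOSTNAME and DB_PORT values shown in the logs; a result of "refused" would suggest the postgres add-on is not running or not exposing its port.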

lachlanalston commented 1 year ago

Thanks for the quick response.

It seems to be somewhat working now: the web GUI works for 15 seconds or so, then crashes and shows this error on the web interface:

connect ECONNREFUSED 127.0.0.1:3001

If I then wait another 10 seconds or so and refresh, the normal Immich interface comes up, and then the cycle repeats.

These are the errors from the logs; they loop:

Error: connect ECONNREFUSED 127.0.0.1:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1495:16) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 6379
}
[Nest] 604  - 09/21/2023, 12:43:25 PM   ERROR [RedisIoAdapter] Redis subClient: Error: connect ECONNREFUSED 127.0.0.1:6379
[Nest] 604  - 09/21/2023, 12:43:25 PM   ERROR [RedisIoAdapter] Redis pubClient: Error: connect ECONNREFUSED 127.0.0.1:6379
[Nest] 604  - 09/21/2023, 12:43:25 PM   ERROR [RedisIoAdapter] Redis subClient: Error: connect ECONNREFUSED 127.0.0.1:6379
[Nest] 604  - 09/21/2023, 12:43:25 PM   ERROR [RedisIoAdapter] Redis pubClient: Error: connect ECONNREFUSED 127.0.0.1:6379
[Nest] 604  - 09/21/2023, 12:43:26 PM   ERROR [RedisIoAdapter] Redis subClient: Error: connect ECONNREFUSED 127.0.0.1:6379
[Nest] 604  - 09/21/2023, 12:43:26 PM   ERROR [RedisIoAdapter] Redis pubClient: Error: connect ECONNREFUSED 127.0.0.1:6379

If I leave it looping for 5 minutes, it eventually spits this out as well:

/app/immich/server/node_modules/ioredis/built/redis/event_handler.js:182
                    self.flushQueue(new errors_1.MaxRetriesPerRequestError(maxRetriesPerRequest));
                                    ^
MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to "maxRetriesPerRequest" option for details.
    at Socket.<anonymous> (/app/immich/server/node_modules/ioredis/built/redis/event_handler.js:182:37)
    at Object.onceWrapper (node:events:629:26)
    at Socket.emit (node:events:514:28)
    at TCP.<anonymous> (node:net:323:12)
Node.js v18.17.1

Any help would be greatly appreciated
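To see at a glance which services are refusing connections in a log like the one above, the refused endpoints can be tallied. This is a minimal sketch, assuming IPv4 addresses in the `ECONNREFUSED host:port` form shown in this thread; `refused_endpoints` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "ECONNREFUSED 127.0.0.1:6379" (IPv4 only, as in the logs here).
REFUSED = re.compile(r"ECONNREFUSED (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def refused_endpoints(log_text: str) -> Counter:
    """Count ECONNREFUSED occurrences per (host, port) pair."""
    counts = Counter()
    for host, port in REFUSED.findall(log_text):
        counts[(host, int(port))] += 1
    return counts
```

In this thread, port 5432 is the postgres port (DB_PORT), 6379 is the port Redis reports on startup, and 3001 is the one shown in the web interface error, so the tally points at which component is down.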

alexbelgium commented 1 year ago

Thanks, that must be a bug with the new image; I'll try to look as soon as possible.

gandhimaulik commented 1 year ago

@alexbelgium I am still not able to make it work with the external postgres addon.

I am getting the error below on startup:

/etc/cont-init.d/99-database.sh: executing
[23:00:55] WARNING: Your previous database was exported to /share/postgresql_immich.tar.gz
[23:00:55] INFO: Defining database
[23:00:55] INFO: -----------------
[23:00:55] INFO: Connecting to external postgresql
[23:00:55] INFO: 
chown: invalid user: ‘postgres’
*Error* : /etc/cont-init.d/99-database.sh exiting 1
/etc/cont-init.d/99-run.sh: executing

Looking into the code, I see that the database setup file (immich/rootfs/etc/cont-init.d/99-database.sh) still assigns ownership to a Unix user 'postgres', which does not exist.

If we don't have postgres in the image, how do we create the database in the first place?

Eventually the database setup fails and the addon does not start, even with external postgres.
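The `chown: invalid user` failure is what happens when a script assumes an account exists. As an illustration only (the add-on's script is shell, and this is not what the later fix actually did), a guard of the following kind avoids it; `owner_for` and the fallback UID are assumptions for this sketch, with 1000 taken from the PUID shown in the logs.

```python
import pwd


def user_exists(name: str) -> bool:
    """Return True if a local account with this name exists."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def owner_for(preferred: str, fallback_uid: int) -> int:
    """Return the preferred user's UID, or fallback_uid if it is missing."""
    try:
        return pwd.getpwnam(preferred).pw_uid
    except KeyError:
        return fallback_uid
```

With this, `owner_for("postgres", 1000)` yields the postgres UID on images that ship it and 1000 on images that do not, so a later ownership change never references a missing user.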

alexbelgium commented 1 year ago

I'll try something in a newly pushed test version

alexbelgium commented 1 year ago

v5 tested to work on my system

lachlanalston commented 1 year ago

Thanks for the quick response. I can confirm that v5 is working for me as well

gandhimaulik commented 1 year ago

@alexbelgium Thanks, works fine now.

gandhimaulik commented 1 year ago

@alexbelgium On the same version, I am getting errors constantly as below and machine learning features are not working.

I see /data/machine-learning is created and given UID permission (in folders.sh) but immich is checking /config/machine-learning (default path).

[2023-09-22 22:11:06 +0530] [1308] [CRITICAL] WORKER TIMEOUT (pid:16401)
[2023-09-22 22:11:07 +0530] [1308] [ERROR] Worker (pid:16401) was sent SIGKILL! Perhaps out of memory?
[2023-09-22 22:11:07 +0530] [16452] [INFO] Booting worker with pid: 16452
There was a problem when trying to write in your cache folder (/config/machine-learning). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
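The cache error above comes down to the process not being able to write to the directory it expects. A minimal sketch of checking candidate directories before setting TRANSFORMERS_CACHE follows; the helper names are hypothetical, and the two paths are the ones mentioned in this comment.

```python
import os
import tempfile


def is_writable_dir(path: str) -> bool:
    """Return True if a file can actually be created inside path."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real temporary file is more reliable than os.access().
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


def pick_cache_dir(candidates):
    """Return the first writable directory from candidates, or None."""
    for path in candidates:
        if is_writable_dir(path):
            return path
    return None
```

For example, `pick_cache_dir(["/config/machine-learning", "/data/machine-learning"])` returns whichever of the two the current user can write to, and that value could then be placed in the TRANSFORMERS_CACHE environment variable named in the error message.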

alexbelgium commented 1 year ago

Pushed; please let me know.

gandhimaulik commented 1 year ago

Thanks. That error is gone now.

But it is still not working. Now I am not sure whether the issue is in the addon, the base image, or the hardware (I am running an RPi 4 with 2 GB RAM and 4 GB swap on SSD; CPU usage is 100% on one core while jobs are running).

[2023-09-22 23:48:13 +0530] [2767] [INFO] Booting worker with pid: 2767
[2023-09-22 23:48:43 +0530] [1305] [CRITICAL] WORKER TIMEOUT (pid:2767)
[2023-09-22 23:48:45 +0530] [1305] [ERROR] Worker (pid:2767) was sent SIGKILL! Perhaps out of memory?
[2023-09-22 23:48:45 +0530] [2821] [INFO] Booting worker with pid: 2821
[2023-09-22 23:49:15 +0530] [1305] [CRITICAL] WORKER TIMEOUT (pid:2821)
[2023-09-22 23:49:16 +0530] [1305] [ERROR] Worker (pid:2821) was sent SIGKILL! Perhaps out of memory?
[2023-09-22 23:49:16 +0530] [2873] [INFO] Booting worker with pid: 2873
[2023-09-22 23:49:46 +0530] [1305] [CRITICAL] WORKER TIMEOUT (pid:2873)
[Nest] 1312  - 09/22/2023, 11:49:47 PM   ERROR [JobService] Unable to run job handler (recognizeFaces/recognize-faces): TypeError: fetch failed
[Nest] 1312  - 09/22/2023, 11:49:47 PM   ERROR [JobService] TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at async MachineLearningRepository.post (/app/immich/server/dist/infra/repositories/machine-learning.repository.js:27:21)
    at async FacialRecognitionService.handleRecognizeFaces (/app/immich/server/dist/domain/facial-recognition/facial-recognition.services.js:105:23)
    at async /app/immich/server/dist/domain/job/job.service.js:107:37
    at async Worker.processJob (/app/immich/server/node_modules/bullmq/dist/cjs/classes/worker.js:346:28)
    at async Worker.retryIfFailed (/app/immich/server/node_modules/bullmq/dist/cjs/classes/worker.js:531:24)
[Nest] 1312  - 09/22/2023, 11:49:47 PM   ERROR [JobService] Object:
{
  "id": "a6a19d7f-a8b1-4b3e-9e2a-88fd0babc959"
}
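The timestamps in the worker log above show how long each worker lives before it is killed. As a small worked example, the following parses two lines copied from that log and computes the gap; only the helper name `parse_ts` is introduced here.

```python
from datetime import datetime


def parse_ts(line: str) -> datetime:
    """Parse the leading "[YYYY-MM-DD HH:MM:SS +ZZZZ]" field of a log line."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")


boot = "[2023-09-22 23:48:13 +0530] [2767] [INFO] Booting worker with pid: 2767"
timeout = "[2023-09-22 23:48:43 +0530] [1305] [CRITICAL] WORKER TIMEOUT (pid:2767)"

elapsed = (parse_ts(timeout) - parse_ts(boot)).total_seconds()
print(elapsed)  # 30.0
```

The other boot/timeout pairs in the log (23:48:45 to 23:49:15, and 23:49:16 to 23:49:46) are also 30 seconds apart, which suggests a fixed 30-second worker timeout is being hit rather than a random crash.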

alexbelgium commented 1 year ago

Hi, the requirement seems to be at least 4 GB of RAM (https://documentation.immich.app/docs/install/requirements); otherwise, perhaps you could ask in the Immich GitHub repo.

gandhimaulik commented 1 year ago

Thank you very much for helping out. Looking further, this looks similar to the upstream change "fix(ml): set higher default worker timeout": they raised the timeout to 120 s, instead of the 30 s default, in their docker compose setup. We are using a single-container solution, so I am not sure where to pass this value.

In parallel, I tried installing Immich using docker compose outside HA (on the same hardware), and it works fine, including the machine learning features. (Home Assistant now complains that I am running other software in docker outside HA, but I can ignore that.)

Update: I found the place where it should be added and raised an issue in that repo: "Increase worker timout".

alexbelgium commented 1 year ago

Yes, nicely spotted! We could also modify the files through my Dockerfile (with a sed command, for example), but it's true that modifying the upstream container is the best solution!

hydazz commented 1 year ago

We will export MACHINE_LEARNING_WORKERS and MACHINE_LEARNING_WORKER_TIMEOUT for you to play with as needed.

gandhimaulik commented 1 year ago

Thanks @hydazz, that makes sense. Let us know when the PR can be merged into main.

gandhimaulik commented 1 year ago

Thanks @hydazz for merging it. @alexbelgium I guess you need to rebuild the addon image.

Thanks a ton to both of you.

alexbelgium commented 1 year ago

Thanks very much! Rebuild ongoing! The new env variables are set as optional, with default values of 1 worker and 120 seconds.
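As an illustration of "optional with defaults", here is a minimal sketch of reading the two variables with the default values mentioned above (1 worker, 120 seconds). The function `ml_settings` is hypothetical; how the container itself consumes these variables is not shown in this thread.

```python
import os


def ml_settings(env=None):
    """Return (workers, timeout_seconds) from the environment.

    Falls back to 1 worker and 120 seconds when a variable is unset.
    """
    env = os.environ if env is None else env
    workers = int(env.get("MACHINE_LEARNING_WORKERS", "1"))
    timeout = int(env.get("MACHINE_LEARNING_WORKER_TIMEOUT", "120"))
    if workers < 1 or timeout < 1:
        raise ValueError("workers and timeout must be positive integers")
    return workers, timeout
```

Calling `ml_settings({})` gives `(1, 120)`, while a low-memory device could pass a larger timeout, for example `ml_settings({"MACHINE_LEARNING_WORKER_TIMEOUT": "300"})`.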

gandhimaulik commented 1 year ago

Perfect. It works with all ML features. 🎉

alexbelgium commented 1 year ago

Thanks! Then, as both problems raised in this issue (redis and ml) are confirmed working, I'll close it! Thanks to all.
