AVENTER-UG / docker-matrix

docker image for matrix.org
https://riot.im/app/#/room/#dockermatrix:matrix.aventer.biz
GNU General Public License v2.0
96 stars 19 forks source link

synapse newer than 1.64 doesn't start #59

Closed worldowner closed 1 year ago

worldowner commented 2 years ago

I start synapse using avhost/docker-matrix as a service:

docker service create \
    --name synapse \
    --publish published=3478,target=3478,protocol=tcp \
    --publish published=3478,target=3478,protocol=udp \
    --publish published=8448,target=8448 \
    --mount type=volume,src=synapse-data,dst=/data \
    --mount type=bind,src=/etc/localtime,dst=/etc/localtime,ro \
    --mount type=volume,src=certbot-etc-letsencrypt,dst=/etc/letsencrypt,ro \
    --env REPORT_STATS=no \
    --env SERVER_NAME="MYDOMAIN.TLD" \
    --network matrix \
    avhost/docker-matrix:v1.68.0

Any image newer than 1.64.0 doesn't seem to try to start synapse:

# docker logs synapse.1.mdk3la8v8urmihcrlnz3umkyf
-=> start turn
-=> start matrix
Cannot create pid file: /var/run/turnserver.pid: Permission denied
Cannot create relay thread
: Operation not permitted
0: : Black listing: 172.16.0.0-172.31.255.255
0: : log file opened: /data/turnserver_2022-10-17.log
0: : 
RFC 3489/5389/5766/5780/6062/6156 STUN/TURN Server
Version Coturn-4.5.2 'dan Eider'
0: : 
Max number of open files/sockets allowed for this process: 1048576
0: : 
Due to the open files/sockets limitation,
max supported number of TURN Sessions possible is: 524000 (approximately)
0: : 

==== Show him the instruments, Practical Frost: ====

0: : TLS supported
0: : DTLS supported
0: : DTLS 1.2 supported
0: : TURN/STUN ALPN supported
0: : Third-party authorization (oAuth) supported
0: : GCM (AEAD) supported
0: : OpenSSL compile-time version: OpenSSL 3.0.3 3 May 2022 (0x30000030)
0: : 
0: : SQLite supported, default database location is /var/lib/turn/turndb
0: : Redis supported
0: : PostgreSQL supported
0: : MySQL supported
0: : MongoDB is not supported
0: : 
0: : Default Net Engine version: 3 (UDP thread per CPU core)

=====================================================

0: : Domain name: 
0: : Default realm: turn.MYDOMAIN.TLD
0: : 
CONFIGURATION ALERT: You specified --lt-cred-mech and --use-auth-secret in the same time.
Be aware that you could not mix the username/password and the shared secret based auth methods. 
Shared secret overrides username/password based auth method. Check your configuration!
0: : SSL23: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : SSL23: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : TLS1.0: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : TLS1.0: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : TLS1.1: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : TLS1.1: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : TLS1.2: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : TLS1.2: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : TLS cipher suite: HIGH
0: : DTLS: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : DTLS: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : DTLS1.2: Certificate file found: /etc/letsencrypt/live/MYDOMAIN.TLD/fullchain.pem
0: : DTLS1.2: Private key file found: /etc/letsencrypt/live/MYDOMAIN.TLD/privkey.pem
0: : DTLS cipher suite: HIGH
0: : NO EXPLICIT LISTENER ADDRESS(ES) ARE CONFIGURED
0: : ===========Discovering listener addresses: =========
0: : Listener address to use: 127.0.0.1
0: : Listener address to use: 10.255.0.30
0: : Listener address to use: 172.18.0.7
0: : Listener address to use: 10.0.1.34
0: : =====================================================
0: : Total: 3 'real' addresses discovered
0: : =====================================================
0: : NO EXPLICIT RELAY ADDRESS(ES) ARE CONFIGURED
0: : ===========Discovering relay addresses: =============
0: : Relay address to use: 10.255.0.30
0: : Relay address to use: 172.18.0.7
0: : Relay address to use: 10.0.1.34
0: : =====================================================
0: : Total: 3 relay addresses discovered
0: : =====================================================
0: : Cannot create pid file: /var/run/turnserver.pid
0: : pid file created: /var/tmp/turnserver.pid
0: : IO method (main listener thread): epoll (with changelist)
0: : Wait for relay ports initialization...
0: :   relay 10.255.0.30 initialization...
0: :   relay 10.255.0.30 initialization done
0: :   relay 172.18.0.7 initialization...
0: :   relay 172.18.0.7 initialization done
0: :   relay 10.0.1.34 initialization...
0: :   relay 10.0.1.34 initialization done
0: : Relay ports initialization done

1.64.0 works perfectly fine.

andreaspeters commented 2 years ago

Thanks for your issue. One of your error messages is Due to the open files/sockets limitation and you use coturn, I want ask you if you can disable this service.

worldowner commented 2 years ago

I'm not sure how to disable coturn in your image without rebuilding it. README doesn't mention any option to control that. Also this message looks like a warning and "max supported number of TURN Sessions possible is: 524000 (approximately)" is more than enough. In fact I don't even use it, my synapse is configured to use different TURN server so I'd be happy to get rid of it :)

andreaspeters commented 2 years ago

Thats good, then we can just remove the coturn config file and remove the ports forward in the docker command. I only want to be sure that it's not an coturn issue. To be honest, no one of us contributors use coturn in this image we took over from Silvio. We just keep it inside for compatibility reasons. But if that one start to mess up something. It's a good time to remove it. :sweat_smile:

worldowner commented 2 years ago

Deleting config file and removing ports forward doesn't prevent coturn from starting :)

BTW. someone reported a problem with coturn version that you use in the image (4.5.2) that looks the same as I encounter: https://github.com/coturn/coturn/issues/683

andreaspeters commented 2 years ago

Thats right, but it will run with default values. So, please remove the file and the forward ports.

andreaspeters commented 2 years ago

But the coturn PR is interesting. :+1:

worldowner commented 2 years ago

I tried running coturn with default values (by deleting my config file and removing port forwarding from docker service) and synapse still doesn't start.

andreaspeters commented 2 years ago

Messages the same as before? Next step would be, to share your homeserver.yaml config file and information about your host system (linux version, docker version). Please be sure that you remove credentials, tokens server name and IP's from the config file.

biberino commented 2 years ago

Same issue here, all Version above 1.64.0 don't start Synapse. I use Coturn. Last lines when starting the container:

synapse_1 | 0: : ===========Discovering listener addresses: ========= synapse_1 | 0: : Listener address to use: 127.0.0.1 synapse_1 | 0: : Listener address to use: 172.29.0.2 synapse_1 | 0: : ===================================================== synapse_1 | 0: : Total: 1 'real' addresses discovered synapse_1 | 0: : ===================================================== synapse_1 | 0: : NO EXPLICIT RELAY ADDRESS(ES) ARE CONFIGURED synapse_1 | 0: : ===========Discovering relay addresses: ============= synapse_1 | 0: : Relay address to use: 172.29.0.2 synapse_1 | 0: : ===================================================== synapse_1 | 0: : Total: 1 relay addresses discovered synapse_1 | 0: : ===================================================== synapse_1 | 0: : Cannot create pid file: /var/run/turnserver.pid synapse_1 | 0: : pid file created: /var/tmp/turnserver.pid synapse_1 | 0: : IO method (main listener thread): epoll (with changelist) synapse_1 | 0: : WARNING: I cannot support STUN CHANGE_REQUEST functionality because only one IP address is provided synapse_1 | 0: : Wait for relay ports initialization... synapse_1 | 0: : relay 172.29.0.2 initialization... synapse_1 | 0: : relay 172.29.0.2 initialization done synapse_1 | 0: : Relay ports initialization done matrix_synapse_1 exited with code 255

It looks like Coturn hangs at "Relay ports initialization done", or its initialization is completed and Synapse doesnt start. Rolling the container back to v1.64.0 and everything works again and I dont get any of the above Coturn logs.

andreaspeters commented 2 years ago

Ok, with the next release, I will add a feature to disable coturn and flag it as deprecated. I will also add these feature to version 1.69.x. Only to give you a working update path to the 1.7x version.

biberino commented 2 years ago

I looked into it, and it looks like the coturn server is fine. If i disable the turnserver in "start.sh", and only start synapse and remove the "&" to actually see the output, i get the following error from synapse:

synapse_1 | 2022-11-01 10:13:45,649 - synapse.federation.federation_server - 1385 - INFO - main - Registering federation query handler for 'profile' synapse_1 | 2022-11-01 10:13:45,650 - synapse.federation.federation_server - 1365 - INFO - main - Registering federation EDU handler for 'm.presence' synapse_1 | 2022-11-01 10:13:45,651 - synapse.federation.federation_server - 1365 - INFO - main - Registering federation EDU handler for 'm.typing' synapse_1 | 2022-11-01 10:13:45,652 - synapse.federation.federation_server - 1385 - INFO - main - Registering federation query handler for 'directory' synapse_1 | 2022-11-01 10:13:45,653 - twisted - 274 - INFO - main - Redirected stdout/stderr to logs synapse_1 | 2022-11-01 10:13:45,653 - synapse.app.homeserver - 167 - INFO - sentinel - Running synapse_1 | 2022-11-01 10:13:45,655 - synapse.app.homeserver - 29 - INFO - sentinel - Set file limit to: 1048576 synapse_1 | 2022-11-01 10:13:45,657 - twisted - 274 - CRITICAL - sentinel - Unhandled Error synapse_1 | Traceback (most recent call last): synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/base.py", line 498, in fireEvent synapse_1 | DeferredList(beforeResults).addCallback(self._continueFiring) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 497, in addCallback synapse_1 | return self.addCallbacks(callback, callbackArgs=args, callbackKeywords=kwargs) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 477, in addCallbacks synapse_1 | self._runCallbacks() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 857, in _runCallbacks synapse_1 | current.result = callback( # type: ignore[misc] synapse_1 | --- --- synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/base.py", line 510, in _continueFiring synapse_1 | callable(*args, kwargs) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/enterprise/adbapi.py", line 237, in _start synapse_1 | return self.start() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/enterprise/adbapi.py", line 247, in start synapse_1 | self.threadpool.start() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 158, in start synapse_1 | self.adjustPoolsize() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 304, in adjustPoolsize synapse_1 | self._team.grow(self.min - self.workers) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 118, in grow synapse_1 | def createOneWorker(): synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 107, in do synapse_1 | working.pop(0)() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 120, in createOneWorker synapse_1 | worker = self._createWorker() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 55, in limitedWorkerCreator synapse_1 | return ThreadWorker(startThread, Queue()) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 49, in init synapse_1 | startThread(work) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 49, in startThread synapse_1 | return threadFactory(target=target).start() synapse_1 | File "/usr/lib/python3.10/threading.py", line 935, in start synapse_1 | _start_new_thread(self._bootstrap, ()) synapse_1 | builtins.RuntimeError: can't start new thread synapse_1 | synapse_1 | 2022-11-01 10:13:45,658 - synapse.metrics.background_process_metrics - 242 - ERROR - _handle_new_device_update_async-0 - Background process '_handle_new_device_update_async' threw an exception synapse_1 | Traceback (most recent call last): synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/metrics/background_process_metrics.py", line 240, in run synapse_1 | return await func(*args, *kwargs) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/handlers/device.py", line 681, in _handle_new_device_update_async synapse_1 | rows = await self.store.get_uncoverted_outbound_room_pokes() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/databases/main/devices.py", line 1900, in get_uncoverted_outbound_room_pokes synapse_1 | return await self.db_pool.runInteraction( synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/database.py", line 882, in runInteraction synapse_1 | return await delay_cancellation(_runInteraction()) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 959, in send synapse_1 | raise result.value synapse_1 | RuntimeError: can't start new thread synapse_1 | 2022-11-01 10:13:45,660 - synapse.handlers.deactivate_account - 230 - INFO - user_parter_loop-0 - Starting user parter synapse_1 | 2022-11-01 10:13:45,660 - synapse.metrics.background_process_metrics - 242 - ERROR - user_parter_loop-0 - Background process 'user_parter_loop' threw an exception synapse_1 | Traceback (most recent call last): synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/metrics/background_process_metrics.py", line 240, in run synapse_1 | return await func(args, kwargs) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/handlers/deactivate_account.py", line 233, in _user_parter_loop synapse_1 | user_id = await self.store.get_user_pending_deactivation() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/databases/main/registration.py", line 1264, in get_user_pending_deactivation synapse_1 | return await self.db_pool.simple_select_one_onecol( synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/database.py", line 1614, in simple_select_one_onecol synapse_1 | return await self.runInteraction( synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/storage/database.py", line 882, in runInteraction synapse_1 | return await delay_cancellation(_runInteraction()) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/internet/defer.py", line 959, in send synapse_1 | raise result.value synapse_1 | RuntimeError: can't start new thread synapse_1 | 2022-11-01 10:13:45,661 - synapse.app._base - 257 - CRITICAL - sentinel - Error during startup synapse_1 | Traceback (most recent call last): synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/_base.py", line 242, in wrapper synapse_1 | await cb(*args, *kwargs) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/homeserver.py", line 391, in start synapse_1 | await _base.start(hs) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/_base.py", line 407, in start synapse_1 | resolver_threadpool.start() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 158, in start synapse_1 | self.adjustPoolsize() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 304, in adjustPoolsize synapse_1 | self._team.grow(self.min - self.workers) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 118, in grow synapse_1 | def createOneWorker(): synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 107, in do synapse_1 | working.pop(0)() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 120, in createOneWorker synapse_1 | worker = self._createWorker() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 55, in limitedWorkerCreator synapse_1 | return ThreadWorker(startThread, Queue()) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 49, in init synapse_1 | startThread(work) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 49, in startThread synapse_1 | return threadFactory(target=target).start() synapse_1 | File "/usr/lib/python3.10/threading.py", line 935, in start synapse_1 | _start_new_thread(self._bootstrap, ()) synapse_1 | RuntimeError: can't start new thread synapse_1 | Error during startup: synapse_1 | Traceback (most recent call last): synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/_base.py", line 242, in wrapper synapse_1 | await cb(args, **kwargs) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/homeserver.py", line 391, in start synapse_1 | await _base.start(hs) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/synapse/app/_base.py", line 407, in start synapse_1 | resolver_threadpool.start() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 158, in start synapse_1 | self.adjustPoolsize() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/python/threadpool.py", line 304, in adjustPoolsize synapse_1 | self._team.grow(self.min - self.workers) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 118, in grow synapse_1 | def createOneWorker(): synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 107, in do synapse_1 | working.pop(0)() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_team.py", line 120, in createOneWorker synapse_1 | worker = self._createWorker() synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 55, in limitedWorkerCreator synapse_1 | return ThreadWorker(startThread, Queue()) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_threadworker.py", line 49, in init synapse_1 | startThread(work) synapse_1 | File "/usr/local/lib/python3.10/dist-packages/twisted/_threads/_pool.py", line 49, in startThread synapse_1 | return threadFactory(target=target).start() synapse_1 | File "/usr/lib/python3.10/threading.py", line 935, in start synapse_1 | _start_new_thread(self._bootstrap, ()) synapse_1 | RuntimeError: can't start new thread

It looks like Synapse cant start new threads. This is only true for V1.65.0, V1.64.0 can start these threads! I dont have any systemd or docker limits enabeld (at least that I'm aware of).

andreaspeters commented 2 years ago

:thinking: Looks like a limits issue from your system. May I ask where/how you run docker-matrix?

biberino commented 2 years ago

Sure i run it on a Debian 10 Server, Docker version 20.10.3

image

with the following docker-compose.yaml:

    synapse:
            image: avhost/docker-matrix:v1.64.0
            ports:
                - "127.0.0.1:8008:8008"
                - "3478:3478"
            volumes:
                - /opt/matrix/synapse:/data
            restart: always

I dont see limits in systemd or docker itself. cat /proc/sys/kernel/threads-max gives me 128106 Number of currently running threads is 1342, this includes Synapse 1.64.0.

Are there other limits i need to check? And why is it that only from v1.65.0 on, that Synapse has suddenly problems spawning threads?

andreaspeters commented 2 years ago

Strange.... Yeah that's a good question. :man_shrugging: I do not have an answer right now and my own instance is also not affected by that. I will ask in our docker-matrix room if someone have an idea.

neurosys-zero commented 1 year ago

+1

andreaspeters commented 1 year ago

No feedback from the community. But I add the possibility to disable coturn. (docker tag v1.65.0_av1). If you set the environment COTURN_ENABLE="false" then it should be disable. Please try and give me feedback. Also do not forget to remove the exposed coturn ports from your docker command. Sorry, I have no other idea. :-(

neurosys-zero commented 1 year ago

Unfortunately this did not work for me :(

andreaspeters commented 1 year ago

:-( but thanks for feedback @neurosys-zero

neurosys-zero commented 1 year ago

Update: The host was running Debian 9 / Stretch I did a dist-update on her to 10 / Buster, updated the old docker to docker-ce and now any tags later than 1.64 run correctly. :)

andreaspeters commented 1 year ago

:partying_face: thats a great feedback. Thanks a lot @neurosys-zero :-D

worldowner commented 1 year ago

I can confirm that it works on Debian 10 or higher. I successfully upgraded to 1.84.1.

andreaspeters commented 1 year ago

@worldowner thanks for feedback.