element-hq / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://element-hq.github.io/dendrite/
GNU Affero General Public License v3.0

user api stops working #2105

Open matrixbot opened 2 weeks ago

matrixbot commented 2 weeks ago

This issue was originally created by @Joshix-1 at https://github.com/matrix-org/dendrite/issues/2105.

Background information

Description

Steps to reproduce

It happens constantly: every few minutes, sometimes only after half an hour.

matrixbot commented 2 weeks ago

This comment was originally posted by @kegsay at https://github.com/matrix-org/dendrite/issues/2105#issuecomment-1020302268.

How do you know it is the userapi if you're running in Monolith mode?

matrixbot commented 2 weeks ago

This comment was originally posted by @Joshix-1 at https://github.com/matrix-org/dendrite/issues/2105#issuecomment-1023545793.

We have a script that periodically checks whether the public rooms endpoint (`GET /_matrix/client/r0/publicRooms`) still responds. After a while, it stops responding or times out. This is the script we use:

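```python
#!/usr/bin/env python3
"""Watchdog: restart Dendrite via supervisord when publicRooms stops responding."""

import time
import xmlrpc.client

import urllib3

# Certificate verification is disabled below, so silence urllib3's
# warnings (e.g. InsecureRequestWarning).
urllib3.disable_warnings()

HOMESERVER = "localhost:8448"
SUPERVISOR = "localhost:9001"
XMLRPC_USER = "ReoNa"
XMLRPC_PASS = "forget-me-not"

if __name__ == "__main__":
    counter = 0  # consecutive slow failures while the server is considered down
    running = False  # True once a check has succeeded since the last restart
    server = xmlrpc.client.ServerProxy(
        f"http://{XMLRPC_USER}:{XMLRPC_PASS}@{SUPERVISOR}/RPC2"
    )
    # 5 s timeout per attempt; TLS certificates are not verified.
    http = urllib3.PoolManager(timeout=5.0, cert_reqs="CERT_NONE")
    while True:
        try:
            check_start_time = time.monotonic()
            http.request("GET", f"https://{HOMESERVER}/_matrix/client/r0/publicRooms?limit=10")
            # Healthy: reset state and wait before the next check.
            counter = 0
            running = True
            time.sleep(10)
        except urllib3.exceptions.MaxRetryError:
            if not running:
                # No successful check since the last (re)start: fast failures
                # reset the count; only six slow (>= 20 s) failures in a row
                # fall through to a restart.
                if time.monotonic() - check_start_time >= 20:
                    counter += 1
                else:
                    time.sleep(5)
                    counter = 0
                if counter < 6:
                    continue
            # A previously healthy server failed, or too many slow failures:
            # restart Dendrite through supervisord's XML-RPC interface.
            counter = 0
            running = False
            process_info = server.supervisor.getProcessInfo("dendrite")
            if process_info["statename"] != "RUNNING":
                continue
            # Seconds since supervisord started the process; not used further
            # in the script as posted.
            uptime = process_info["now"] - process_info["start"]
            server.supervisor.stopProcess("dendrite")
            server.supervisor.startProcess("dendrite")
```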
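The least obvious part of the script is the `except` branch, which decides when a failed check should trigger a restart. As a sketch, that decision can be pulled out into a pure function so it can be reasoned about and tested without a running homeserver or supervisord. The name `on_check_failed` and its `(restart, new_counter)` return shape are hypothetical and not part of the script; the thresholds (20 s, 6 failures) are copied from it, and the 5-second sleep on fast failures is left to the caller.

```python
from typing import Tuple

# Thresholds copied from the watchdog script above.
SLOW_FAILURE_SECONDS = 20  # a failed check taking at least this long counts as "slow"
SLOW_FAILURES_BEFORE_RESTART = 6


def on_check_failed(running: bool, counter: int, elapsed: float) -> Tuple[bool, int]:
    """Hypothetical helper mirroring the script's except-branch.

    Returns (restart, new_counter). After any failure the script treats the
    server as not running, so that state is not returned.
    """
    if not running:
        if elapsed >= SLOW_FAILURE_SECONDS:
            counter += 1
        else:
            # Fast failure (e.g. connection refused while Dendrite starts):
            # the script sleeps 5 s here and resets the count.
            counter = 0
        if counter < SLOW_FAILURES_BEFORE_RESTART:
            return False, counter
    # Previously healthy server that now fails, or too many slow failures.
    return True, 0


if __name__ == "__main__":
    print(on_check_failed(True, 0, 1.0))    # → (True, 0): healthy server fails once
    print(on_check_failed(False, 5, 25.0))  # → (True, 0): sixth slow failure in a row
    print(on_check_failed(False, 0, 2.0))   # → (False, 0): fast failure while starting
```

Written this way it is easy to see that a previously healthy server is restarted on its first failed check, while a server that has not answered since its last restart only gets restarted after six consecutive slow failures; fast failures on their own never trigger a restart.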