brocaar / chirpstack-application-server

ChirpStack Application Server is an open-source LoRaWAN application-server.
https://www.chirpstack.io
MIT License
502 stars 325 forks source link

Too many open files upon periodic Network or Application server error #581

Closed omogenot closed 3 years ago

omogenot commented 3 years ago

What happened?

Apparently, if the network-server and/or the application-server detects an error (CUSTOM_JS Codec error, or DevEUI not found), the socket is not closed between these two. As a consequence, after some time, the link between the network-server and the application is impossible as there are too many connections. The system is then totally blocked and the user can not even log in the application server portal. This problem can be seen in the syslog. The DevEUI not found, by the way, shall not be an error. It is highly probable that your server might be located in a zone where other devices from other systems integrators/users are installed. Shan't the error message show the unknown DevEUI value so that one can make sure that this is not a forgotten device?

What did you expect?

The network-server and/or the application-server shall close the socket upon error detection. As a workaround I made a cron script restart the chirpstack-application-server service every 12 hours.

Steps to reproduce this issue

Steps:

  1. Make a faulty CUSTOM_JS codec for a device
  2. Wait at least 24 hours for the link to break, or use netstat utility to see all the unclosed sockets between the two processes

Could you share your log output?

/var/log/syslog: chirpstack-application-server[9060]: 2021/03/15 12:08:58 http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 1s

chirpstack-network-server[9073]: time="2021-03-15T12:14:41+01:00" level=error msg="uplink: processing uplink frame error" ctx_id=2af36290-440f-4292-a489-e2c6ec99d66d error="get device-session error: object does not exist"

chirpstack-network-server[9073]: time="2021-03-15T12:15:52+01:00" level=error msg="uplink: processing uplink frame error" ctx_id=675c500b-cdb8-4799-bbf4-f30bff6be849 error="request home netid error: response error, code: UnknownDevEUI, description: deveui does not exist"

Your Environment

Linux amd64 #1 SMP Debian 4.9.88-1+deb9u1~bpo8+1 (2018-05-13) x86_64 GNU/Linux

Component Version
Application Server v3.14.0
Network Server v3.12.2
Gateway Bridge v3.10.0
Chirpstack API N/A
Geolocation N/A
Concentratord N/A
brocaar commented 3 years ago

I have not yet been able to reproduce the issue yet:

When the device sends an uplink, I do not see an increase in sockets. Please let me know if there is anything else I should try.

omogenot commented 3 years ago

Well, may be you can try with an unknown radio module that generates the following error:

chirpstack-network-server[9073]: time="2021-03-15T12:14:41+01:00" level=error msg="uplink: processing uplink frame error" ctx_id=2af36290-440f-4292-a489-e2c6ec99d66d error="get device-session error: object does not exist"

That's all i can see that could relate to the problem of "too many opened files"...

brocaar commented 3 years ago

Also with an unknown device, I do not see an increase.

Could you post your netstat and / or lsof output when this issue occurs? I'm especially interested in the source and destination ports as that might give a hint in which part the connection is not closed properly, if that is the case.

omogenot commented 3 years ago

Here is a netstat -tn @ 18:50 (knowing that a crontab is restarting services @ 11:55 and 23:55)

tcp        0      0 127.0.0.1:34168         127.0.0.1:5432          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41406         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:40998         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:40976         ESTABLISHED
tcp        0      0 127.0.0.1:58032         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41302         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41080         ESTABLISHED
tcp        0      0 127.0.0.1:41252         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41016         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41074         ESTABLISHED
tcp        0      0 127.0.0.1:45236         127.0.0.1:1883          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41208         ESTABLISHED
tcp        0      0 127.0.0.1:41320         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:53776         127.0.0.1:1883          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41440         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41354         ESTABLISHED
tcp        0      0 127.0.0.1:41208         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41446         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41220         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41416         ESTABLISHED
tcp        0      0 127.0.0.1:41140         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41412         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41200         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41182         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41186         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41272         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41048         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41320         ESTABLISHED
tcp        0      0 127.0.0.1:5432          127.0.0.1:42748         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41446         ESTABLISHED
tcp        0      0 127.0.0.1:41306         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41058         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41032         ESTABLISHED
tcp        0      0 127.0.0.1:6379          127.0.0.1:57004         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41466         ESTABLISHED
tcp        0      0 127.0.0.1:45696         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:42708         127.0.0.1:5432          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41536         ESTABLISHED
tcp        0      0 127.0.0.1:41354         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:56286         127.0.0.1:6379          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:37546         ESTABLISHED
tcp        0    136 192.168.0.252:22        192.168.0.251:52975     ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41200         ESTABLISHED
tcp        0      0 127.0.0.1:36856         127.0.0.1:8000          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41242         ESTABLISHED
tcp        0      0 127.0.0.1:5432          127.0.0.1:34176         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41302         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41100         ESTABLISHED
tcp        0      0 127.0.0.1:41260         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41412         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41264         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41016         ESTABLISHED
tcp        0      0 127.0.0.1:57184         127.0.0.1:1883          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41182         ESTABLISHED
tcp        0      0 127.0.0.1:41536         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41440         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41148         ESTABLISHED
tcp        0      0 127.0.0.1:1883          127.0.0.1:45236         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41108         ESTABLISHED
tcp        0      0 127.0.0.1:41148         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41264         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41108         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:6379          127.0.0.1:56286         ESTABLISHED
tcp        0      0 127.0.0.1:41032         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41100         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:45204         ESTABLISHED
tcp        0      0 127.0.0.1:37546         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:1883          127.0.0.1:57184         ESTABLISHED
tcp        0      0 127.0.0.1:5432          127.0.0.1:42708         ESTABLISHED
tcp        0      0 127.0.0.1:41480         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:40998         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:46334         127.0.0.1:8001          ESTABLISHED
tcp        0      0 127.0.0.1:41074         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:1883          127.0.0.1:53776         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41252         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:45696         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41008         ESTABLISHED
tcp        0      0 127.0.0.1:41160         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:34176         127.0.0.1:5432          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41058         ESTABLISHED
tcp        0      0 127.0.0.1:41416         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41048         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41118         ESTABLISHED
tcp        0      0 127.0.0.1:8001          127.0.0.1:46334         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41220         ESTABLISHED
tcp        0      0 127.0.0.1:41186         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41140         ESTABLISHED
tcp        0      0 127.0.0.1:42748         127.0.0.1:5432          ESTABLISHED
tcp        0      0 127.0.0.1:5432          127.0.0.1:34168         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41128         ESTABLISHED
tcp        0      0 127.0.0.1:41360         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41272         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:58032         ESTABLISHED
tcp        0      0 127.0.0.1:8000          127.0.0.1:36856         ESTABLISHED
tcp        0      0 127.0.0.1:41080         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41360         ESTABLISHED
tcp        0      0 127.0.0.1:57004         127.0.0.1:6379          ESTABLISHED
tcp        0      0 127.0.0.1:41466         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:45204         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41128         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41008         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41306         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41480         ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41260         ESTABLISHED
tcp        0      0 127.0.0.1:41372         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41242         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41372         ESTABLISHED
tcp        0      0 127.0.0.1:40976         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:8003          127.0.0.1:41160         ESTABLISHED
tcp        0      0 127.0.0.1:41406         127.0.0.1:8003          ESTABLISHED
tcp        0      0 127.0.0.1:41118         127.0.0.1:8003          ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40060               ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40046               ESTABLISHED
tcp6       0      0 ::1:40054               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40056               ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40040               ESTABLISHED
tcp6       0      0 ::1:40040               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40048               ESTABLISHED
tcp6       0      0 ::1:40056               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40042               ESTABLISHED
tcp6       0      0 ::1:40052               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:40048               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:40042               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:40044               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:40038               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40050               ESTABLISHED
tcp6       0      0 ::1:40060               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40052               ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40038               ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40044               ESTABLISHED
tcp6       0      0 ::1:40046               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40054               ESTABLISHED
tcp6       0      0 ::1:40058               ::1:8080                ESTABLISHED
tcp6       0      0 ::1:8080                ::1:40058               ESTABLISHED
tcp6       0      0 ::1:40050               ::1:8080                ESTABLISHED
brocaar commented 3 years ago

Thanks @omogenot that is helpful, I see port 8003 occurring a lot, which is the ChirpStack Application Server Join Server API. That gives me a bit more direction at what part to look for a possible issue.

ddavidmelo commented 3 years ago

Same problem over here. After upgrading to v3.14.0 these errors started to appear: http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 1 http: Accept error: accept tcp [::]:8003: accept4: too many open files; retrying in 1

brocaar commented 3 years ago

I'm not sure if this is related to the AS v3.14, but then it also depends from which version you are migrating from :)

Anyway, I did something interesting in Go which might potentially cause this issue:

package main

import (
    "fmt"
    "html"
    "io"
    "io/ioutil"
    "net/http"
    "sync"
    "time"
)

var (
    mutex sync.Mutex
    count int
)

func startWebserver() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Hello, %q", html.EscapeString(r.URL.Path))
    })

    go http.ListenAndServe(":9092", nil)
}

func startLoadTest() {
    for {
        func() {
            resp, err := http.Get("http://localhost:9092/")
            if err != nil {
                panic(fmt.Sprintf("Got error: %v", err))
            }
            defer resp.Body.Close()
            if err != nil {
                panic(err)
            }

            io.Copy(ioutil.Discard, resp.Body)

            mutex.Lock()
            defer mutex.Unlock()
            count += 1
        }()
    }
}

func main() {
    startWebserver()
    for i := 0; i < 1; i++ {
        go startLoadTest()
    }
    <-time.After(10 * time.Second)
    fmt.Println("request count: ", count)
    time.Sleep(time.Minute)
}

vs

package main

import (
    "fmt"
    "html"
    "io"
    "io/ioutil"
    "net/http"
    "sync"
    "time"
)

var (
    mutex sync.Mutex
    count int
)

func startWebserver() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Hello, %q", html.EscapeString(r.URL.Path))
    })

    go http.ListenAndServe(":9092", nil)
}

func startLoadTest() {
    for {
        func() {
            resp, err := http.Get("http://localhost:9092/")
            if err != nil {
                panic(fmt.Sprintf("Got error: %v", err))
            }
            defer resp.Body.Close()
            if err != nil {
                panic(err)
            }

            // io.Copy(ioutil.Discard, resp.Body)

            mutex.Lock()
            defer mutex.Unlock()
            count += 1
        }()
    }
}

func main() {
    startWebserver()
    for i := 0; i < 1; i++ {
        go startLoadTest()
    }
    <-time.After(10 * time.Second)
    fmt.Println("request count: ", count)
    time.Sleep(time.Minute)
}

In the second case, the response body is not consumed (completely). This should be perfectly fine, the important thing is that Body.Close() is called (per Go documentation). The behavior between these two code examples is completely different however. The performance without io.Copy(ioutil.Discard, resp.Body) is ~ 1/3 of the version with io.Copy(ioutil.Discard, resp.Body). I took this code example from https://forum.golangbridge.org/t/do-i-need-to-read-the-body-before-close-it/5594/6, where it was also related to "too many open files".

Back to ChirpStack, in all cases Body.Close() is called for http responses (and if not, then this must be fixed), but there are cases where the Body is never consumed (as it is not needed). What I will do is make sure that the response Body is completely consumed before Body.Close() is called. Hopefully this will resolve this issue.

brocaar commented 3 years ago

I'm closing this issue as v3.15 has been released with the above improvements. Please let me know if you continue to experience this issue.