Closed omogenot closed 3 years ago
I have not yet been able to reproduce the issue yet:
watch lsof -p <PID OF chirpstack-network-server> -n -P
watch lsof -p <PID OF chirpstack-application-server> -n -P
watch 'netstat -p |grep chirp'
When the device sends an uplink, I do not see an increase in sockets. Please let me know if there is anything else I should try.
Well, may be you can try with an unknown radio module that generates the following error:
chirpstack-network-server[9073]: time="2021-03-15T12:14:41+01:00" level=error msg="uplink: processing uplink frame error" ctx_id=2af36290-440f-4292-a489-e2c6ec99d66d error="get device-session error: object does not exist"
That's all i can see that could relate to the problem of "too many opened files"...
Also with an unknown device, I do not see an increase.
Could you post your netstat
and / or lsof
output when this issue occurs? I'm especially interested in the source and destination ports as that might give a hint in which part the connection is not closed properly, if that is the case.
Here is a netstat -tn @ 18:50 (knowing that a crontab is restarting services @ 11:55 and 23:55)
tcp 0 0 127.0.0.1:34168 127.0.0.1:5432 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41406 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:40998 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:40976 ESTABLISHED
tcp 0 0 127.0.0.1:58032 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41302 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41080 ESTABLISHED
tcp 0 0 127.0.0.1:41252 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41016 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41074 ESTABLISHED
tcp 0 0 127.0.0.1:45236 127.0.0.1:1883 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41208 ESTABLISHED
tcp 0 0 127.0.0.1:41320 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:53776 127.0.0.1:1883 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41440 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41354 ESTABLISHED
tcp 0 0 127.0.0.1:41208 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41446 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41220 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41416 ESTABLISHED
tcp 0 0 127.0.0.1:41140 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41412 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41200 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41182 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41186 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41272 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41048 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41320 ESTABLISHED
tcp 0 0 127.0.0.1:5432 127.0.0.1:42748 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41446 ESTABLISHED
tcp 0 0 127.0.0.1:41306 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41058 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41032 ESTABLISHED
tcp 0 0 127.0.0.1:6379 127.0.0.1:57004 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41466 ESTABLISHED
tcp 0 0 127.0.0.1:45696 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:42708 127.0.0.1:5432 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41536 ESTABLISHED
tcp 0 0 127.0.0.1:41354 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:56286 127.0.0.1:6379 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:37546 ESTABLISHED
tcp 0 136 192.168.0.252:22 192.168.0.251:52975 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41200 ESTABLISHED
tcp 0 0 127.0.0.1:36856 127.0.0.1:8000 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41242 ESTABLISHED
tcp 0 0 127.0.0.1:5432 127.0.0.1:34176 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41302 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41100 ESTABLISHED
tcp 0 0 127.0.0.1:41260 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41412 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41264 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41016 ESTABLISHED
tcp 0 0 127.0.0.1:57184 127.0.0.1:1883 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41182 ESTABLISHED
tcp 0 0 127.0.0.1:41536 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41440 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41148 ESTABLISHED
tcp 0 0 127.0.0.1:1883 127.0.0.1:45236 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41108 ESTABLISHED
tcp 0 0 127.0.0.1:41148 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41264 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41108 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:6379 127.0.0.1:56286 ESTABLISHED
tcp 0 0 127.0.0.1:41032 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41100 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:45204 ESTABLISHED
tcp 0 0 127.0.0.1:37546 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:1883 127.0.0.1:57184 ESTABLISHED
tcp 0 0 127.0.0.1:5432 127.0.0.1:42708 ESTABLISHED
tcp 0 0 127.0.0.1:41480 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:40998 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:46334 127.0.0.1:8001 ESTABLISHED
tcp 0 0 127.0.0.1:41074 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:1883 127.0.0.1:53776 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41252 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:45696 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41008 ESTABLISHED
tcp 0 0 127.0.0.1:41160 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:34176 127.0.0.1:5432 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41058 ESTABLISHED
tcp 0 0 127.0.0.1:41416 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41048 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41118 ESTABLISHED
tcp 0 0 127.0.0.1:8001 127.0.0.1:46334 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41220 ESTABLISHED
tcp 0 0 127.0.0.1:41186 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41140 ESTABLISHED
tcp 0 0 127.0.0.1:42748 127.0.0.1:5432 ESTABLISHED
tcp 0 0 127.0.0.1:5432 127.0.0.1:34168 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41128 ESTABLISHED
tcp 0 0 127.0.0.1:41360 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41272 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:58032 ESTABLISHED
tcp 0 0 127.0.0.1:8000 127.0.0.1:36856 ESTABLISHED
tcp 0 0 127.0.0.1:41080 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41360 ESTABLISHED
tcp 0 0 127.0.0.1:57004 127.0.0.1:6379 ESTABLISHED
tcp 0 0 127.0.0.1:41466 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:45204 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41128 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41008 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41306 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41480 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41260 ESTABLISHED
tcp 0 0 127.0.0.1:41372 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41242 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41372 ESTABLISHED
tcp 0 0 127.0.0.1:40976 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:8003 127.0.0.1:41160 ESTABLISHED
tcp 0 0 127.0.0.1:41406 127.0.0.1:8003 ESTABLISHED
tcp 0 0 127.0.0.1:41118 127.0.0.1:8003 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40060 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40046 ESTABLISHED
tcp6 0 0 ::1:40054 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40056 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40040 ESTABLISHED
tcp6 0 0 ::1:40040 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40048 ESTABLISHED
tcp6 0 0 ::1:40056 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40042 ESTABLISHED
tcp6 0 0 ::1:40052 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:40048 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:40042 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:40044 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:40038 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40050 ESTABLISHED
tcp6 0 0 ::1:40060 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40052 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40038 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40044 ESTABLISHED
tcp6 0 0 ::1:40046 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40054 ESTABLISHED
tcp6 0 0 ::1:40058 ::1:8080 ESTABLISHED
tcp6 0 0 ::1:8080 ::1:40058 ESTABLISHED
tcp6 0 0 ::1:40050 ::1:8080 ESTABLISHED
Thanks @omogenot that is helpful, I see port 8003
occurring a lot, which is the ChirpStack Application Server Join Server API. That gives me a bit more direction at what part to look for a possible issue.
Same problem over here. After upgrading to v3.14.0 these errors started to appear: http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 1 http: Accept error: accept tcp [::]:8003: accept4: too many open files; retrying in 1
I'm not sure if this is related to the AS v3.14, but then it also depends from which version you are migrating from :)
Anyway, I did something interesting in Go which might potentially cause this issue:
package main
import (
"fmt"
"html"
"io"
"io/ioutil"
"net/http"
"sync"
"time"
)
var (
mutex sync.Mutex
count int
)
func startWebserver() {
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Hello, %q", html.EscapeString(r.URL.Path))
})
go http.ListenAndServe(":9092", nil)
}
func startLoadTest() {
for {
func() {
resp, err := http.Get("http://localhost:9092/")
if err != nil {
panic(fmt.Sprintf("Got error: %v", err))
}
defer resp.Body.Close()
if err != nil {
panic(err)
}
io.Copy(ioutil.Discard, resp.Body)
mutex.Lock()
defer mutex.Unlock()
count += 1
}()
}
}
func main() {
startWebserver()
for i := 0; i < 1; i++ {
go startLoadTest()
}
<-time.After(10 * time.Second)
fmt.Println("request count: ", count)
time.Sleep(time.Minute)
}
vs
package main
import (
"fmt"
"html"
"io"
"io/ioutil"
"net/http"
"sync"
"time"
)
var (
mutex sync.Mutex
count int
)
func startWebserver() {
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Hello, %q", html.EscapeString(r.URL.Path))
})
go http.ListenAndServe(":9092", nil)
}
func startLoadTest() {
for {
func() {
resp, err := http.Get("http://localhost:9092/")
if err != nil {
panic(fmt.Sprintf("Got error: %v", err))
}
defer resp.Body.Close()
if err != nil {
panic(err)
}
// io.Copy(ioutil.Discard, resp.Body)
mutex.Lock()
defer mutex.Unlock()
count += 1
}()
}
}
func main() {
startWebserver()
for i := 0; i < 1; i++ {
go startLoadTest()
}
<-time.After(10 * time.Second)
fmt.Println("request count: ", count)
time.Sleep(time.Minute)
}
In the second case, the response body is not consumed (completely). This should be perfectly fine, the important thing is that Body.Close()
is called (per Go documentation). The behavior between these two code examples is completely different however. The performance without io.Copy(ioutil.Discard, resp.Body)
is ~ 1/3 of the version with io.Copy(ioutil.Discard, resp.Body)
. I took this code example from https://forum.golangbridge.org/t/do-i-need-to-read-the-body-before-close-it/5594/6, where it was also related to "too many open files".
Back to ChirpStack, in all cases Body.Close()
is called for http responses (and if not, then this must be fixed), but there are cases where the Body
is never consumed (as it is not needed). What I will do is make sure that the response Body
is completely consumed before Body.Close()
is called. Hopefully this will resolve this issue.
I'm closing this issue as v3.15 has been released with the above improvements. Please let me know if you continue to experience this issue.
What happened?
Apparently, if the network-server and/or the application-server detects an error (CUSTOM_JS Codec error, or DevEUI not found), the socket is not closed between these two. As a consequence, after some time, the link between the network-server and the application is impossible as there are too many connections. The system is then totally blocked and the user can not even log in the application server portal. This problem can be seen in the syslog. The DevEUI not found, by the way, shall not be an error. It is highly probable that your server might be located in a zone where other devices from other systems integrators/users are installed. Shan't the error message show the unknown DevEUI value so that one can make sure that this is not a forgotten device?
What did you expect?
The network-server and/or the application-server shall close the socket upon error detection. As a workaround I made a cron script restart the
chirpstack-application-server
service every 12 hours.Steps to reproduce this issue
Steps:
netstat
utility to see all the unclosed sockets between the two processesCould you share your log output?
/var/log/syslog:
chirpstack-application-server[9060]: 2021/03/15 12:08:58 http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 1s
chirpstack-network-server[9073]: time="2021-03-15T12:14:41+01:00" level=error msg="uplink: processing uplink frame error" ctx_id=2af36290-440f-4292-a489-e2c6ec99d66d error="get device-session error: object does not exist"
chirpstack-network-server[9073]: time="2021-03-15T12:15:52+01:00" level=error msg="uplink: processing uplink frame error" ctx_id=675c500b-cdb8-4799-bbf4-f30bff6be849 error="request home netid error: response error, code: UnknownDevEUI, description: deveui does not exist"
Your Environment
Linux amd64 #1 SMP Debian 4.9.88-1+deb9u1~bpo8+1 (2018-05-13) x86_64 GNU/Linux