Thanks for the feedback. I am not sure which specific local DNS proxy you are using, but I assume the QUIC upstream is your Technitium DNS server instance. A connection refused error just means that the port is not open on the server, so that will need some debugging to see if the port is really open and if the DNS process is running.
The other error log is unrelated to this. The error seems to indicate that the TCP/TLS request was closed mid-session, so it failed to read the complete response. Not sure about the reason; it could be anything, including network issues.
OK, I'll keep you informed, but it's weird that it happens on QUIC connections and not UDP.
@ShreyasZare btw, it's AdGuard's dnsproxy https://github.com/AdguardTeam/dnsproxy configured to use Technitium's QUIC upstream.
Probably unrelated, but what does `ulimit -Sn` report in the container for you? Is it a number above 1024? If it's not, what is `ulimit -Hn`?
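For reference, those two values are just the soft and hard `RLIMIT_NOFILE` limits of the process. A minimal C sketch of what the shell builtins query (illustration only, not code from this project):

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;

    /* RLIMIT_NOFILE: per-process limit on open file descriptors */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    /* rlim_cur is the soft limit (ulimit -Sn), rlim_max the hard limit (ulimit -Hn) */
    printf("soft: %llu\nhard: %llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
    return 0;
}
```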
Now it's been working for more than 1 day, so I'm still investigating...
@polarathene `ulimit -Sn` = 1024 and `ulimit -Hn` = 1048576
> `ulimit -Sn` = 1024

This is fine.
If it is related to file descriptors, @ShreyasZare could raise the soft limit at runtime, provided there is nothing that would call the `select()` syscall on Linux (that is typically limited to 1024, and causes failures beyond that). (EDIT: As shown below, this already appears to be done by `dotnet` like Go does)
It seems that was a bit naive of me; similar to Go, `dotnet` is implicitly raising the soft limit to the hard limit on Linux: https://github.com/dotnet/runtime/pull/82429 (in early 2023 this was not the case).
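For context, that implicit raise amounts to roughly this minimal C sketch, bumping the `RLIMIT_NOFILE` soft limit up to the hard limit at startup (illustration only, not the actual runtime code):

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft limit for open file descriptors up to the hard limit,
 * similar to what the Go and dotnet runtimes now do implicitly on Linux. */
static int raise_nofile_soft_limit(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    if (rl.rlim_cur < rl.rlim_max) {
        rl.rlim_cur = rl.rlim_max;  /* soft = hard, no extra privileges needed */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }
    return 0;
}

int main(void) {
    if (raise_nofile_soft_limit() != 0)
        perror("raise_nofile_soft_limit");
    return 0;
}
```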
```
# On my environment, soft and hard limits for Docker containers presently default to 1048576
$ docker run --rm -itd --name dns --ulimit=nofile=1024:524288 technitium/dns-server
$ docker exec -it dns bash

# These are now correct from the explicit `--ulimit`:
$ ulimit -Sn
1024
$ ulimit -Hn
524288

# The DNS service is running as PID 1, we can see the soft limit for `NOFILE` was raised to the hard limit:
# NOTE: Instead of the prlimit command you can also use `cat /proc/1/limits`
$ prlimit -p 1
RESOURCE   DESCRIPTION                          SOFT       HARD       UNITS
AS         address space limit                  unlimited  unlimited  bytes
CORE       max core file size                   0          unlimited  bytes
CPU        CPU time                             unlimited  unlimited  seconds
DATA       max data size                        unlimited  unlimited  bytes
FSIZE      max file size                        unlimited  unlimited  bytes
LOCKS      max number of file locks held        unlimited  unlimited  locks
MEMLOCK    max locked-in-memory address space   unlimited  unlimited  bytes
MSGQUEUE   max bytes in POSIX mqueues           819200     819200     bytes
NICE       max nice prio allowed to raise       0          0
NOFILE     max number of open files             524288     524288     files
NPROC      max number of processes              unlimited  unlimited  processes
RSS        max resident set size                unlimited  unlimited  bytes
RTPRIO     max real-time priority               0          0
RTTIME     timeout for real-time tasks          unlimited  unlimited  microsecs
SIGPENDING max number of pending signals        62367      62367      signals
STACK      max stack size                       8388608    unlimited  bytes
```
There is the `System.Net.Sockets` module with a `Select()` method which calls the platform-specific `SocketPal` method:

- `Select()` syscall in the v8.0.10 release
- `Select()` syscall fallback in v9.0.0 (unreleased), which appears to be a bugfix for macOS (although that may only apply to quite old versions of OSX)
- `select()` syscall at the `Interop/Unix/System.Native` level too (so not present in v8.0.10)

I don't see any direct calls to such from this project nor in the libmsquic library, so unless that's being done elsewhere implicitly, perhaps I'm off base here.
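To illustrate why `select()` is the concern: `fd_set` is a fixed-size bitmap of `FD_SETSIZE` bits (1024 with glibc), so any descriptor number at or above that can't be used with it. A minimal C sketch of that failure mode (not code from this project or libmsquic):

```c
#include <stdio.h>
#include <sys/select.h>

int main(void) {
    /* fd_set is a fixed-size bitmap of FD_SETSIZE bits (1024 with glibc) */
    printf("FD_SETSIZE = %d\n", FD_SETSIZE);

    fd_set readfds;
    FD_ZERO(&readfds);

    int fd = 1024; /* any descriptor number >= FD_SETSIZE */
    if (fd >= FD_SETSIZE) {
        /* FD_SET(fd, &readfds) would write outside the bitmap (undefined
         * behaviour), so code paths built on select() effectively cap a
         * process at 1024 usable descriptors regardless of RLIMIT_NOFILE. */
        fprintf(stderr, "fd %d cannot be used with select()\n", fd);
        return 1;
    }

    FD_SET(fd, &readfds);
    return 0;
}
```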
@chmichael when you've had it running for a reasonable amount of time under similar load conditions, you could probably try this in the container:
```
ls -1 /proc/1/fd | wc -l
```
That will show us how many FDs are open for PID 1 (which should be the DNS service in the Docker container).
Each connection would typically open an FD, but these should be cleaned up rather than accumulating. If the number isn't too large under the conditions that would cause the failure you're experiencing, then this whole theory of mine can be disregarded.
You'd need to hit that 1048576 hard limit for file descriptors for this to trigger a crash, or if the `select()` syscall is used somewhere, then FDs would only need to reach 1024+.
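If you'd rather poll that count from a program instead of the one-liner above, here's a small C sketch that counts entries under `/proc/<pid>/fd` (the PID argument defaults to 1 just for this example):

```c
#include <dirent.h>
#include <stdio.h>

/* Count open file descriptors for a process by listing /proc/<pid>/fd,
 * equivalent to `ls -1 /proc/<pid>/fd | wc -l`. */
static long count_open_fds(const char *pid) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/fd", pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')  /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count;
}

int main(int argc, char **argv) {
    const char *pid = (argc > 1) ? argv[1] : "1";
    printf("open fds for pid %s: %ld\n", pid, count_open_fds(pid));
    return 0;
}
```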
@polarathene Thanks
I've upgraded to v13.1, let's see how it goes.
3 days without a problem (v13.1), so I'm closing this issue.
Thanks for the feedback.
Hello, I am getting CONNECTION_REFUSED after a day of running only with the QUIC protocol. If I switch to UDP there are no problems. Dropped connections are 0, while server failures seem normal. Restarting the DNS server fixes the problem.
Here's the log of my local DNS proxy results:
Error Logs from Technitium: