hedss closed this 6 years ago.
What problem is this trying to solve? Is there a test case that reproduces it?
There have been noticeable timeouts on the Supervisor (see https://github.com/resin-io/resin-api/pull/589#issuecomment-335634114), and Pablo's interested in shortening the DNS timeouts.
The alternative is to alter the resolver config (shortening the timeout and the number of attempts).
CCing in @pcarranzav for more information, and @petrosagg.
@pcarranzav @petrosagg @hedss What is the status of this? Do we have a conclusion on this? Do we want to shorten the timeout definition?
Can we have a test case that reproduces this issue?
This was actually a symptom of DNS lookups being synchronous in libuv (see the issue referenced in the comment above). The resulting hangs in the Supervisor were fixed by https://github.com/resin-io/resin-supervisor/pull/500, which tunneled Mixpanel requests through the API. This issue can therefore be closed, as it's no longer relevant.
As part of the work for OnPrem, it was noticed that, in situations where a domain is not resolvable, operations could time out due to blocking. See this thread https://www.flowdock.com/app/rulemotion/r-supervisor/threads/yW_H1qeMsuuXd0y2zsM-7gliLQ2 for more information. The solution there is to alter the resolver options so that the client drops the request after a set length of time and number of attempts.
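A minimal sketch of that kind of resolver change, assuming a glibc-style client that reads resolv.conf and honours the standard `timeout:n` and `attempts:n` options. The function name `set_resolver_options` and the values used are hypothetical; it only rewrites the file's text and does not touch the system.

```python
import re


def set_resolver_options(conf: str, timeout: int, attempts: int) -> str:
    """Return resolv.conf text with the timeout:/attempts: options replaced.

    Existing timeout:n and attempts:n tokens are stripped from every
    "options" line (other options are kept), then a new options line with
    the requested values is appended.
    """
    # glibc clamps timeout to 30 and attempts to 5; this sketch rejects
    # out-of-range values instead. The lower bound of 1 is its own choice.
    if not 1 <= timeout <= 30:
        raise ValueError("timeout must be between 1 and 30 seconds")
    if not 1 <= attempts <= 5:
        raise ValueError("attempts must be between 1 and 5")

    out = []
    for line in conf.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "options":
            kept = [t for t in tokens[1:]
                    if not re.match(r"(timeout|attempts):", t)]
            if not kept:
                continue  # the line only held the options being replaced
            line = "options " + " ".join(kept)
        out.append(line)
    out.append(f"options timeout:{timeout} attempts:{attempts}")
    return "\n".join(out) + "\n"
```

Writing the result back to disk is deliberately left out, since where the resolver config lives depends on the environment.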
It's entirely possible that we will suffer similar issues on resinOS via DNSMasq, including within the Supervisor.
DNSMasq itself has no user-configurable option for the timeout length of a query; it is set at compile time by the TIMEOUT definition in src/config.h. DNSMasq drops queries after 4x the TIMEOUT value, a multiplier that is hardcoded in the get_new_frec() function in src/forward.c. Therefore, by default, any lookup that the upstream server does not resolve takes a full 40 seconds (4x the default TIMEOUT of 10 seconds) before it's dropped. This is not ideal.
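The timing above as a small Python sketch. The multiplier of 4 follows the description of get_new_frec() given here, and the default TIMEOUT of 10 seconds is inferred from the 40-second figure rather than read from config.h; both function names are hypothetical.

```python
# Multiplier applied to TIMEOUT before DNSMasq drops an unanswered query,
# taken from the description of get_new_frec() in src/forward.c.
DROP_MULTIPLIER = 4


def drop_after_seconds(timeout: int) -> int:
    """Seconds an unresolved query is held before DNSMasq drops it."""
    return DROP_MULTIPLIER * timeout


def timeout_for_ceiling(max_seconds: int) -> int:
    """Largest whole-second TIMEOUT whose drop time stays within max_seconds."""
    if max_seconds < DROP_MULTIPLIER:
        raise ValueError("ceiling is below the smallest possible drop time")
    return max_seconds // DROP_MULTIPLIER
```

For example, drop_after_seconds(10) gives the 40 seconds described above, and keeping the drop time at or under 12 seconds would need TIMEOUT to be 3.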
A sensible value for TIMEOUT needs to be configured.
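Since the value is compiled in, one way to apply it is to rewrite the definition in src/config.h before building. This is a hypothetical helper, assuming the definition sits on a single line of the form `#define TIMEOUT <n>` (optionally followed by a comment); the value 5 in the usage note is only an example, not a recommendation.

```python
import re

# Matches "#define TIMEOUT <digits>" at the start of a line, keeping any
# trailing comment. The exact layout of config.h is an assumption.
_TIMEOUT_DEFINE = re.compile(
    r"^([ \t]*#[ \t]*define[ \t]+TIMEOUT[ \t]+)\d+", re.MULTILINE
)


def set_dnsmasq_timeout(config_h: str, seconds: int) -> str:
    """Return config.h text with the TIMEOUT definition set to seconds."""
    if seconds < 1:
        raise ValueError("TIMEOUT must be at least one second")
    new_text, count = _TIMEOUT_DEFINE.subn(rf"\g<1>{seconds}", config_h)
    if count != 1:
        # Refuse to guess if the definition is missing or duplicated.
        raise ValueError(f"expected one TIMEOUT definition, found {count}")
    return new_text
```

Calling set_dnsmasq_timeout(text, 5) on text containing `#define TIMEOUT 10 /* ... */` leaves the comment intact and changes only the number, which would make unresolved queries drop after 20 seconds under the 4x rule.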