alexdalitz / dnsruby

Dnsruby is a feature-complete DNS(SEC) client for Ruby, as used by many of the world's largest DNS registries and the OpenDNSSEC project

dnsruby freezes (very rarely) #152

Closed: fschwahn closed this issue 4 years ago

fschwahn commented 5 years ago

This has happened twice in a year for a task that runs daily, so it is very rare.

We have the following script (I abbreviated it a bit):

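require "dnsruby"

# `domains` (an array of domain names) is built elsewhere; omitted here for brevity
resolver = Dnsruby::Resolver.new(nameserver: ["8.8.8.8", "8.8.4.4"])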
resolver.retry_times = 3
record_type = "MX"

domains.each do |domain|
  begin
    response = resolver.query(domain, record_type)
    if response.header.ancount == 0
      puts "Missing #{record_type} record for domain \"#{domain}\""
    end
  rescue Dnsruby::NXDomain
    puts "Domain \"#{domain}\" does not exist"
  rescue Dnsruby::ResolvTimeout, Dnsruby::ServFail, Dnsruby::ResolvError, Dnsruby::OtherResolvError
    puts "Error trying to resolve domain \"#{domain}\""
  end
end

The job was killed after 24h, with the following stack trace:

File "/app/vendor/bundle/ruby/2.5.0/gems/dnsruby-1.61.2/lib/dnsruby/resolver.rb", line 253, in pop
File "/app/vendor/bundle/ruby/2.5.0/gems/dnsruby-1.61.2/lib/dnsruby/resolver.rb", line 253, in send_message
File "/app/vendor/bundle/ruby/2.5.0/gems/dnsruby-1.61.2/lib/dnsruby/resolver.rb", line 203, in query

This does not run in a multi-threaded environment; it runs as a dedicated process for this task.

alexdalitz commented 5 years ago

Hmmm.... This one is going to be very tricky to reproduce, and the stack trace doesn't really give me anything to go on. It sounds like something is getting locked up in the select thread. Can you provide any more details at all on this please?
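Based only on the trace above (the main thread parked in `pop`, called from `send_message`), the hang looks like a blocking queue read waiting for a reply that never arrives from a background thread. The stdlib-only sketch below does not use dnsruby internals; `wait_for_reply` is a hypothetical helper showing how bounding such a wait with `Timeout.timeout` turns a silent 24-hour hang into a `Timeout::Error` the caller can rescue. Wrapping each `resolver.query` call the same way is an untested assumption, and because `Timeout.timeout` interrupts the thread asynchronously it can leave library state inconsistent, so it is best treated as a last-resort guard.

```ruby
require "timeout"

# Hypothetical helper: block on a queue, but give up after `seconds`.
# Timeout.timeout raises Timeout::Error in this thread if pop has not returned.
def wait_for_reply(queue, seconds)
  Timeout.timeout(seconds) { queue.pop }
end

# Case 1: a worker thread answers promptly.
answered = Queue.new
Thread.new { answered << :answer }
fast_result = wait_for_reply(answered, 1)

# Case 2: the worker is stuck and never pushes a reply.
silent = Queue.new
stuck_worker = Thread.new { sleep 60 }
begin
  slow_result = wait_for_reply(silent, 0.5)
rescue Timeout::Error
  slow_result = :timed_out
ensure
  stuck_worker.kill
end

puts fast_result.inspect # :answer
puts slow_result.inspect # :timed_out
```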

fschwahn commented 5 years ago

Unfortunately there's nothing more to it. I added how we initialize the resolver to the code snippet, but I doubt it'll help much. The script ran for 18 minutes before it froze. It ran on Ruby 2.5.5 as a rake task. We start a dedicated Heroku dyno for this task (i.e. it is a completely isolated process).

alexdalitz commented 5 years ago

Do you know which domain it crashed on?

Is each domain resolved sequentially, in a single thread?

Thanks!

fschwahn commented 5 years ago

Do you know which domain it crashed on?

Unfortunately not

Is each domain resolved sequentially, in a single thread?

Yes

alexdalitz commented 5 years ago

Did you by any chance manage to store the log?

If not, would it be possible to set the log level to DEBUG and capture the end of the log (the last few thousand lines) when a failure occurs? If I had the log, I could probably fix this; without it, it's going to be very hard.

Thanks!
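One way to capture only "the last few thousand lines" of a DEBUG log in a job that runs for a long time is a bounded in-memory buffer that a `Logger` writes into and that is dumped only on failure. The sketch uses only Ruby's standard `Logger`; `TailBuffer` is a hypothetical name. As far as we know dnsruby logs through a standard Ruby `Logger`, but the exact call for raising its level or swapping in a custom logger depends on the gem version, so that wiring is an assumption to check against the dnsruby documentation.

```ruby
require "logger"

# Hypothetical IO-like sink: keeps only the most recent `limit` lines written
# to it, so the tail of a DEBUG log can be dumped when something goes wrong.
class TailBuffer
  def initialize(limit)
    @limit = limit
    @lines = []
    @lock = Mutex.new
  end

  # Logger calls #write with each formatted entry.
  def write(message)
    @lock.synchronize do
      @lines.concat(message.to_s.lines)
      @lines.shift(@lines.size - @limit) if @lines.size > @limit
    end
    message.to_s.bytesize
  end

  # Logger accepts any object responding to #write and #close.
  def close; end

  def lines
    @lock.synchronize { @lines.dup }
  end
end

tail = TailBuffer.new(3000)
logger = Logger.new(tail)
logger.level = Logger::DEBUG

5000.times { |i| logger.debug("step #{i}") }
puts tail.lines.size # 3000
puts tail.lines.last # the entry for step 4999
```

On failure, the job would write `tail.lines.join` to STDOUT or a file before exiting.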

fschwahn commented 5 years ago

Ok, I added code for this, but as I said: this is very rare. So don't be surprised if you don't hear anything from me for a few months.

alexdalitz commented 5 years ago

Thank you!

alexdalitz commented 5 years ago

Are you using TCP or UDP, please?

fschwahn commented 5 years ago

Are you using TCP or UDP, please?

I don't know to be honest, but the code above includes all the interaction we have with Dnsruby - initializing a resolver, setting retry_times, and then querying. So we never explicitly configure anything with TCP / UDP.
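Since the resolver was never configured explicitly, it ran with dnsruby's default transport. Our understanding (an assumption, not confirmed in this thread) is that the default is UDP, with a retry over TCP when a reply comes back truncated. The stdlib-only sketch below shows how truncation is signaled on the wire: the TC bit in the DNS header flags (RFC 1035, section 4.1.1; later RFCs repurpose two of the Z bits as AD and CD). `truncated?` is a hypothetical helper, not a dnsruby API.

```ruby
# The 16-bit flags word is bytes 2..3 of the 12-byte DNS header:
# QR | Opcode(4) | AA | TC | RD | RA | Z(3) | RCODE(4)
TC_FLAG = 0x0200

# Returns true if a raw DNS message has the TC (truncated) bit set,
# the usual signal for a client to retry the query over TCP.
def truncated?(packet)
  raise ArgumentError, "DNS header is 12 bytes" if packet.bytesize < 12
  flags = packet.byteslice(2, 2).unpack1("n")
  (flags & TC_FLAG) != 0
end

# Hypothetical headers: ID 0x1234, flags, then four zero section counts.
full_reply      = [0x1234, 0x8180, 0, 0, 0, 0].pack("n6") # QR, RD, RA set
truncated_reply = [0x1234, 0x8380, 0, 0, 0, 0].pack("n6") # same plus TC

puts truncated?(full_reply)      # false
puts truncated?(truncated_reply) # true
```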

alexdalitz commented 4 years ago

@fschwahn - "Ok, I added code for this, but as I said: this is very rare. So don't be surprised if you don't hear anything from me for a few months."

Can I assume that this is no longer a problem, please?

Or do you now have some trace I can work with?

fschwahn commented 4 years ago

In the end we switched to Resolv::DNS because it is part of the standard library and provides enough functionality for our use case. I'll close this.
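For comparison, a minimal sketch of the same MX check on top of the standard library's `Resolv::DNS`, which is what the reporter switched to. The method names `missing_mx` and `default_resolver` are hypothetical; `nameserver:` and `timeouts=` are standard `Resolv::DNS` options. Unlike the Dnsruby script, this version cannot report non-existent domains separately, because `getresources` returns an empty array for a missing domain instead of raising.

```ruby
require "resolv"

# Hypothetical helper: returns the domains for which no MX record was found.
# The resolver is passed in so the check can be exercised with a stub.
# getresources returns [] (rather than raising) when the domain does not
# exist, so NXDOMAIN and "no MX record" look the same here.
def missing_mx(domains, dns)
  domains.select do |domain|
    dns.getresources(domain, Resolv::DNS::Resource::IN::MX).empty?
  end
end

# Hypothetical factory mirroring the original resolver configuration.
def default_resolver
  dns = Resolv::DNS.new(nameserver: ["8.8.8.8", "8.8.4.4"])
  dns.timeouts = 3 # per-attempt timeout in seconds
  dns
end
```

In the job, something like `missing_mx(domains, default_resolver).each { |d| puts "Missing MX record for domain \"#{d}\"" }` would replace the loop; taking the resolver as an argument keeps the check testable without network access.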