Closed: fschwahn closed this issue 4 years ago
Hmmm.... This one is going to be very tricky to reproduce, and the stack trace doesn't really give me anything to go on. It sounds like something is getting locked up in the select thread. Can you provide any more details at all on this please?
Unfortunately there's nothing more to it. I added how we initialize the resolver to the code snippet, but I doubt it'll help much. The script ran for 18 minutes before it froze. It ran on Ruby 2.5.5 as a rake task. We start a dedicated Heroku dyno for this task (i.e. it is a completely isolated process).
Do you know which domain it crashed on?
Is each domain resolved sequentially, in a single thread?
Thanks!
Do you know which domain it crashed on?
Unfortunately not
Is each domain resolved sequentially, in a single thread?
Yes
Did you by any chance manage to store the log?
If not, would it be possible to set the log level to DEBUG and capture the end of the log (the last few thousand lines) when a failure occurs? With the log I could probably fix this; without it, it's going to be very hard.
Thanks!
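One way to keep only the end of a DEBUG log for a job that runs daily and fails rarely is a bounded in-memory buffer used as the log device. The following is a minimal sketch using only Ruby's standard `Logger`; the `TailLogDevice` class is a hypothetical helper, not part of Dnsruby, and how to route Dnsruby's own logger into such a device is not shown here and should be checked against the Dnsruby documentation.

```ruby
require "logger"

# Hypothetical helper (not part of Dnsruby): an IO-like log device that keeps
# only the last `max_lines` messages in memory, so the tail of a DEBUG log can
# be dumped when the job is found to be stuck.
class TailLogDevice
  def initialize(max_lines)
    @max_lines = max_lines
    @lines = []
  end

  # Logger calls #write once per formatted message.
  def write(message)
    @lines << message
    @lines.shift while @lines.size > @max_lines
    message.bytesize
  end

  # Logger calls #close when the logger is closed; nothing to release here.
  def close; end

  # Returns a copy of the retained messages, oldest first.
  def tail
    @lines.dup
  end
end

device = TailLogDevice.new(3)
logger = Logger.new(device)
logger.level = Logger::DEBUG
logger.formatter = proc { |severity, _time, _progname, msg| "#{severity}: #{msg}\n" }

# Five messages are logged, but only the last three are retained.
5.times { |i| logger.debug("query #{i}") }
```

In a real job the buffer would be sized to a few thousand lines, and `device.tail` would be written out wherever the process is known to be stuck, for example from a watchdog that fires before the job is killed.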
Ok, I added code for this, but as I said: this is very rare. So don't be surprised if you don't hear anything from me for a few months.
Thank you!
Are you using TCP or UDP, please?
Are you using TCP or UDP, please?
I don't know, to be honest, but the code above includes all the interaction we have with Dnsruby: initializing a resolver, setting retry_times, and then querying. So we never explicitly configure anything related to TCP or UDP.
@fschwahn - "Ok, I added code for this, but as I said: this is very rare. So don't be surprised if you don't hear anything from me for a few months."
Can I assume that this is no longer a problem, please?
Or do you now have some trace I can work with?
In the end we switched to Resolv::DNS because it is part of the standard library, and the functionality it provides is enough for our use case. I'll close this.
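For readers making the same switch, here is a minimal sketch of sequential resolution with a per-domain deadline, so that a single stuck lookup cannot stall the whole run. The `resolve_all` helper and its `deadline` parameter are hypothetical names, not the script from this issue. It accepts any object that responds to `getaddresses`, which `Resolv::DNS` does. Whether `Timeout.timeout` would have interrupted the hang reported here is an assumption, not something verified.

```ruby
require "resolv"
require "timeout"

# Hypothetical helper: resolves each domain one after another and records
# :timed_out for any lookup that exceeds `deadline` seconds.
# `resolver` is any object responding to #getaddresses (e.g. Resolv::DNS).
def resolve_all(domains, resolver:, deadline: 10)
  domains.each_with_object({}) do |domain, results|
    results[domain] =
      begin
        Timeout.timeout(deadline) { resolver.getaddresses(domain).map(&:to_s) }
      rescue Timeout::Error
        # Also covers Resolv::ResolvTimeout, which is a Timeout::Error subclass.
        :timed_out
      end
  end
end
```

With the standard library this could be called as `resolve_all(domains, resolver: Resolv::DNS.new)`. `Resolv::DNS#timeouts=` can additionally bound each query attempt. An empty array in the result means the name did not resolve, since `getaddresses` returns `[]` rather than raising.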
We have had this happen two times in a year, for a task that runs daily, so it is very rare.
We have the following script (I abbreviated it a bit):
The job was killed after 24h, with the following stack trace:
This does not run in a multi-threaded environment, but as a dedicated process which runs this task.