Closed luc-financeit closed 1 week ago
Hi @luc-financeit! Thanks for reporting this and sorry you've run into this issue.
You are absolutely correct, the crashtracker starts an external process that sends out crash report: if Ruby VM crashes, it is unable to send the report to us, so we need an additional process to send out these reports. I think there is really no other way of doing this.
Given that this is not critical for gem datadog or datadog-ci to work correctly, disabling crashtracker via DD_CRASHTRACKING_ENABLED=0
looks like a sensible thing to do. Would this workaround work for you?
Thanks for reporting @luc-financeit ! And to @anmarchenko for a quick response and work around.
I'm sharing this with our team, see if we can diagnose/plan on what to do about this. We will share updates here as we have them.
👋 @luc-financeit Thanks for reporting.
I can reproduce it with
require "bundler/inline"
gemfile { gem "datadog", "2.3.0", source: "https://rubygems.org" }
require "datadog"
Datadog.configure {}
Process.waitall
Indeed, Process.waitall
is hanging when crashtracking is enabled. In addition, I have found that this behaviour does not applied to other of methods from Process
(.wait
, wait2
, waitpid
and waitpid2
) when provided with specific pid
.
For example:
pid = Process.fork do
puts Process.pid
end
Process.wait(pid)
We will be looking into this issue and I suggest circumvent this by setting DD_CRASHTRACKING_ENABLED=false
temporarily in your test suite.
waitall
waits for all child processes, and will attempt to wait for child processes spawned by crash tracker.
It is convenient to use but does make the assumption that there aren't any processes spawned in the background by dependent libraries or the core runtime. A more robust solution is to keep track of pids for processes spawned by the application and wait for them explicitly using one of the other wait* methods.
Sure, I agree that our code doing waitall
is questionable, for several reasons. But this was still frustrating to debug, given that there is, as far as I can tell, zero documentation about the crash tracker — even the env var to turn it off isn't documented in the additional configuration section? (I should also note that turning it off via code-based configuration didn't seem to work for us — if I had to guess, probably similar load-order reasons as startup logs.)
"Don't use waitall
" or "turn off the crash tracker if you do" are both reasonable solutions, but perhaps that tradeoff can be documented somewhere?
Hey @luc-financeit : we're going to disable this crash tracker feature by default until we can test and deploy a suitable alternative. Look for this in our next release. Sorry for the frustration and thanks for your patience!
We've been able to upgrade successfully. Thanks all for your help!
Current behaviour
Upgrading the
datadog
gem from 2.2.0 to 2.3.0 causes a spec in our test suite to hang. The code being tested does something likeLooking at
ps
output, there's alibdatadog-crashtracking-receiver
child process which ourwaitall
is hung up on — manually killing the crashtracker process, or disabling the crashtracker entirely (viaDD_CRASHTRACKING_ENABLED
) allows the test to complete successfully.Expected behaviour
No hang.
Steps to reproduce
I'm afraid I don't have a stand-alone reproduction — the actual code is significantly more complicated, and I haven't tried slimming it down to see if there's additional triggers. I can dig in to it more if the above isn't enough to go on.
Environment
Datadog.configure ...
): See gist.