Open jonjensen opened 7 years ago
Can you share with us a capture that replicates the problem? You can create a capture with: sysdig -w trace.scap
@jonjensen I tried to reproduce this issue on 7.4.1708 (Core) CentOS distro running on EC2 and was not getting the chisel error. Could you provide the exact steps to reproduce this bug?
I'm getting that from time to time too. I'll try to find a reproducer.
and just found one:
85283 04:40:24 crazy) grep -E /strace-.*.tar.xz
85283 04:40:26 crazy) sh -c source /usr/lib/frugalware/fwmakepkg;source ./FrugalBuild; echo -n $pkgname
spy_users chisel error: [string "--[[..."]:215: attempt to concatenate a nil value
However, running these commands again does not reproduce the error.
The strange thing is that after starting sysdig again I got a segfault.
85283 05:06:28 crazy) sed s/%2B/+/g;s/$//;s///;s/-/_/g
85283 05:06:35 crazy) find . -name strace
sysdig: /home/crazy/Work/Frugalware/current-testing/source/apps/sysdig/src/sysdig/userspace/libsinsp/parsers.cpp:1799: static void sinsp_parser::parse_openat_dir(sinsp_evt*, char*, int64_t, std::__cxx11::string*): Assertion `false' failed.
Aborted (core dumped)
@luca3m @tjskrish I am still having this happen, but I can't tell you the steps to reproduce it. I just start sysdig -c spy_users on our server and within hours or a day or so, I get the error. As before, it was:
spy_users chisel error: [string "--[[..."]:215: attempt to concatenate a nil value
Our system (with newer versions than when I initially reported):
# rpm -q sysdig
sysdig-0.21.0-1.x86_64
# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
I saved a trace.scap file like @luca3m requested, but it is 81 GB! I'm guessing that will not be useful to you, but if you would like to look at it somehow, or have me filter or extract just part of it, I'm game. Please let me know. Thanks.
@luca3m @tjskrish
I can trigger that all the time with parts of our test suite.
It looks like evt.field(fargs) returns nil sometimes. I'm not a Lua guy, but I've added checks for nil, 0 and '' and then set fargs to NA so it cannot be nil. However, the bug still occurs.
Also interesting: after changing print() to use string.format(..), the bug is gone here. At least, I've run the test suite 10 times without triggering it.
Does this make sense to anyone?
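For anyone following along, here is a minimal sketch of the failure mode and of the kind of nil guard described above. It is not taken from the spy_users chisel: `or_na` is a hypothetical helper name, and the "NA" placeholder simply mirrors the value mentioned in this comment. A guard like this only helps if it wraps the operand that is actually nil.

```lua
-- Hypothetical helper (not part of the spy_users chisel): replace nil or
-- empty values with a placeholder before they are concatenated.
local function or_na(value)
    if value == nil or value == "" then
        return "NA"
    end
    return tostring(value)
end

-- Concatenating a nil value raises the kind of error quoted in this thread.
-- pcall catches it so the script keeps running.
local ok, err = pcall(function()
    local missing = nil
    return "cmd: " .. missing
end)

-- With the guard, the concatenation always succeeds.
local line = "cmd: " .. or_na(nil)
print(ok, line)
```

If the error persists after guarding one field, that suggests a different operand in the same statement is the nil one.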
Is this issue not fixed yet? I get the same error about every 10 minutes, but at line 218.
@webloft
This bug still exists.
Well, I don't want to debug this in depth. I just found out that the reported error line is only the last line of the statement.
The actual error (at least for me) is located at process_tree[pid][2], which is sometimes nil because something went wrong in the ancestor tree loop, I guess.
Catching the nil like this:
if process_tree[pid][2] == nil then
    process_tree[pid][2] = -1
end
right after
if not process_tree[pid] then
    process_tree[pid] = {1 + process_tree[ppid][1], process_tree[ppid][2]}
end
works for me. It is still better to log -1 than to crash every few minutes.
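Put together as a standalone sketch, the workaround looks roughly like this. The `ensure_entry` wrapper and the standalone `process_tree` table are hypothetical and only there to make the snippet runnable outside sysdig. Treating an unknown parent as a root entry is an extra assumption, not something the chisel is confirmed to do.

```lua
-- Standalone stand-in for the chisel's table; each entry is a two-element
-- list, as in the snippet above.
local process_tree = {}

-- Hypothetical wrapper combining the original lookup with the nil guard.
local function ensure_entry(pid, ppid)
    if not process_tree[pid] then
        local parent = process_tree[ppid]
        if parent then
            process_tree[pid] = {1 + parent[1], parent[2]}
        else
            -- Assumption: a missing parent is treated as a root entry.
            process_tree[pid] = {0, nil}
        end
    end
    -- The workaround from this comment: never leave the second field nil.
    if process_tree[pid][2] == nil then
        process_tree[pid][2] = -1
    end
    return process_tree[pid]
end

-- A parent whose second field was never set, which is the problematic case.
process_tree[100] = {0, nil}
local entry = ensure_entry(101, 100)
print("first " .. entry[1] .. " second " .. entry[2])
```

This only hides the symptom. Why the second field is nil in the first place would still need to be tracked down in the chisel itself.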
v0.27.1: the bug still exists. I reproduced this easily with:
About 3-4 processes start every minute here, so maybe there is too much information to combine?
Linux mail.salvania.be 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
Description: Debian GNU/Linux 10 (buster)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Keep this open...
I got this error:
on this OS:
Using the sysdig-0.17.0-1.x86_64 RPM built by Sysdig Inc.
Running this command:
The last command shown was: