draios / sysdig

Linux system exploration and troubleshooting tool with first class support for containers
http://www.sysdig.com/
Other
7.68k stars 728 forks source link

All chisels are broken: `attempt to index global 'sysdig' (a nil value)` #2063

Open dkogan opened 5 months ago

dkogan commented 5 months ago

Hi. I'm the Debian maintainer for sysdig. For the last few releases of sysdig chisels no longer work. Each one throws a lua error when trying to use the global sysdig variable. Something like this:

dima@shorty:~$ sysdig -r /tmp/test.scap -c udp_extract

udp_extract: error in init(): [string "--[[..."]:39: attempt to index global 'sysdig' (a nil value)

The build is mostly vanilla, using system dependencies instead of bundled ones. The current packaged release is of 0.35.0, but this issue isn't new in 0.35. Can I get some suggestions to debug this? Or yall can see this yourselves, but grabbing sysdig from a recent Debian or Ubuntu release.

Thanks.

jasondellaluce commented 5 months ago

@dkogan thanks for reporting. Did you find the minimum sysdig version in which this is reproduced?

jasondellaluce commented 5 months ago

cc @leogr

dkogan commented 5 months ago

Hi. I haven't tried bisecting this. It's such a fundamental problem, that I think it should be trivial to debug for those that have internal familiarity with the code. Let me know if you have trouble reproducing.

mrgian commented 5 months ago

We tested this with sysdig 0.35.0 but we weren't able to reproduce this Can you provide the OS version you've tested this? I'm looking forward to figure out the version of the dependencies you are building this with.

dkogan commented 5 months ago

Hi. The packages in Debian/sid and Debian/bookworm (the current stable release) are affected. Ubuntu/jammy is probably affected; I would have to check. The next Ubuntu release (24.04) will pull from whatever is in Debian/sid today, so it would be affected as well, if this isn't fixed.

What did you test, exactly? There is no "0.35.0" package.

I can also debug this myself, if you point me at things I should look at.

Thanks

jasondellaluce commented 5 months ago

Hi @dkogan. We have a 0.35.0 tag that is still under a release draft (not official), and we tested the same exact command you posted. We thought about using this git tag due to your sentence above The current packaged release is of 0.35.0, but this issue isn't new in 0.35, which made us think that you used it too. We haven't experienced the bug you posted, however our build was using the bundled dependencies and not the system one. So for now our suspect is that there should be some difference when using the system deps, as we could not find anything suspicious by looking at the code either (which has been barely touched in the past months).

dkogan commented 5 months ago

Hello!

We have a 0.35.0 tag that is still under a release draft (not official)

This tag exists in git. To any outside observer, it looks as official as any other. If this was intended to be a pre-release, can you call it "0.35.0rc1", or something?

We thought about using this git tag due to your sentence above The current packaged release is of 0.35.0, but this issue isn't new in 0.35, which made us think that you used it too. We haven't experienced the bug you posted, however our build was using the bundled dependencies and not the system one. So for now our suspect is that there should be some difference when using the system deps, as we could not find anything suspicious by looking at the code either (which has been barely touched in the past months).

Yeah. Right. I haven't tried it myself, but I was pretty sure that this isn't broken with whatever "normal" build you're normally doing. It's broken in the packages in Debian (and downstream repos, like Ubuntu) today.

I can debug this myself, if I can get some pointers about where to look.

Thanks.

mrgian commented 5 months ago

Hi @dkogan I've tried with a fresh install of Debian bookworm by pulling the package from the official repos, but unfortunately I'm still not able to reproduce this issue. Can you provide the versions of lua and liblua you are using?

Thank you

jasondellaluce commented 5 months ago

@dkogan I was able to reproduce the issue you raised when installing sysdig from a fresh debian:sid container, in which it comes with the 0.35.0+repack version. Let me respond point by point to provide context.

This tag exists in git. To any outside observer, it looks as official as any other. If this was intended to be a pre-release, can you call it "0.35.0rc1", or something?

Totally agree with this, it was a mistake on our end. We created the tag and kept the GitHub release on hold in order to wait for the Falco 0.37 release (the two are aligned in the version of falcosecurity/libs they depend to). We'll proceed in finishing the 0.35 sysdig release and we'll pay extra attention in the future, apologies.

I haven't tried it myself, but I was pretty sure that this isn't broken with whatever "normal" build you're normally doing.

It really works on our end, and there is a reason behind it. During the contribution process of the falcosecurity/libs to the CNCF, we prioritized making them as vendor agnostic as possible even at the codebase level. This affected the code in some ways, and the CMake setup as well. For example, the default name for the kernel drivers is now scap.ko but it's now possible to customize it through CMake variables so that each adopter can have degrees of freedom for their product or use cases.

The core implementation of chisels lives in falcosecurity/libs, and have been affected by these changes too. The sysdig.* functionalities are a way of accessing some core functions of the tool running the chisel, and given that the code for that resides in libsinsp (a component of the falcosecurity libs), the default name has been set as sinsp.* and we allow each client or adopter to customize it (https://github.com/falcosecurity/libs/blob/93a04bb92f85142edbfb47120cbaafb45d9ba5d8/userspace/libsinsp/CMakeLists.txt#L32). For the case of sysdig tool, this is still available as sysdig.* (we didn't want to introduce any breaking change) and the custom name is set through cmake (https://github.com/draios/sysdig/blob/ef4db873c1bb78c39cfdaa562121113c7cbcfc9d/cmake/modules/falcosecurity-libs.cmake#L76).

Now, I think the problem here is straightforward. For packages provides by us, this works as expected because we build both sysdig and the libraries together. In your case, I think this is broken because you build the libs as a standalone library and then link it to sysdig afterwards, thus picking up the sinsp.* default name. I can confirm that by substituting every sysdig.* with a sinsp.* in the udp_extract chisel of your example, everything keeps working as expected.

Next steps

The issue here is on our end, because your use case is completely legitimate, and rather more appropriate than what we do. We're now looking for a clean way of solving this, and will get back at you once done. We will for sure issue a new release of sysdig once done. For now, we'll proceed in publishing the GitHub release for 0.35.0 like mentioned above.

cc @leogr, @mrgian, @therealbobo

dkogan commented 5 months ago

Hi. Sorry about claiming bookworm was affected; I thought it was. And it's great that you could reproduce the problem.

About 0.35.0: the tag was made and there's a Debian package of this uploaded, so it has been "released", even if it wasn't finished. When you do finish it, can you call it "0.35.1"?

Thank you very much for debugging. In Debian today, falcosecurity/libs and sysdig are the only two components that touch these chisels. So for these packages, I can set CHISEL_TOOL_LIBRARY_NAME="sysdig" when building falcosecurity-libs. Is there any reason not to do that?

jasondellaluce commented 5 months ago

In Debian today, falcosecurity/libs and sysdig are the only two components that touch these chisels. So for these packages, I can set CHISEL_TOOL_LIBRARY_NAME="sysdig" when building falcosecurity-libs. Is there any reason not to do that?

If that's possible to do on your end, then I think this is the best workaround available for now! Let me know if you encounter problems with that.

On the long run, I think this approach will not be the best once other packages will start rely on the falcosecurity-libs, which for example could be the case of Falco somewhere in the future. On our side, we'll keep looking into a solution that would be more sustainable. Thanks!

About 0.35.0: the tag was made and there's a Debian package of this uploaded, so it has been "released", even if it wasn't finished. When you do finish it, can you call it "0.35.1"?

Yeah, what I meant is to remove the "draft" status from the GitHub release too, to communicate that the 0.35.0 release actually happened and keeping the same tag as the one of the Debian packages.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jasondellaluce commented 1 month ago

@dkogan giving you the update that since v0.36.1 (https://github.com/draios/sysdig/releases/tag/0.36.1), @therealbobo worked on moving/forking code for chisels from falcosecurity/libs to become an internal component of the sysdic codebase. This means that the issue reported above would be solved, starting from that version, without any extra workaround.