configuration: improve handling of positional arguments

perillo commented 5 months ago

Currently getopt_long is configured to support mixing options and positional arguments, but kcov don't use the special "--" argument to tell getopt_long when to stop parsing the executable arguments. Instead the implementation tries to scan the arguments to find an executable, but the implementation is flawed because there are two required positional arguments, and both may be executable.

In order to handle positional arguments correctly, configure getopt_long to stop parsing after encountering a positional argument.

This change greatly simplified the code, since the pre scanning can be removed. However it may break the workflow of some users.

Add a regression test for issue #414

NOTES

The current implementation does not break the cli interface, but some user workflow may break. In this case an alternative implementation is to configure getopt_long to report the positional arguments, too (using "-" instead of "+" as the optstring prefix).

Also note that the current configuration of getopt is the original UNIX behavior.

ISSUES

This PR took longer than expected because I started to fix some issues with the test suite.

Additionally, after a system update (on Arch Linux) I started to see more failures. These failures can be reproduced on the original implementation, commit 895f21d0.

Another issue is the regression test that I added. The test fails with:

rv: 255

stderr
Can't set me as ptraced: Operation not permitted
kcov: error: Can't start/attach to /usr/bin/python3.11
Child hasn't stopped: ff00
kcov: error: Can't start/attach to /usr/bin/python3.11

The command works correctly when running it maually, so I assume there is an issue in the test suite environment.

P.S.

Issue #414 has been closed incorrectly. This PR is responsible to close it.

SimonKagstrom commented 5 months ago

Thanks a lot, looking good!

As for your arch issue, could it be some Linux security change that was made? Before, the yama scope was one thing that affected kcov (and gdb), see https://www.kernel.org/doc/Documentation/security/Yama.txt, although that typically was only for attachment.

Here, you're tracing a system binary (python), and maybe there are some new restrictions as to what can be done with those? Sort of like what MacOs does, to prevent people debugging the app store etc...

SimonKagstrom commented 5 months ago

Actually, the new test fails in the CI run (not sure why it wasn't run for your PR though?).

And I also actually saw an issue with the path resolving with the new master. First before the change:

HEAD is now at 692d41c Merge pull request #422 from SimonKagstrom/fix-freebsd-python2-tests
Simons-iMac-7:kcov ska$ rm -rf /tmp/kcov && kcov /tmp/kcov kcov
Usage: kcov [OPTIONS] out-dir in-file [args...]

Where [OPTIONS] are
[...]
Simons-iMac-7:kcov ska$ git checkout master
Previous HEAD position was 692d41c Merge pull request #422 from SimonKagstrom/fix-freebsd-python2-tests
Switched to branch 'master'
Your branch is up to date with 'origin/master'.

# (Was built in another terminal window before this line)
Simons-iMac-7:kcov ska$ rm -rf /tmp/kcov && kcov /tmp/kcov kcov
kcov: error: posix_spawn
kcov: error: Can't start/attach to /Users/ska/projects/build/kcov/src/kcov

When I manually pass the full path to the binary to trace, it works, but not the path lookup (sort of what you discussed in the other bug report, but no segv).

perillo commented 5 months ago

I received an email with the workflow failure, having configured github to run workflows in my fork. Not sure why the workflow was not run in the upstream repository; maybe a bug in github?

The link is https://github.com/perillo/kcov/actions/runs/8301507902.

As for security, I also tried setting kernel.yama.ptrace_scope to 0 a few days ago, but there was still an error.

Did you remember if with the current master there are no errors? You can try to add the workflow_dispatch event (https://docs.github.com/en/actions/using-workflows/manually-running-a-workflow). It is useful for me, since I cat test my PR from my fork.

It master has no errors, IMHO it is better to revert this PR.

perillo commented 5 months ago

The issue with the executable being a system executable (installed in PATH) can be reproduced with a safe commit:

> git checkout 895f21d0
Previous HEAD position was 692d41c Merge pull request #422 from SimonKagstrom/fix-freebsd-python2-tests
HEAD is now at 895f21d ci: Fix warning about node12 / node16 (actions/checkout@v3)
> build/bin/kcov /tmp/kcov zig4-dev
^Cfish: Job 1, 'build/bin/kcov /tmp/kcov zig4-d…' terminated by signal SIGKILL (Forced quit)

I used an executable installed in ~/.local/bin. It started to using all the cpu, running for a long time (possibly forever).

It seems the binary crashes only if I use kcov, but this should be the same issue (just that kcov crashes)

perillo commented 5 months ago

First, I checkout at 692d41c, but I can not reproduce your output.

Instead:

> build/bin/kcov /tmp/kcov zig4-dev
^Cfish: Job 1, 'build/bin/kcov /tmp/kcov zig4-d…' terminated by signal SIGKILL (Forced quit)

zig4-dev is installed in PATH (in ~/.local/bin). The command runs using 100 of cpu, probably forever? It seems the process crashes only when using kcov.

In a previous run there were also kcov-system-daemons using CPU, but this may be caused by the daemon still alive after a test.

I can confirm this issue with a commit before my changes.

UPDATE: the bug was caused by the use of a Zig executable.

perillo commented 5 months ago

Sorry for the previous messages; in the first case I was not sure I rebuild kcov and in the second case I used a Zig executable.

I checkout at 692d41c, but I can not reproduce your output. Instead, running build/bin/kcov /tmp/kcov test-bin works correctly.

As for the bug in master (assuming master after this PR was merged), on my system it works correctly.

SimonKagstrom commented 5 months ago

The master bug is MacOS-specific, this fixes it:

diff --git a/src/engines/mach-engine.cc b/src/engines/mach-engine.cc
index 05bb434..8d2d7a9 100644
--- a/src/engines/mach-engine.cc
+++ b/src/engines/mach-engine.cc
@@ -257,7 +257,7 @@ private:
         // Fork the process, in suspended mode
         auto& conf = IConfiguration::getInstance();
         auto argv = conf.getArgv();
-        rv = posix_spawn(&m_pid, argv[0], nullptr, &attr, (char* const*)argv, environ);
+        rv = posix_spawn(&m_pid, executable.c_str(), nullptr, &attr, (char* const*)argv, environ);
         if (rv != 0)
         {
             error("posix_spawn");

Basically it ignored the argument passed to it (incorrectly), which no longer works after your fixes. I'll correct that, and it also explains why I saw it and not you.

Not sure about the test though. After the fix above, it also works on my Mac, so I'm not sure what's going on with the CI runs.

perillo commented 5 months ago

This change also removed calling get_real_path for the binary-path configuration value; instead I called get_real_path when binaryPath is used when hashing the directory name.

This may change the hashed value and may cause conflicts, but I'm not sure, since this case is not tested.

This patch should restore the original behavior:

diff --git a/src/configuration.cc b/src/configuration.cc
index a778d72..a418d3c 100644
--- a/src/configuration.cc
+++ b/src/configuration.cc
@@ -482,7 +482,9 @@ public:
            std::string binaryName;
            std::string binaryPath = path;

-           (void)look_path(path, &binaryPath);
+           if (look_path(path, &binaryPath) < 0)
+               binaryPath = get_real_path(path);
+
            std::pair<std::string, std::string> tmp = split_path(binaryPath);

            setKey("binary-path", tmp.first);
@@ -492,7 +494,7 @@ public:
            {
                setKey("target-directory",
                        fmt("%s/%s.%08zx", outDirectory.c_str(), binaryName.c_str(),
-                               (size_t)hash_file(get_real_path(binaryPath))));
+                               (size_t)hash_file(binaryPath)));
            }
            else
            {

It calls get_real_path when the executable is not in PATH and may exist in the current directory.

perillo commented 5 months ago

Actually, the new test fails in the CI run (not sure why it wasn't run for your PR though?).

@SimonKagstrom I have fixed the test failing in #428.

SimonKagstrom / kcov