SimonKagstrom / kcov

Code coverage tool for compiled programs, Python and Bash which uses debugging information to collect and report data without special compilation options
http://simonkagstrom.github.io/kcov/
GNU General Public License v2.0
709 stars 109 forks source link

configuration: fix incorrect handling of out-dir argument #415

Closed perillo closed 5 months ago

perillo commented 5 months ago

Currently the argument parsing code expands all the arguments.

This means that an out-dir argument having the same name of an executable is fully expanded, thus becoming an executable instead of a directory.

Examples:

> kcov python echo
kcov: error: Can't find or open echo
> kcov python /bin/echo
kcov: error: Can't open directory /usr/bin/python3.11/

Remove checking if the first argument is an elf file, since the executable must be after the out-dir argument.

Skip options and don't expand arguments, unless lstat fails.

Issue #414: kcov: error when out-dir is an executable

TODO

Add tests

perillo commented 5 months ago

The code in this PR does not fully solve the problem. When the out-dir directory does not exists, the code behave incorrectly.

SimonKagstrom commented 5 months ago

Thanks for your PR!

One of the tests fail though: https://github.com/SimonKagstrom/kcov/actions/runs/8175261352/job/22356631660?pr=415, which was related to this issue: https://github.com/SimonKagstrom/kcov/issues/368

perillo commented 5 months ago

Thanks for your PR!

One of the tests fail though: https://github.com/SimonKagstrom/kcov/actions/runs/8175261352/job/22356631660?pr=415, which was related to this issue: #368

Yes, it was my fault sorry. I tried to skip all options, but that was clearly incorrect.

Still, as I wrote the previous comment, the current change does not fully solve the bug.

I would like to run tests locally, but how do you run tests?

Thanks.

UPDATE

The problem does not seem to be related to my code to skip all the option when searching for an elf executable. Not sure what's causing the bug.

SimonKagstrom commented 5 months ago

Unfortunately, it's somewhat convoluted.

Perhaps the easiest is to look at how the ci does it, which is in .github/workflows/ci.yml.

You need to build some test binaries (i.e., cmake against the tests/ directory) and then run tests/tools/run-tests to excecute them. The .github/workflows/ci-run-tests.sh does that, but tests/tools/run-tests will print the arguments it expects.

SimonKagstrom commented 5 months ago

... and also look in .github/workflows/generic-build.sh, for the cmake + make command to bind the test binarieis.

perillo commented 5 months ago

@SimonKagstrom what do you think about following the standard way to mark extra arguments? As an example:

kcov --clean coverage tests/python/link_main -- 5 --foo

instead of:

kcov --clean coverage tests/python/link_main 5 --foo

I also tried to remove completely the code scanning for an elf executable, and it seems to still work, but I need to run the tests.

SimonKagstrom commented 5 months ago

Well, I kind of like the current method, since it allows this pattern:

tests/unit-tests/ut --test-case="..."                  # First run without kcov
kcov /tmp/kcov tests/unit-tests/ut --test-case="..."   # Then arrow up and just add kcov at the beginnig

anyway, isn't the problem really that all paths are expanded with $PATH? Only the binary should be processed that way, and perhaps that can be done after the loop?

perillo commented 5 months ago

I found an additional issue when running kcov from source code:

> build/bin/kcov build/data tests/python/link_main  5 --foo
build/bin/kcov: unrecognized option '--foo'

kcov: error: Unrecognized option: -

Usage: kcov [OPTIONS] out-dir in-file [args...]

Note that this is from master, without changes.

This does not happen when using kcov installed on the system. Instead, when using the system kcov I got a deprecation warning:

/home/manlio/src/contrib/kcov/github/perillo/kcov/build/data/python-helper.py:7: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses

UPDATE: I simply forgot to build the code again, so I was executing the code from this PR!

perillo commented 5 months ago

Well, I kind of like the current method, since it allows this pattern:

tests/unit-tests/ut --test-case="..."                  # First run without kcov
kcov /tmp/kcov tests/unit-tests/ut --test-case="..."   # Then arrow up and just add kcov at the beginnig

anyway, isn't the problem really that all paths are expanded with $PATH? Only the binary should be processed that way, and perhaps that can be done after the loop?

The purpose of the scan is to find the argv index of the executable, so that getopt_long can stop processing arguments after the executable.

Unfortunately I think this is an impossible task, since you can never tell what parameter is an executable, when they are relative names instead of full paths.

SimonKagstrom commented 5 months ago

Yeah, sorry, I haven't touched that code in a long time, so I had forgotten quite how it worked...

Anyway, shouldn't it work to do:

  1. Skip through all options (i.e., starting with -)
  2. The next argument is the output directory, which should not be looked up in PATH
  3. After that is the binary, and all else are arguments to the binary

The path in 3 should be looked up in $PATH, if not found.

Your patch seems to pretty much do this, but maybe I'm missing something.

perillo commented 5 months ago

Yeah, sorry, I haven't touched that code in a long time, so I had forgotten quite how it worked...

Anyway, shouldn't it work to do:

1. Skip through all options (i.e., starting with -)

2. The next argument is the output directory, which should not be looked up in PATH

3. After that is the binary, and all else are arguments to the binary

The path in 3 should be looked up in $PATH, if not found.

Your patch seems to pretty much do this, but maybe I'm missing something.

Coincidentally I also found a possible solution:

  1. Expand all positional arguments, but stop only when the second argument is found

Unfortunately there is still a small window for a bug:

kcov --include-path python

This is possible because GNU getopt_long parses both --name=arg and --name arg.

Additionally I would like to improve the current code, by adding two additional functions in utils.cc:

These change should make the code more readable.

I will try to implement this, testing the code locally before push the code.

Let me know if you don't like adding the two new functions.

SimonKagstrom commented 5 months ago

Sounds very good, thanks!

The only comment I have is that is_elf is a bit Linux/FreeBSD:ish. I mainly run on MacOS now, which uses Mach-O. Now, is_macho would be a fun name, but I think is_binary_executable is more generic and probably better.

perillo commented 5 months ago

@SimonKagstrom I compared how kcov behaves between this change and the old code.

The example is:

build/bin/kcov --python-parser=python2 cover tests/python/main 5 --foo

With this change I get:

NOT FOUND: 6, (null)
build/bin/kcov: unrecognized option '--foo'
kcov: error: Unrecognized option: -

With the old master:

NOT FOUND: 1, --python-parser=python2
NOT FOUND: 2, cover
hello!
...

So the old code inner for loop (over PATH) was the reason why the code actually worked! This was not documented.

This can be probably fixed in two way:

  1. Use -- after the executable (this is the correct way)
  2. Make IParserManager.matchParse also parse python and bash scripts I assumed this was true, so I named the function is_supported_executable.
  3. Add an inner loop, but I have no idea how many iterations are needed. Some testing is necessary, but I feels this totally unreadable.
SimonKagstrom commented 5 months ago
  1. Use -- after the executable (this is the correct way)
  2. Make IParserManager.matchParse also parse python and bash scripts I assumed this was true, so I named the function is_supported_executable.

It should already match python and bash scripts. It calls matchParser in e.g., python-engine.cc, which looks for python in the shebang at the start of the file, so I don't quite understand why it doesn't work

SimonKagstrom commented 5 months ago

I wonder if it's just something like this that's needed:

--- a/src/utils.cc
+++ b/src/utils.cc
@@ -778,6 +778,13 @@ uint32_t hash_file(const std::string &filename)

 int look_path(const std::string &file, std::string *out)
 {
+       struct stat st;
+       if (file_exists(file))
+       {
+               *out = get_real_path(file);
+               return 0;
+       }
+
        const char *sys_path = getenv("PATH");
        if (!sys_path)
                return -1;
@@ -788,7 +795,6 @@ int look_path(const std::string &file, std::string *out)
        {
                const std::string root = *it;
                const std::string path = get_real_path(root + "/" + file);
-               struct stat st;

                if (lstat(path.c_str(), &st) < 0)
                        continue;

I.e., accept the file if it's either already given in a relative or absolute path?

perillo commented 5 months ago

I wonder if it's just something like this that's needed: ... I.e., accept the file if it's either already given in a relative or absolute path?

Yes, I have reached the same conclusion. The problem was that tests/python/main is always incorrectly expanded. It has nothing to do with parser matching.

As an example, Go os.exec package tries to resolve a file name with slash without consulting PATH. In an older version it could return a path relative to the current directory (this is probably what we want).

Currently I'm trying to split look_path into two functions: find_executable and look_path (that calls find_executable).

perillo commented 5 months ago

The code seems to work but I got two errors:

======================================================================
FAIL: runTest (bash_linux_only.bash_sh_shebang.runTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/manlio/src/contrib/kcov/github/perillo/kcov/tests/tools/bash_linux_only.py", line 12, in runTest
    assert parse_cobertura.hitsPerLine(dom, "sh-shebang.sh", 4) == 1
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

======================================================================
FAIL: runTest (python.python_issue_368_can_handle_symlink_target.runTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/manlio/src/contrib/kcov/github/perillo/kcov/tests/tools/python.py", line 39, in runTest
    assert b"unrecognized option '--foo'" not in o
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
SimonKagstrom commented 5 months ago

Funnily enough, the python test works fine here (MacOS, but the python stuff should be exactly the same):

runTest (python.python_can_set_legal_parser.runTest) ... ok
runTest (python.python_exit_status.runTest) ... ok
runTest (python.python_issue_368_can_handle_symlink_target.runTest) ... ok
runTest (python.python_tricky_single_dict_assignment.runTest) ... expected failure
runTest (python.python_tricky_single_line_string_assignment.runTest) ... ok
runTest (python.python_unittest.runTest) ... ok

So since it's an improvement, I'd suggest to add that to the PR.

Edit: The bash failure is Linux-specific, so wasn't run on my machine (so all tests ran fine here).

perillo commented 5 months ago

My handle-out-dir-correctly branch is in a diverged state (due to me messing up when trying to replace the last commit). I will try to solve the problem without opening a new PR.

SimonKagstrom commented 5 months ago

@perillo is this PR still relevant, or was it solved by the other?

perillo commented 5 months ago

On Tue, Mar 12, 2024, 07:31 Simon Kagstrom @.***> wrote:

@perillo https://github.com/perillo is this PR still relevant, or was it solved by the other?

— Reply to this email directly, view it on GitHub https://github.com/SimonKagstrom/kcov/pull/415#issuecomment-1990843749, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABPN3AEPW2QU2BSGU6GLXITYX2ONRAVCNFSM6AAAAABEJLJ4HWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJQHA2DGNZUHE . You are receiving this because you were mentioned.

It is still relevant.

Message ID: @.***>

perillo commented 5 months ago

I have a change that should finally fix the problem, but I think that reusing this PR is not a good idea.

SimonKagstrom commented 5 months ago

OK, so feel free to close this and open a new one when you feel ready! Thanks!

perillo commented 5 months ago

Replaced by #426.