cython / cython

The most widely used Python to C compiler
https://cython.org
Apache License 2.0
9.53k stars 1.49k forks source link

[BUG] line_profiler fails to find original source files because of the relative file paths #3929

Open EvgenKo423 opened 3 years ago

EvgenKo423 commented 3 years ago

Describe the bug: If you build a Cython extension with line tracing support and try to profile it with line_profiler/kernprof, they won't be able to display a source code because an extension contains relative paths to it (introduced in #1565/#1576).

To reproduce:

path\to\module_root> cython -X linetrace=True -X binding=True "path\to\your_module.pyx"
path\to\module_root> pip install -U .
path\to\module_root\debugging_tools> kernprof -l -o nul -v module_runner.py args

Expected behavior: line_profiler/kernprof display extension source code along profiling results.

Environment:

Additional notes: As a solution you could still use absolute paths for profiling builds and relative otherwise. However, I don't see much point in using relative paths by now in a first place (except for some privacy or code size concerns) as I can't recompile an extension with the same checksum even from the same .c file...

tillahoffmann commented 2 years ago

Is this the same underlying issue as #2543?

da-woods commented 2 years ago

Is this the same underlying issue as #2543?

Yes it definitely looks like it.

I'm not quite sure what the main issue is here: does line_profiler not understand relative paths at all, are the paths relative to the wrong place, or something else?

tillahoffmann commented 2 years ago

The main issue is probably that line_profiler retrieves the source file from co_filename here to try and display the source. So it won't be able to find the source unless line_profiler is run from the directory from which cythonize was run because cythonize strips the current working directory if possible (see here).

The behavior is somewhat unexpected, because co_filename is usually an absolute path. But maybe asking users to run line_profiler from the same directory isn't unreasonable because it's likely to only be used during development anyway.

da-woods commented 2 years ago

The solution proposed in the issue you linked would be to use the __file__ attribute of the module to work out the full path of the source file at runtime. That'd rely on the source file being distributed alongside the .so file.

It seems like it might be more fiddly than it sounds to make work reliably.

I'd be reluctant to code the full path in at compile time (as the original post suggests) for a few different reasons.

tillahoffmann commented 2 years ago

An option might be to add a line to the profiling_tutorial to note that, if you're using line_profiler, you must run it from the directory where you ran cythonize.

silverzhaojr commented 2 years ago

As a solution you could still use absolute paths for profiling builds and relative otherwise.

Could you please show how to build .so file with absolube path info included? I'm trying to use coverage.py for cython modules but it cannot find the source when reporting.

EvgenKo423 commented 2 years ago

@silverzhaojr Well, I actually meant to implement this as a bug fix, but you indeed can build it with absolute paths yourself: just run the cython command from some directory which is sibling to your extensions, like so:

path\to\module_root\debugging_tools> cython -X linetrace=True -X binding=True "..\path\to\your_module.pyx"
silverzhaojr commented 2 years ago

Do you mean running the cython command in a different directory from where the .pyx file locates in? Normally we uses setup.py file to build the .so file and usually it's in the same directory as the .pyx file.

Actually I think maybe it's better to add a new switch to control whether to include the absolute path in .so file when compling .pyx files. Another way can be always including absolute path when linetrace is enabled.

In our case, we cythonize all python code as .so files, and then pack them into one executable file with pyinstaller for code protection. We want to use coverage.py for coverage report, but in such case all files are using relative path, so it always reports error "No source for code: xxx". If the absolute path is included in the .so file, the coverage report can always work no matter the .so files are moved somewhere else, or they're packed by pyinstaller.

EvgenKo423 commented 2 years ago

@silverzhaojr Yes, not just different directory, but the one that is not part of an absolute extension path.

Absolute paths are included into .c files at the cythonization stage (which I did with cython command). If you specify .pyx files directly in your setup.py, you may try to do the following:

  1. Call the os.chdir() function;
  2. Specify extensions' paths relative to a new dir and call the cythonize() function;
  3. Change the working directory back to where the setup.py is before calling the setup() function.

Another way can be always including absolute path when linetrace is enabled.

That's exactly what I meant.

silverzhaojr commented 2 years ago

@EvgenKo423

Thanks for you info! I tried your suggestion and it works:

extensions = [Extension('primes', ['primes.pyx'])]
compiler_directives={'language_level' : sys.version_info[0]}

enable_coverage = True
if enable_coverage:
    for ex in extensions:
        # use absolute path for source files
        ex.sources = [os.path.abspath(source) for source in ex.sources]
        ex.define_macros.append(('CYTHON_TRACE', '1'))

    compiler_directives.update({'linetrace': True})

    # This is a workround to include absolute path of source file in the
    # built .so file:
    #
    # If the source file path starts with the working directory, the path
    # included in .c file will be truncated and become the relative path when
    # calling `cythonize()`.
    #
    # See https://github.com/cython/cython/issues/3929 for details
    old_wd = os.getcwd()
    os.chdir('/tmp')

ext_modules = cythonize(extensions, compiler_directives=compiler_directives)

if enable_coverage:
    os.chdir(old_wd)

setup(ext_modules=ext_modules)

This can be a workround before this issue is fixed.

silverzhaojr commented 2 years ago

I found the abs path of filename is never included for the module init / func definition part, even I apply the workaround above. It's removed on purpose, but I don't know why:

https://github.com/cython/cython/blob/master/Cython/Compiler/ModuleNode.py#L913

    def generate_filename_table(self, code):
        from os.path import isabs, basename
        code.putln("")
        code.putln("static const char *%s[] = {" % Naming.filetable_cname)
        if code.globalstate.filename_list:
            for source_desc in code.globalstate.filename_list:
                file_path = source_desc.get_filenametable_entry()
                if isabs(file_path):
                    file_path = basename(file_path)  # never include absolute paths    <== here
                escaped_filename = file_path.replace("\\", "\\\\").replace('"', r'\"')
                escaped_filename = as_encoded_filename(escaped_filename)
                code.putln('%s,' % escaped_filename.as_c_string_literal())
        else:
            # Some C compilers don't like an empty array
            code.putln("0")
        code.putln("};")

My sample .pyx file is like:

# file: add.pyx
def testfunc(a, b):
    return a + b

The generated C code is like:

static const char *__pyx_f[] = {
  "add.pyx",  // <== relative path
};
...
  __Pyx_TraceCall("testfunc", __pyx_f[0], 1, 0, __PYX_ERR(0, 1, __pyx_L1_error));
...
  __Pyx_TraceCall("__Pyx_PyMODINIT_FUNC PyInit_add(void)", __pyx_f[0], 1, 0, __PYX_ERR(0, 1, __pyx_L1_error));

In such case, the coverage.py cannot cover the import or def statements because it cannot get the abs path of the file, the func body is still covered though.

Could someone point out why we use the relative path for it?

da-woods commented 2 years ago

Could someone point out why we use the relative path for it?

I don't know if this was the original reason, but people distribute the compiled C files, and abs paths can contain personal information:

/home/davidwoods/projects/super_secret_deathray_project/modules/useful_cython_utility/calculations.pyx

That tells anyone that I distribute it to my full name (which I may not want public), and the fact that I'm building a secret death-ray, when all I really meant to do is release some small Cython utility that I'm using. Potentially directory structures of a build server might be a minor security issue.

Ignoring that, it's common for Cython modules to be distributed either as .c files or as compiled modules. __pyx_f is mainly used for tracebacks, and in both cases a traceback to a path on my PC is pretty useless to you.

It definitely doesn't seem like we'd want to distribute full paths by defaul.

silverzhaojr commented 2 years ago

abs paths can contain personal information

make sense.

But how about using abs path for it if line tracing is enabled? Such case is usually for profiling, and we know what we do and what we want, so containing the personal information can be acceptable.

da-woods commented 2 years ago

But how about using abs path for it if line tracing is enabled? Such case is usually for profiling, and we know what we do and what we want, so containing the personal information can be acceptable.

That does seem like it might be reasonable solution.

In such case, the coverage.py cannot cover the import or def statements because it cannot get the abs path of the file, the func body is still covered though.

It seems like this should work with multi-phase module initialization. By the time the main module init is run we should know the filename of the .so file

But for both of these comments I haven't really dug into the issue myself so I could be missing something

silverzhaojr commented 2 years ago

Here are simple steps to reproduce if you want to have a try:

1. Prepare the test files

# file structure
cython-test/
├── main.py
└── mod
    ├── mymath.pyx
    └── setup.py
# main.py
from mod.mymath import add
result = add(2, 3)
# mod/mymath.pyx
def add(a, b):
    return a + b
# mod/setup.py
from setuptools import setup
from distutils.extension import Extension

from Cython.Build import cythonize

import os

extensions = [
    Extension('mymath',
        [os.path.abspath('mymath.pyx')],
        define_macros=[('CYTHON_TRACE', '1')],
    ),
]

# workaround to include abs path in built .so file
old_wd = os.getcwd()
os.chdir('/tmp')

ext_modules = cythonize(
    extensions,
    compiler_directives={
        'linetrace': True,
        'language_level': '3',
    },
)

os.chdir(old_wd)

setup(
    ext_modules = ext_modules
)

2. Build the .so extension

$ pip install coverage Cython
$ cd cython-test/mod
$ python3 setup.py build_ext --inplace
$ ls
build
mymath.c
mymath.cpython-38-darwin.so
mymath.pyx
setup.py

3. Run coverage.py

$ cd .. # in directory cython-test
$ coverage run main.py
$ coverage report --ignore-errors --show-missing
/Library/Python/3.8/lib/python/site-packages/coverage/report.py:87: CoverageWarning: Couldn't parse '/Volumes/ramdisk/cython-test/mymath.pyx': No source for code: '/Volumes/ramdisk/cython-test/mymath.pyx'. (couldnt-parse)
  coverage._warn(msg, slug="couldnt-parse")
Name             Stmts   Miss  Cover   Missing
----------------------------------------------
main.py              2      0   100%
mod/mymath.pyx       2      1    50%   1
----------------------------------------------
TOTAL                4      1    75%