bloomberg / memray

Memray is a memory profiler for Python
https://bloomberg.github.io/memray/
Apache License 2.0
13.36k stars 397 forks source link

Try to respect RPATHS of calling dlopen modules with dlinfo #549

Closed pablogsal closed 2 months ago

pablogsal commented 9 months ago

This commit is just for backup on our main strategy of using dlinfo instead of dlopen.

Issue number of the reported bug or feature request: #

Describe your changes A clear and concise description of the changes you have made.

Testing performed Describe the testing you have performed to ensure that the bug has been addressed, or that the new feature works as planned.

Additional context Add any other context about your contribution here.

codecov-commenter commented 9 months ago

Codecov Report

Attention: Patch coverage is 92.30769% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.89%. Comparing base (5d9e04d) to head (7e8b4b1).

Files with missing lines Patch % Lines
src/memray/_memray/hooks.cpp 92.30% 3 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #549 +/- ## ========================================== - Coverage 92.99% 92.89% -0.10% ========================================== Files 94 94 Lines 11445 11470 +25 Branches 2114 2114 ========================================== + Hits 10643 10655 +12 - Misses 802 815 +13 ``` | [Flag](https://app.codecov.io/gh/bloomberg/memray/pull/549/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=bloomberg) | Coverage Δ | | |---|---|---| | [cpp](https://app.codecov.io/gh/bloomberg/memray/pull/549/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=bloomberg) | `92.89% <92.30%> (-0.10%)` | :arrow_down: | | [python_and_cython](https://app.codecov.io/gh/bloomberg/memray/pull/549/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=bloomberg) | `92.89% <92.30%> (-0.10%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=bloomberg#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

godlygeek commented 4 months ago

We've decided to keep this option in our back pocket, but for now we're going to move forward with interposing dlsym instead of dlopen.

godlygeek commented 2 months ago

One downside I just noticed for this approach is that we're screwing up the contract for dlerror(): we're making a bunch of dlopen and dlsym and dlclose calls that the application doesn't know about, which could result in the return value from dlerror being clobbered by one of our internal attempts, rather than the last attempt the user made.

I'm not sure that's worth trying to work around, but it's worth noting, at least...

pablogsal commented 2 months ago

One downside I just noticed for this approach is that we're screwing up the contract for dlerror(): we're making a bunch of dlopen and dlsym and dlclose calls that the application doesn't know about, which could result in the return value from dlerror being clobbered by one of our internal attempts, rather than the last attempt the user made.

We can clear the possible error before calling the real dlopen if you want by calling dlerror() ourselves as this is documented to clean existing errors when called: https://github.com/bminor/glibc/blob/96d0bf98cafd0b63721f369ca21ec64590551d47/dlfcn/dlerror.c#L120

pablogsal commented 2 months ago

@godlygeek Btw, I made some experiments regarding the null values of paths->dls_serpath[i].dls_name. Looks like empty entries are already resolved to ".":

LD_LIBRARY_PATH=":a:b" python -m pytest tests/integration/test_extensions.py -v -s -k test_dlopen_with_rpath
...
...
Num paths 8
Checking .
Checking a
Checking b
Checking /tmp/pytest-of-root/pytest-23/test_dlopen_with_rpath0/sharedlibs/sharedlibs
Checking /lib/aarch64-linux-gnu
Checking /usr/lib/aarch64-linux-gnu
Checking /lib
Checking /usr/lib

same if you do -Wl,-rpath,:a:$ORIGIN/sharedlibs:

Num paths 7
Checking .
Checking a
Checking /tmp/pytest-of-root/pytest-24/test_dlopen_with_rpath0/sharedlibs/sharedlibs
Checking /lib/aarch64-linux-gnu
Checking /usr/lib/aarch64-linux-gnu
Checking /lib
Checking /usr/lib

maybe we should bail out like I did before on nullptr.

godlygeek commented 2 months ago

maybe we should bail out like I did before on nullptr.

I don't think it's possible to get paths->dls_serpath[i].dls_name == nullptr, so I don't think it matters too much what we do, beyond that we shouldn't crash if it somehow happens. I'd suggest we just continue and ignore that entry, but treating it the same as "" seems fine too.

Checking the glibc code, it looks like it never assigns a null pointer to dls_name, and the docs don't suggest that it ever could. In fact (in retrospect unsurprisingly, considering there's no call to free this structure and dlclose could be called asynchronously from another thread) every dls_name is assigned to some pointer into the paths buffer. That is, RTLD_DI_SERINFOSIZE sets dls_size to a size that's big enough not only for an array of dls_cnt entries of type dls_serpath, but also for dls_cnt paths following the dls_serpath array, all concatenated together with '\0' bytes between them, and paths->dls_serpath[i].dls_name is set to the i'th string in that buffer.

pablogsal commented 2 months ago

I don't think it's possible to get paths->dls_serpath[i].dls_name == nullptr, so I don't think it matters too much what we do, beyond that we shouldn't crash if it somehow happens. I'd suggest we just continue and ignore that entry, but treating it the same as "" seems fine too.

I pushed a fixup for that