geofft / redhook

Dynamic function call interposition / hooking (LD_PRELOAD) for Rust
BSD 2-Clause "Simplified" License

dlsym(RTLD_NEXT) doesn't always work #2

Open jethrogb opened 9 years ago

jethrogb commented 9 years ago

Sometimes you need to do some magic to find the entry point for the real function. Here's an example that uses dl_iterate_phdr when dlsym(RTLD_NEXT,...) fails: https://github.com/jethrogb/ssltrace/blob/bf17c150a7/ssltrace.cpp#L74-L112
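For readers who don't want to follow the link, here is a minimal sketch of the general idea in C. It is not the ssltrace code: the names `scan_loaded_objects`, `resolve_next`, and `MAX_OBJECTS` are made up for illustration. It assumes glibc-style `dl_iterate_phdr` and `RTLD_NOLOAD`, and it simply returns the first match it finds, which may not be the definition the dynamic linker would have picked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OBJECTS 256  /* arbitrary cap chosen for this sketch */

/* Hypothetical helper type: paths of the objects currently loaded. */
struct object_list {
    char *names[MAX_OBJECTS];
    size_t count;
};

/* dl_iterate_phdr callback: record each loaded object's path.
 * Entries with an empty name (the main program) are skipped. */
static int collect_name(struct dl_phdr_info *info, size_t size, void *data)
{
    struct object_list *list = data;
    (void)size;
    if (info->dlpi_name != NULL && info->dlpi_name[0] != '\0'
        && list->count < MAX_OBJECTS) {
        char *copy = strdup(info->dlpi_name);
        if (copy != NULL)
            list->names[list->count++] = copy;
    }
    return 0;  /* zero means keep iterating */
}

/* Ask each loaded object, through its own handle, for `symbol`.
 * `self` is the caller's own definition, which must not be returned. */
void *scan_loaded_objects(const char *symbol, const void *self)
{
    struct object_list list = { .count = 0 };
    void *found = NULL;

    assert(symbol != NULL);
    dl_iterate_phdr(collect_name, &list);

    for (size_t i = 0; i < list.count; i++) {
        if (found == NULL) {
            /* RTLD_NOLOAD: get a handle only if the object is already loaded. */
            void *handle = dlopen(list.names[i], RTLD_LAZY | RTLD_NOLOAD);
            if (handle != NULL) {
                void *addr = dlsym(handle, symbol);
                if (addr != NULL && addr != self)
                    found = addr;
                dlclose(handle);  /* drops only the reference taken above */
            }
        }
        free(list.names[i]);
    }
    return found;
}

/* Try RTLD_NEXT first; fall back to scanning only if it returns NULL. */
void *resolve_next(const char *symbol, const void *self)
{
    void *addr = dlsym(RTLD_NEXT, symbol);
    return addr != NULL ? addr : scan_loaded_objects(symbol, self);
}
```

A hook would call something like `resolve_next("gnutls_handshake", (const void *)gnutls_handshake)` from inside its own override. The handles are opened after the iteration finishes rather than inside the callback, to avoid calling back into the loader while it is walking its own list.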

geofft commented 9 years ago

Interesting. Looking at your commit history, this is specifically for when the library you're hooking was opened with RTLD_LOCAL? I am sort of surprised that the LD_PRELOAD even gets used in that case.

If I hack up your test case a bit:

#define _GNU_SOURCE

#include <dlfcn.h>
#include <stdio.h>

__attribute__((__weak__))
void gnutls_handshake(void);

int main(void)
{
        void *handle = dlopen("libgnutls.so",RTLD_LOCAL|RTLD_NOW);
        printf("directly: gnutls_handshake = %p\n", gnutls_handshake);
        void *fp=dlsym(handle,"gnutls_handshake");
        printf("through handle: gnutls_handshake = %p\n", fp);
        fp=dlsym(RTLD_DEFAULT,"gnutls_handshake");
        printf("through RTLD_DEFAULT: gnutls_handshake = %p\n", fp);
        return 0;
}

I get

directly: gnutls_handshake = (nil)
through handle: gnutls_handshake = 0x7fc32647be80
through RTLD_DEFAULT: gnutls_handshake = (nil)

but with this preload:

#include <stdio.h>

void gnutls_handshake(void) {}

__attribute__((__constructor__))
void ctor(void) {
        fprintf(stderr, "address of preloaded gnutls_handshake is %p\n", gnutls_handshake);
}

I get

address of preloaded gnutls_handshake is 0x7fbb22dbe730
directly: gnutls_handshake = (nil)
through handle: gnutls_handshake = 0x7fbb22524e80
through RTLD_DEFAULT: gnutls_handshake = 0x7fbb22dbe730

In other words, without the LD_PRELOAD, only dlsym through the handle actually works, and with the LD_PRELOAD, the lookup through the handle still returns the real library's function rather than the preloaded one. The preload never gets called in this test, so we don't even get to the point of caring about RTLD_NEXT.

What was your failing case? (Is there an actual GnuTLS-using program I can poke at?)

jethrogb commented 9 years ago

Oh I had completely forgotten about that test case. I'm sorry, I don't remember the actual program that led me to write this code. I think it might have to do with program A loading a library B using RTLD_LOCAL, where that library B depends on library C (here C being GnuTLS).

geofft commented 9 years ago

OK, I can reproduce this (glibc 2.19, Debian 8.1 x86_64). Say I have a library intermediate.so that depends on libgnutls.so, and a preload library preload.so that overrides gnutls_handshake. If intermediate.so is loaded with RTLD_LOCAL, calls to gnutls_handshake from within intermediate.so will hit the preload, but the preload itself will get a null return from dlsym(RTLD_NEXT, ...). If intermediate.so is instead loaded with RTLD_GLOBAL, things work.

Oddly enough, directly asking for the address of gnutls_handshake via the handle to intermediate.so (regardless of RTLD_LOCAL vs. RTLD_GLOBAL) does not hit the preload.

$ LD_PRELOAD=./preload.so ./main
0x7ffa172cb750 from preload constructor
0x7ffa172cb750 from intermediate constructor
0x7ffa16830e80 from main through dlsym intermediate.so
0x7ffa172cb750 from intermediate function
(nil) from RTLD_NEXT inside preload
0x7ffa16830e80 from main through dlsym libgnutls.so
$ LD_PRELOAD=./preload.so ./main g
0x7f3edbf1b750 from preload constructor
0x7f3edbf1b750 from intermediate constructor
0x7f3edb480e80 from main through dlsym intermediate.so
0x7f3edbf1b750 from intermediate function
0x7f3edb480e80 from RTLD_NEXT inside preload
0x7f3edb480e80 from main through dlsym libgnutls.so

Source code in this gist.
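For readers without access to the gist, a preload along these lines would show the same symptom. This is a hedged sketch, not the gist's code: the `handshake_fn` typedef, the `next_handshake` helper, and the `-1` error value are placeholders, and `void *` stands in for GnuTLS's pointer-typed session argument.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Assumed shape of the hooked function. The real gnutls_handshake takes a
 * gnutls_session_t, which is a pointer type, so void * stands in for it. */
typedef int (*handshake_fn)(void *session);

/* Look up the next definition after the object containing this code.
 * Under RTLD_LOCAL on the glibc version tested above, this is the call
 * that comes back NULL. */
static handshake_fn next_handshake(void)
{
    void *addr = dlsym(RTLD_NEXT, "gnutls_handshake");
    fprintf(stderr, "%p from RTLD_NEXT inside preload\n", addr);
    return (handshake_fn)addr;
}

/* The override exported by the preload library. */
int gnutls_handshake(void *session)
{
    handshake_fn real = next_handshake();
    if (real == NULL) {
        /* Placeholder error value: nothing to forward the call to. */
        return -1;
    }
    return real(session);
}
```

Checking for NULL before forwarding means the RTLD_LOCAL case shows up as a failed call rather than a jump through a null function pointer.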

This smells like a glibc bug. On my FreeBSD 10.2 VM (swapping libgnutls.so and gnutls_handshake for libreadline.so and write_history, since it doesn't have GnuTLS by default):

# env LD_PRELOAD=./preload.so ./main
0x80081f570 from preload constructor
0x80081f570 from intermediate constructor
0x801815010 from main through dlsym intermediate.so
0x80081f570 from intermediate function
0x801815010 from RTLD_NEXT inside preload
0x801815010 from main through dlsym libreadline.so
# env LD_PRELOAD=./preload.so ./main g
0x80081f570 from preload constructor
0x80081f570 from intermediate constructor
0x801815010 from main through dlsym intermediate.so
0x80081f570 from intermediate function
0x801815010 from RTLD_NEXT inside preload
0x801815010 from main through dlsym libreadline.so

I think we should start by reporting this to the glibc maintainers and asking what the intended behavior is. I don't think there's a way to reliably determine in dl_iterate_phdr which of the various loaded libraries is really the next library to call: part of the usefulness of RTLD_LOCAL is to allow you to load multiple libraries that expose symbols with the same name, and get the right one. So I'd rather not go that route unless we have to.

jethrogb commented 9 years ago

Thanks for investigating this further!

I agree that there is some ambiguity in some cases. The ambiguity arises when multiple libraries that define the same symbol get loaded AND some of those symbols are being linked with the preloaded symbol instead. I think this can happen in the following cases:

I'll try to come up with test cases for both of these.

Ideally a wrapper function would be able to identify the exact function that is being replaced, but I'm not sure if that's possible.

jethrogb commented 8 years ago

Sorry, I got distracted by a bug in ld. This gist builds test cases for each of the above scenarios, as well as 7 test binaries: 6 binaries linked to two versions of each scenario, and 1 binary that only uses libdl. All binaries are meant to be run with LD_LIBRARY_PATH=..

An interesting test case is, for example, test-lsoname1 dl,now,libusesymver2.so, in which the dynamic linker completely ignores the symbol version request of libusesymver2.so. Fun fact: the library with the correct versioned symbol is mapped into memory.
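As a side note on symbol versions: glibc exposes `dlvsym`, which lets a caller name the version it wants explicitly instead of relying on the linker's choice. A small sketch follows, assuming glibc. The `lookup_versioned` wrapper is a hypothetical name, and whether this helps in the test case above is untested.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical wrapper: request a specific symbol version when one is
 * given, otherwise fall back to an ordinary unversioned lookup. */
void *lookup_versioned(void *handle, const char *symbol, const char *version)
{
    if (version == NULL)
        return dlsym(handle, symbol);
    /* dlvsym is a GNU extension; it returns NULL if no definition of
     * `symbol` with exactly this version string is found. */
    return dlvsym(handle, symbol, version);
}
```

A hook that knows which version it is replacing could pass that version string here. The version names used by the test libraries are not given in this thread, so none are shown.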