Live patches rely on in-disk images of libraries

inconstante commented 3 years ago

Currently, libpulp.so needs to open the in-disk image of target libraries and of libpulp.so itself to learn about:

the address of dlsym, which gets interposed;
the addresses of to-be-patched functions;
the addresses of live-patch support functions, e.g. __ulp_trigger;
~the load bias~.

However, requiring the disk-image has a problem... Since the in-disk image of the library and the in-memory image loaded by the target process must be identical, it makes it impossible for target libraries, as well as libpulp.so, to be updated in-disk, which would only work if package updates were forbidden and that is unreasonable.

Recently, #19 added a mechanism where addresses (more specifically offsets) get recorded in the metadata file, so that relying on in-disk images is not necessary. On the other hand, #19 only deals with addresses of LOCAL variables.

This Issue is here to remind us that we need to stop relying on in-disk images and that extending the metadata file (like in #19) is probably a good option.

giulianobelinassi commented 3 years ago

Well, if you look carefully:

This only happens when the process starts. This means that the library is present there, else the pachaable program wouldn't have launched at all with a missing library error.
Actually, we find that using dlsym.
It actually may be the only existing problem here.

susematz commented 3 years ago

For (1) maybe the interposition code could also be cleaned up a bit: I don't really see why it would really need access to the file on disk, it could just use dlsym(RTLD_NEXT, "real_function") to get at the addresses of the interposed but underlying functions.

For (2): right now we can use dlsym to lookup to be patched functions, but maybe in the future we might want to also live patch unexported functions. At that point we still can't rely on on-disk files, but we need something else similar to #19, just for all local symbols, not just variables.

For (3): yeah, that's still a problem. If libpulp.so is updated on-disk parse_lib_dynobj will get wrong info. I think we can solve this by either exporting the required symbols from libpulp.so and then parse the ELF structures from attached memory, or store the necessary addresses within a static data area of libpulp.so (at program start) that we place at a known offset within libpulp.so.

inconstante commented 3 years ago

For (1) maybe the interposition code could also be cleaned up a bit: I don't really see why it would really need access to the file on disk, it could just use dlsym(RTLD_NEXT, "real_function") to get at the addresses of the interposed but underlying functions.

I am probably missing something, but I don't know how to interpose dlsym itself. For the remaining functions in interpose.c, we use dlsym, indeed (after we found where dlsym is, using the file on disk).

EDIT: that was my rationale when I wrote it: https://github.com/SUSE/libpulp/blob/6ceeaeeae4443fbe66512fad067383bcb98debad/lib/interpose.c#L230-L247

susematz commented 3 years ago

Gah, right. We still don't have to use the on-disk file, we could parse the ELF structures in memory. But that is mood, as we know that this is done during process startup, so accessing memory or files is the same. I guess moving it to in-memory interpretation would eventually allow to remove all code opening and accessing files on disk, and therefore give us piece of mind that we absolutely don't rely on ELF files on disk, even if here it would be okay. But if anything then this would be low prio work.

giulianobelinassi commented 3 years ago

Isn't this fixed in commit 76beed4595eabc1d80d8fe68bc44e7cf25501dbb ?

susematz commented 3 years ago

Not yet. tools/introspection.c also contains code parsing ELF files from disk in order to know the addresses of the target process' libpulp function. (See get_loaded_symbol_addr). That process already uses in-memory reading for most of what it needs (e.g. linkmap base from the target) just for the symbol lookup it opens the file and uses libelf.

So, that would need changing as well. (And now it's clear why I somewhere mumbled about symbol reading from a remote process, but then managed to think that that was overthinking :-) ).

giulianobelinassi commented 3 years ago

Fixed that issue in tool/introspection.c on commit 5982650d2608f3600efd303c8071a4d7ee3065ff.

On the other hand, I think using in-disk images is still useful because we can have access to sections that are not easily reachable on the in-memory image (e.g. .symtab section). But we should implement a check to compare the remote process's build id with the on-memory image's build id, and if they match, then use the in-disk image. This could be useful to livepatch non-externally visible global symbols within the process.

giulianobelinassi commented 3 years ago

Done in 5982650d2608f3600efd303c8071a4d7ee3065ff

SUSE / libpulp

Live patches rely on in-disk images of libraries #20