Open inconstante opened 3 years ago
FWIW, I agree.
I agree that having a specific binary format metadata file adds unnecessary complexity.
If I may, I'd like to suggest an alternate approach from the perspective of a new user. Is it necessary to have ulp_packer
as a separate tool?
That I am aware of, ulp_packer
simply prepares and validates the patch for later usage in ulp_trigger
and ulp_check
, saving a metadata file in the process. The metadata file, that I can tell, contains user-specified description file information in addition to data read from the to-be-patched library's ELF file and a MD4 digest of the patch.
Why not instead embed the preparation/validation logic of ulp_packer
in common functions used by both ulp_trigger
and ulp_check
, and have both take as input the user-defined description file? This would eliminate the need for ulp_packer
, ulp_dump
, and the metadata file at the cost of added complexity to ulp_trigger
and ulp_check
.
A couple drawbacks I see with this approach:
Redundant computation performed when a user wants to verify the patch with ulp_check
. I'm unsure what the overhead is for reading the ELF files of large programs, but we could eliminate this by making ulp_trigger
output a textual metadata format of the applied patch (as discussed) for usage in ulp_check
. This approach would complexity to ulp_trigger
and potential command line argument bloat, though.
Breakage of any existing tooling built around the existing ULP interface.
That I am aware of,
ulp_packer
simply prepares and validates the patch for later usage inulp_trigger
andulp_check
, saving a metadata file in the process. The metadata file, that I can tell, contains user-specified description file information in addition to data read from the to-be-patched library's ELF file and a MD4 digest of the patch.
Yes, the metadata file is based on the description file, which is not only human readable, but written by-hand. That file is fed to ulp_packer
, which creates the final metadata file.
As a matter of fact, in the test suite, the metadata generation process is even longer:
__ABS_BUILDDIR__
, and records the output into an actual description file.ulp_packer
converts the description file to binary format, as well as adds the missing information (build-id and digest).If we switch from binary to text files, we still have to record the build-id and the digest into the metadata. There are at least two ways to do it:
Why not instead embed the preparation/validation logic of
ulp_packer
in common functions used by bothulp_trigger
andulp_check
and have both take as input the user-defined description file?
One of the goals of having the build-id and digest in the metadata file is to make sure that the metadata file and the live patch DSO match. Suppose you have a metadata file, but somehow you got the wrong live patch DSO. In this case, ulp_trigger
will notice the mismatch, and refrain from applying the live patch at all. The build-id and digest create a correspondence between the two files that compose a live patch.
In other words, ulp_packer
is used during live patch creation time to record this correspondence into the metadata file. Then, during live patch application time, ulp_trigger
uses that information to verify the match.
I think ulp_trigger
is the wrong place to put that. Likewise for ulp_check
.
That said, I still think we shouldn't get rid of ulp_packer
, because it automatically writes the build-id and digest into the metadata file.
* Breakage of any existing tooling built around the existing ULP interface.
Good point, but, for now, we don't care too much about it, specially because libpulp has never been released. :)
@a-gavin
Pull request #19 (just merged) affects this, because it adds more lines to the metadata file (lines starting with #), as well as some code to ulp_packer
and lib/ulp.c
to deal with them.
Yes, the metadata file is based on the description file, which is not only human readable, but written by-hand. That file is fed to
ulp_packer
, which creates the final metadata file.As a matter of fact, in the test suite, the metadata generation process is even longer:
1. First of all, a human writes a _template_ description file, such as [libaddress_livepatch1.in](https://github.com/SUSE/libpulp/blob/master/tests/libaddress_livepatch1.in); 2. Then, the build system calls a bunch of scripts to replace macros, such as `__ABS_BUILDDIR__`, and records the output into an actual _description file_. 3. Finally, `ulp_packer` converts the description file to binary format, as well as adds the missing information (build-id and digest).
Good to know.
If we switch from binary to text files, we still have to record the build-id and the digest into the metadata. There are at least two ways to do it:
1. Write the build-id and digest into the description file **by-hand**. 2. Let ulp_packer do it for us.
One of the goals of having the build-id and digest in the metadata file is to make sure that the metadata file and the live patch DSO match. Suppose you have a metadata file, but somehow you got the wrong live patch DSO. In this case,
ulp_trigger
will notice the mismatch, and refrain from applying the live patch at all. The build-id and digest create a correspondence between the two files that compose a live patch.In other words,
ulp_packer
is used during live patch creation time to record this correspondence into the metadata file. Then, during live patch application time,ulp_trigger
uses that information to verify the match.
This makes sense. Easy to overlook the complexity involved in the process of generating and applying a live patch!
I think
ulp_trigger
is the wrong place to put that. Likewise forulp_check
.
Can you elaborate? From what I understand, it makes sense for ulp_trigger
to verify the match if it is going to apply the live patch.
That said, I still think we shouldn't get rid of
ulp_packer
, because it automatically writes the build-id and digest into the metadata file.
Sure. Easier on the user to have a tool do it for them.
Once I submit PR's for #25 and #26, I'd be happy to work on this!
I think
ulp_trigger
is the wrong place to put that. Likewise forulp_check
.Can you elaborate? From what I understand, it makes sense for
ulp_trigger
to verify the match if it is going to apply the live patch.
What I mean is that ulp_packer
is the right place to give the final touches to the metadata file (i.e.: to start from a partially filled metadata file template, e.g. libfoo_livepatch1.in, find the build-id of the target library, and record it into an actual metadata file, e.g. libfoo_livepatch1.ulp); whereas ulp_trigger
(as well as ulp_check
) is the right place to check that the metadata file and the live patch object, e.g. libfoo_livepatch1.so, match.
So, ulp_packer
creates the metadata file and ulp_trigger
consumes it.
Once I submit PR's for #25 and #26, I'd be happy to work on this!
:)))))
Metadata is now integrated into the livepatch container (.so).
I think now getting rid of the binary metadata format is pointless: we now need a dump tool more than ever, and the binary format keep size of things determined, so we do not need to carry the size of the metadata when it is loaded in memory.
What you guys think?
I agree to that, though even binary meta-data can be complicated and not so complicated :-) Our metadata now (in the .so) has a determined size (from the section it's contained in), so, e.g. all strings therein could be null-terminated as usual in C, not size-prefixed. Overreads can be avoided as the overall size is known.
And, of course, the meta-data in the .so could also just as well be text based as well, so that the "dump" tool would be trivial: readelf -p .comment.ulp (or similar) :-)
The thing is that metadata extraction is not done in libpulp.so: it is now done in ulp and then ptraced to libpulp.so into a statically allocated buffer. We do that because we do not want to carry libelf with libpulp. We only check if the metadata will fit into the buffer on ulp side (buffer is 512k, which is plenty atm), and since the metadata itself has the size of it, we do not have to worry about passing its size around.
Currently, the metadata file is in binary format.
Strings in the metadata file are not null-terminated and there are no lines, thus no end-of-line characters (or sequences of characters). As far as I can tell, the only benefit that it adds relies on the fact that end-of-line characters vary across platforms (or even across text editors), which could lead to broken parsing if Libpulp did not handle varying styles of newlines properly.
The benefit comes at a cost in complexity:
dump
tool, which adds maintenance burden;Apart from that, the
packer
tool, which is responsible for recording build-id and checksum information into the metadata file, also converts from text to binary format. If a text format was used, the packer tool would still be useful to handle build-ids and checksums, however it would allow users to handle them manually, with a regular text editor (on quick tests, for example).In my opinion, the drawbacks of the binary format outweigh the benefits.