davidlattimore / wild

Apache License 2.0
664 stars 16 forks source link

Support any alignment for string merge sections #111

Closed marxin closed 2 months ago

marxin commented 2 months ago

This is probably a naive implementation, but still something we can discuss. I noticed the clang binary contains a string section Wild can't deal with right now: 0.7% 1.51Mi 0.9% 1.51Mi .rodata.str1.8. As the name suggests, the section's alignment is equal to 8. My patch causes crashes for various binaries.

Why is the non-one alignment used for strings as the strings have variadic lengths and so you don't have any guarantees about an alignment of a reference string in such a section? Or am I wrong?

marxin commented 2 months ago

Oh, the strings can have variable length, but a subsequent string always starts at the declared alignment offset:

CMakeFiles/clang.dir/driver.cpp.o:     file format elf64-x86-64

Contents of section .rodata._ZL14ExecuteCC1ToolRN4llvm15SmallVectorImplIPKcEERKNS_11ToolContextE.str1.8:
 0000 6572726f 723a2075 6e6b6e6f 776e2069  error: unknown i
 0010 6e746567 72617465 6420746f 6f6c2027  ntegrated tool '
 0020 00000000 00000000 56616c69 6420746f  ........Valid to
 0030 6f6c7320 696e636c 75646520 272d6363  ols include '-cc
 0040 31272c20 272d6363 31617327 20616e64  1', '-cc1as' and
 0050 20272d63 63316765 6e2d7265 70726f64   '-cc1gen-reprod
 0060 75636572 272e0a00                    ucer'...        

which means we would have to introduce MergeStringsSection for each alignment for each output section. That sounds similar to part_id_with_alignment and UnresolvedMergeStringsFileSection::new should take into account the alignment and append multiple '\0' characters (if necessary). Then the merging should work as now and it might work :)

davidlattimore commented 2 months ago

That sounds similar to part_id_with_alignment and UnresolvedMergeStringsFileSection::new should take into account the alignment and append multiple '\0' characters (if necessary). Then the merging should work as now and it might work :)

Sounds like you've got things mostly figured out. I suspect you might also need to change MergeStringsSection::add_string. It's what allocates addresses to each string that gets added to the section. So it would need to take alignment into account.

marxin commented 2 months ago

Yep, that's yet another place that needs to be changed.

To be honest, we're speaking about a small saving we can gain, so I tend to close this PR for now.

marxin commented 2 months ago

The most important section from the string merging perspective is definitely .debug_str which has a proper alignment we deal with (10.