flavorjones / mini_portile

mini_portile and mini_portile2 - Simple autoconf and cmake builder for developers
MIT License
114 stars 51 forks source link

mkmf_config support for pkg-config files with multiple library directories #138

Open mudge opened 5 months ago

mudge commented 5 months ago

As mentioned in https://github.com/mudge/re2/pull/138#discussion_r1521442743, the re2.pc pkg-config file generated by RE2 contains libraries from two directories: RE2 itself and its sole dependency, Abseil (note I’ve populated PKG_CONFIG_PATH prior to this so that it prefers the Abseil in ports over my system install):

$ pkg-config --libs --static re2.pc
-L/Users/mudge/Projects/re2/ports/arm64-apple-darwin22/libre2/2024-03-01/lib -L/Users/mudge/Projects/re2/ports/arm64-apple-darwin22/abseil/20240116.1/lib -pthread -lre2 -labsl_flags_internal -labsl_flags_marshalling -labsl_flags_reflection -labsl_flags_private_handle_accessor -labsl_flags_commandlineflag -labsl_flags_commandlineflag_internal -labsl_flags_config -labsl_flags_program_name -labsl_cord -labsl_cordz_info -labsl_cord_internal -labsl_cordz_functions -labsl_cordz_handle -labsl_crc_cord_state -labsl_crc32c -labsl_crc_internal -labsl_crc_cpu_detect -labsl_raw_hash_set -labsl_hash -labsl_city -labsl_bad_variant_access -labsl_low_level_hash -labsl_hashtablez_sampler -labsl_exponential_biased -labsl_bad_optional_access -labsl_str_format_internal -labsl_synchronization -labsl_graphcycles_internal -labsl_kernel_timeout_internal -labsl_stacktrace -labsl_symbolize -labsl_debugging_internal -labsl_demangle_internal -labsl_malloc_internal -labsl_time -labsl_civil_time -labsl_strings -labsl_strings_internal -labsl_string_view -labsl_base -labsl_spinlock_wait -labsl_int128 -labsl_throw_delegate -labsl_raw_logging_internal -labsl_log_severity -labsl_time_zone

At the moment, this doesn’t feel compatible with MiniPortile2’s new mkmf_config(static:) functionality as it assumes the .pc file only specifies libraries within its own directory (see https://github.com/flavorjones/mini_portile/blob/52fb0bc41c89a10f1ac7b5abcf0157e059194374/lib/mini_portile2/mini_portile.rb#L319) and that there is a single, entrypoint pc file for a library (Abseil ships with over 180 individual ones for each library).

What I actually want here is to turn all of the -l flags into static, absolute paths by looking for the matching file in each of the library directories specified by the two -L flags, e.g.

-lre2 -labsl_flags_internal

Becomes:

/Users/mudge/Projects/re2/ports/arm64-apple-darwin22/libre2/2024-03-01/lib/libre2.a /Users/mudge/Projects/re2/ports/arm64-apple-darwin22/abseil/20240116.1/lib/libabsl_flags_internal.a

Would it be possible to extend the API to support this use case (potentially also including setting PKG_CONFIG_PATH)?

mudge commented 5 months ago

Looking at https://github.com/flavorjones/mini_portile/blob/52fb0bc41c89a10f1ac7b5abcf0157e059194374/lib/mini_portile2/mini_portile.rb#L327-L334, perhaps my particular issue is because Abseil comes with over 180 individual .pc files so there isn't a single file to reference and pre-populate $MINI_PORTILE_STATIC_LIBS with before loading RE2's re2.pc (viz. if I were able to use mkmf_config(static:) with Abseil first, perhaps using it with RE2 would be able to generate the correct static flags already).

That said, perhaps changing mkmf_config(static:) to look up libraries in every directory in $LIBPATH when trying to generate static links would cover what I need here?

flavorjones commented 5 months ago

Thank you for opening this! Hoping to looking into it in the next week.

mudge commented 5 months ago

FWIW we explicitly restrict our search for static replacements of libraries to the two lib directories of our recipes in re2 but I wonder if it would be as safe (and much more general) to use all of $LIBPATH especially if we’re prepending every new -L flag from the .pc file to it.

mudge commented 5 months ago

In case it is of any use, I had some time today to try and spike a mix of how your mkmf_config works and how we're currently handling it in re2 and came up with the following:

ENV["PKG_CONFIG_ALLOW_SYSTEM_CFLAGS"] = "t"
pkg_config_paths = [
  "#{abseil_recipe.path}/lib/pkgconfig",
  "#{re2_recipe.path}/lib/pkgconfig"
]

# Set up pkg-config to prefer our vendored RE2 and Abseil
pkg_config_paths.prepend(ENV['PKG_CONFIG_PATH']) if ENV['PKG_CONFIG_PATH']
ENV['PKG_CONFIG_PATH'] = pkg_config_paths.join(File::PATH_SEPARATOR)

pc_file = File.join(re2_recipe.path, 'lib', 'pkgconfig', 're2.pc')

# Collect static -L directories to resolve static libraries
library_dirs = xpopen(['pkg-config', '--libs-only-L', '--static', pc_file], err: %i[child out], &:read).strip
static_library_dirs = library_dirs.shellsplit.map { |flag| flag.sub(/\A-L/, "") }
puts "Static library dirs are #{static_library_dirs.inspect}"

# Convert -l flags to static links within the collected static directories where possible
libflags = xpopen(['pkg-config', '--libs-only-l', '--static', pc_file], err: %i[child out], &:read)
  .shellsplit
  .map do |flag|
    next flag unless flag.start_with?("-l")

    static_lib = "lib#{flag[2..]}.#{$LIBEXT}"
    static_lib_dir = static_library_dirs.find { |dir| File.exist?(File.join(dir, static_lib)) }
    next flag unless static_lib_dir

    File.join(static_lib_dir, static_lib)
  end

# Prepend all flags (including the static replacements) to $libs
$libs = [libflags, $libs].join(" ").strip

# Prepend the original -L flags to LDFLAGS otherwise the final linking step fails with "library 're2' not found"
$LDFLAGS = [library_dirs, $LDFLAGS].join(" ").strip

# Prepend INCFLAGS
incflags = xpopen(['pkg-config', '--cflags-only-I', pc_file], err: %i[child out], &:read).strip
$INCFLAGS = [incflags, $INCFLAGS].join(" ").strip

# Append CFLAGS and CXXFLAGS
cflags = xpopen(['pkg-config', '--cflags-only-other', pc_file], err: %i[child out], &:read).strip
$CFLAGS = [$CFLAGS, cflags].join(" ").strip
$CXXFLAGS = [$CXXFLAGS, cflags].join(" ").strip

Inspecting the resulting re2.bundle with otool on macOS gives the following:

$ otool -L lib/re2.bundle
lib/re2.bundle:
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1700.255.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1345.100.2)

I'm not sure what the significance of having to set LDFLAGS as well as libs is however, perhaps you know better?

Update: it might be worth adding that we’re currently relying on dir_config to specify the location of RE2 (which seems to populate $LIBPATH with the lib directory and $CPPFLAGS with -I flags) but I’m hoping that’ll no longer be necessary if we can set things up right.

Here's another version of the last few lines that only sets $libs, $LIBPATH and $INCFLAGS and compiles successfully:

# Set $libs with all the flags including our static replacements
$libs = [libflags, $libs].join(" ").strip

# Tell Ruby where to find the libraries
$LIBPATH = static_library_dirs | $LIBPATH

# Add INCFLAGS
incflags = xpopen(['pkg-config', '--cflags-only-I', pc_file], err: %i[child out], &:read).strip
$INCFLAGS = [incflags, $INCFLAGS].join(" ").strip
mudge commented 5 months ago

I’ve now distilled this into https://github.com/mudge/re2/pull/138/commits/ac14e8f9af0632f18fe12c568795cc9a9bcd6a7f

mudge commented 5 months ago

It’s worth noting that preferring $libs over $LDFLAGS seems to cause a problem if you have the library in Ruby’s RbConfig::CONFIG['exec_prefix'] (a bug @stanhu found when working on re2) so I’ve since switched to setting $LDFLAGS directly which seems to resolve the problem. It’d be good to dig into why this happens though.

flavorjones commented 5 months ago

@mudge thanks for all your work here. i've been heads-down during my OSS time trying to work on a few urgent things and haven't been able to get back to this yet. this topic is at the top of my backlog, though, and I hope to get to it in the next week or two.

mudge commented 5 months ago

You’re welcome, there’s no rush from my side. I had some time this weekend to spend on it hence the over-communication both here and on my (now draft) PR in re2.

The main thing I’d like to understand more is the significance between $LDFLAGS and $libs and $LIBPATH for the library search paths and individual libraries as well as $INCFLAGS and $CPPFLAGS for include search paths (eyeballing the difference in them between my main branch and the new PR hasn’t been especially enlightening) and why it fixes the issue of libraries in exec_prefix being linked ahead of the static ones.

It has been tricky because so many MakeMakefile methods indirectly mutate one or more of the globals and that isn’t always easy to spot (e.g. have_library calling dir_config).

mudge commented 5 months ago

Quick update: I think I got to the bottom of why I saw a regression in re2 and it was to do with using have_library("re2") as it adds -lre2 to LIBS in the resulting Makefile (even though ldflags already has all the static libraries) which makes it possible for the gem to pick up any libre2.as elsewhere in the search path.

mudge commented 5 months ago

Having churned on this a bit, I ended up releasing re2 2.10.0 with the refactored pkg-config logic.

In summary:

This way, I hope that we include the bare minimum necessary and use the “highest” possible level of abstraction (e.g. the more intention-revealing $INCFLAGS vs the more generic $CPPFLAGS) for both the compilation ($INCFLAGS, $CFLAGS, $CXXFLAGS) and linking ($LDFLAGS, $libs) without adding unnecessary directories to the $LIBPATH and minimising the risk of linking a library dynamically rather than statically.

I’m not sure our strategy of searching for static libraries in every -L flag returned by pkg-config is generalisable for your API but hopefully some of this is useful if only to confirm your existing design decisions.

mudge commented 5 months ago

If nothing else, it’d be quite nice to delete my copy of minimal_pkg_config and rely on yours if it was made part of MiniPortile2’s public API.