davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

[Bug]: Multiple errors using dlib on macOS (either dynamic linked or statically compiled) with test target #2993

Open objectivecosta opened 4 weeks ago

objectivecosta commented 4 weeks ago

What Operating System(s) are you seeing this problem on?

macOS (Apple Silicon)

dlib version

19.24

Python version

N/A

Compiler

clang 15.0.0

Expected Behavior

Following dlib's examples/CMakeLists.txt example, one should be able to embed dlib as a static library in a project in macOS/aarch64 successfully.
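For reference, the kind of setup described here can be sketched roughly as follows. This is a minimal illustration, not the project's actual build files: the `third_party/dlib-19.24` path is inferred from the build-folder path mentioned later in this issue, and the target and source file names (`my_vision`, `my_vision_tests`) are hypothetical. The `add_subdirectory` plus `dlib::dlib` pattern is the one examples/CMakeLists.txt uses.

```cmake
cmake_minimum_required(VERSION 3.8)
project(embed_dlib_sketch)

# Build dlib from source as part of this project.
# The path is an assumption based on the build tree mentioned in this issue.
add_subdirectory(third_party/dlib-19.24 dlib_build)

# Hypothetical static library that uses dlib.
add_library(my_vision STATIC src/my_vision.cpp)
target_link_libraries(my_vision PUBLIC dlib::dlib)

# Hypothetical test target that links against the static library above.
# Because dlib::dlib is linked PUBLIC, its link dependencies (libpng, etc.)
# propagate to this executable's link line.
add_executable(my_vision_tests tests/my_vision_tests.cpp)
target_link_libraries(my_vision_tests PRIVATE my_vision)
```

With this layout, any library name dlib records for libpng ends up on the test executable's link line, which is where the errors below surface.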

Current Behavior

When I follow the instructions in examples/CMakeLists.txt inside a static library that a test target links against, I encounter ld: library 'libpng' not found.

Forcing dlib to build libpng (and others) instead of using system installs results in other errors such as:

Undefined symbols for architecture arm64:
  "_png_do_expand_palette_rgb8_neon", referenced from:
      _png_do_expand_palette in libdlib.a[66](pngrtran.c.o)
  "_png_do_expand_palette_rgba8_neon", referenced from:
      _png_do_expand_palette in libdlib.a[66](pngrtran.c.o)
  "_png_riffle_palette_neon", referenced from:
      _png_do_read_transformations in libdlib.a[66](pngrtran.c.o)
ld: symbol(s) not found for architecture arm64

Steps to Reproduce

Here were my debugging steps:

I assumed that this would work. However, with this setup, which has dlib, libpng, libjpeg and libwebp all compiled inside the project, I now hit:

Undefined symbols for architecture arm64:
  "_png_do_expand_palette_rgb8_neon", referenced from:
      _png_do_expand_palette in libdlib.a[66](pngrtran.c.o)
  "_png_do_expand_palette_rgba8_neon", referenced from:
      _png_do_expand_palette in libdlib.a[66](pngrtran.c.o)
  "_png_riffle_palette_neon", referenced from:
      _png_do_read_transformations in libdlib.a[66](pngrtran.c.o)
ld: symbol(s) not found for architecture arm64

When executing my test target...

Looking at my build folder, I see that dlib did build multiple object files, including the one referenced in the error (pngrtran.c.o), inside the folder cmake-build-debug/third_party/dlib-19.24/dlib/CMakeFiles/dlib.dir/external/libpng. So I'd assume that those symbols would be defined and findable. Since the error mentions _neon, I am assuming that it is indeed picking up the arm64 code paths correctly, which made me even more confused.

Is there anything obvious that I am missing? Or is dlib not compatible with macOS aarch64 setups?

Anything else?

Basically this was an attempt at rebuilding an old project I had on macOS x86_64, so I am assuming this is something arch-related.

I'd be glad to help out and update the documentation if this is indeed something that can be improved for other newcomers to the project.

davisking commented 4 weeks ago

It should all work, but maybe the neon code in the copy of libpng we have in external/ just doesn't work on your machine. The libpng in external is only there as a fallback for people who don't have libpng installed, can't figure out how to install it for whatever reason, or have broken copies like yours apparently. There are a lot of package managers out there that install broken copies of libpng or other libraries, which is something I have no control over :shrug:

Anyway, maybe the neon code just doesn't work and should be deleted. Try removing the libpng files with arm or neon in the name. Although we had this PR a while ago, which is how one of them got there: https://github.com/davisking/dlib/pull/2664

Frankly I would prefer to not have any non-portable code in external/ at all, since it's only a fallback and shouldn't be used. People should use a more official libpng if they really want it to be super fast. That way dlib isn't responsible for having a build system that can build all these other libraries with their platform-specific hardware acceleration :)

That's all to say, yeah, see what you can do to make it work on your machine. I would be fine with a PR that just disabled all the neon stuff if that's what makes it work, since that would be fundamentally more portable (i.e. easily built across all platforms).
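As a quick local experiment before deleting any files, one option might be to switch the neon paths off at compile time. The sketch below is untested and rests on two assumptions: that the bundled libpng sources are compiled into the `dlib` target (the linker output in this issue suggests they are), and that the bundled copy honors upstream libpng's `PNG_ARM_NEON_OPT` macro, where a value of 0 selects the portable C code.

```cmake
# Place this after the add_subdirectory() call that brings in dlib,
# so that the `dlib` target already exists.
if(TARGET dlib AND CMAKE_SYSTEM_PROCESSOR MATCHES "arm64|aarch64")
  # PNG_ARM_NEON_OPT=0 is upstream libpng's switch for disabling its
  # ARM NEON code paths. Assumption: dlib's bundled copy respects it.
  # With it set, pngrtran.c should no longer reference the *_neon symbols.
  target_compile_definitions(dlib PRIVATE PNG_ARM_NEON_OPT=0)
endif()
```

If the undefined `_neon` symbols go away with this definition in place, that would point at the neon sources not being compiled into the library while the code that calls them is.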

objectivecosta commented 4 weeks ago

Thanks for the quick reply! Interestingly enough, if I remove the "lib" prefix from the png/jpeg libraries in find_libpng and find_libjpeg, it seems to work just fine (it successfully links by using -lpng instead of -llibpng – achieved that by manually using set(PNG_LIBRARIES "png;z")).

I would assume that this would also cause test_for_libpng to fail linking, but for some reason it doesn't (which is still baffling to me).
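For context on the naming difference: the linker's -l<name> option searches for lib<name>.dylib or lib<name>.a, so -llibpng looks for a file named liblibpng.*, while -lpng finds libpng.*. A rough sketch of a variant that avoids bare names altogether by resolving full paths with find_library follows. The `MY_*` variable names are hypothetical, and whether dlib's find_libpng picks up a PNG_LIBRARIES value set this way is not confirmed here.

```cmake
# Resolve absolute paths to the system libraries instead of passing bare
# names like "png" or "libpng" to the linker.
find_library(MY_PNG_LIBRARY NAMES png)
find_library(MY_ZLIB_LIBRARY NAMES z)

if(MY_PNG_LIBRARY AND MY_ZLIB_LIBRARY)
  # Same shape as the manual set(PNG_LIBRARIES "png;z") workaround,
  # but with full paths, so no -l name mangling is involved.
  set(PNG_LIBRARIES "${MY_PNG_LIBRARY};${MY_ZLIB_LIBRARY}")
  message(STATUS "Using libpng at ${MY_PNG_LIBRARY}")
else()
  message(WARNING "Could not locate libpng and/or zlib on this system")
endif()
```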

As for the embedded libpng copy, I'll look into removing the neon stuff and seeing if it compiles locally. If so, I'll submit a PR!