genodelabs / goa

Tool for streamlining the development of Genode applications
GNU Affero General Public License v3.0

cmake: build libraries failed #68

Status: Closed. trimpim closed this issue 7 months ago

trimpim commented 8 months ago

When I try to run our wasmedge project with either 23.04 or 23.10, I get the following error:

[init -> wasmedge -> wasmedge] Error: LD: symbol not found: '__gcc_personality_v0'
[init -> wasmedge -> wasmedge] Error: Uncaught exception of type 'Linker::Not_found'
[init -> wasmedge -> wasmedge] Warning: abort called - thread: ep

This uses:

This might be related to #66 and the errors I encountered in https://github.com/genodelabs/genode-world/issues/342

jschlatow commented 8 months ago

I'm no expert on linking either, yet, since __gcc_personality_v0 is part of ld.lib.so shouldn't it suffice to add the whole-archive thing here: https://github.com/genodelabs/goa/blob/2ec360cad0930998a03541d62e1c0a24a1559f0c/share/goa/lib/flags.tcl#L92

@ssumpf What's your take on this?

chelmuth commented 8 months ago

Each time the silver bullet whole archive is mentioned it feels worse to swallow that pill because I did not get the rationale yet.

trimpim commented 8 months ago

@jschlatow f256f0124d2e761c53dfe0a7970f95b5b0f89a76 also fixes the problem for me.

This hopefully will not break #60. The libraries I get are < 2MB.

ssumpf commented 8 months ago

> Each time the silver bullet whole archive is mentioned it feels worse to swallow that pill because I did not get the rationale yet.

The rationale is that everything that is .global .hidden in a library becomes .local after linking and is thus inaccessible to the dynamic linker. When one, for example, links libgcc.a without whole-archive against a shared library, the static linker will find the libgcc symbols used by the shared library and create jump-slot relocations for these symbols. Hence, when the library actually calls these symbols, the dynamic linker will not find them.

ssumpf commented 8 months ago

@trimpim, @jschlatow: I will have a look into this. @trimpim: Can I reproduce your scenario using https://github.com/trimpim/wasmedge-genode ? There doesn't seem to be a pkg.

trimpim commented 8 months ago

@ssumpf I just have pushed the branch test-231110. This contains the pkg and a small README.md. With this you should be able to reproduce it.

To simplify your life, I suggest you also take the fix from #67 if you are using make 4.4.

ssumpf commented 8 months ago

> @ssumpf I just have pushed the branch test-231110. This contains the pkg and a small README.md. With this you should be able to reproduce it.
>
> To simplify your life, I suggest you also take the fix from #67 if you are using make 4.4.

@trimpim: Thanks, where do I find the llvm API that's set in used_apis?

ssumpf commented 8 months ago

@trimpim: Never mind, found it.

ssumpf commented 8 months ago

@trimpim: You need to change the 08-use_gcc_eh.patch of wasmedge from

   target_link_libraries(wasmedge_shared
     PRIVATE
     wasmedgeCAPI
+    # https://forums.developer.nvidia.com/t/undefined-reference-to-gcc-personality-v0/131127/3
+    gcc_eh
   )

to

   target_link_libraries(wasmedge_shared
     PRIVATE
     wasmedgeCAPI
+    # https://forums.developer.nvidia.com/t/undefined-reference-to-gcc-personality-v0/131127/3
+    -Wl,--whole-archive -Wl,-lgcc_eh -Wl,--no-whole-archive
   )

because __gcc_personality_v0 is part of libgcc_eh.a and not of our dynamic linker, which provides only __gxx_personality_v0. You need to do the whole-archive thing because the symbol is .global .hidden, as described for libgcc above.

In the meantime I will try to clean up the libgcc and whole-archive chaos a little and look that things do not break for you.

ssumpf commented 8 months ago

@jschlatow: The issue for wasmedge can be fixed by adjusting a patch in the project. Otherwise, I tried to clean up the libgcc.a and whole-archive problem with commit feea13c.

chelmuth commented 7 months ago

> because __gcc_personality_v0 is part of libgcc_eh.a and not our dynamic linker which provides __gxx_personality_v0 only. You need to do the whole archive thing because the symbol is global .hidden, as described for libgcc above.

As we use libgcc_eh (and libsupc++) in a creative way in cxx.mk it is literally part of the linker but suffers from the global-hidden condition too (if I understand correctly). Would it make sense to review symbols/ld and the linking of ld.lib.so to provide more symbols of the runtime that are currently missing to solve the issues we address here?

trimpim commented 7 months ago

@ssumpf thanks for the fix.

With this change and your patch build and run of wasmedge works for me.

ssumpf commented 7 months ago

@jschlatow: ce9b91b tries to improve upon f219ab3 by removing the unnecessary detection of libgcc from the qmake support and moving libgcc to ldlibs_common for all builds.

jschlatow commented 7 months ago

@ssumpf Unfortunately, ce9b91b breaks _examples/hellorust.

nfeske commented 7 months ago

Autoconf apparently also suffers. (experienced while attempting to port gforth on Linux/ARM64 on my MNT-Reform)

With the common, the basic compile test fails because all symbols of libgcc end up in the binary twice. This is probably because ldlibs_common is passed to configure as both LDLIBS and LIBS. When just specifying -lgcc, this is no problem because one lib can appear any number of times using the -l argument without causing multiple symbol definitions. But the whole-archive option seems to force the linker to squeeze all symbols of the lib into the binary. If specified twice, the symbols are added twice, ending up in a "double defined symbols" error.

I sense that the wrapping of -lgcc in a whole-archive block is not what we generally want.
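
The difference between a repeated -lgcc and a repeated whole-archive block can be illustrated with a toy model. This is only a sketch of the behaviour described above, not of how GNU ld is implemented; the archive contents and the names `link` and `DuplicateSymbol` are made up for illustration.

```python
# Toy model of static-archive handling (assumption: simplified semantics,
# hypothetical archive contents). An archive maps member names to the
# symbols each member defines.
LIBGCC = {
    "_udivdi3.o": {"__udivdi3"},
    "_muldi3.o": {"__muldi3"},
}


class DuplicateSymbol(Exception):
    """Raised when a symbol would be defined a second time."""


def link(undefined, args):
    """Process (archive, whole_archive) arguments from left to right.

    A plain archive contributes only members that define a symbol that is
    still undefined. A whole-archive block contributes every member
    unconditionally. Returns the set of symbols copied into the output.
    """
    undefined = set(undefined)
    defined = set()
    for archive, whole in args:
        for member, symbols in archive.items():
            if whole or symbols & undefined:
                clash = symbols & defined
                if clash:
                    raise DuplicateSymbol(sorted(clash))
                defined |= symbols
                undefined -= symbols
    return defined


if __name__ == "__main__":
    needed = {"__udivdi3"}
    # -lgcc given twice (as via LDLIBS and LIBS): second occurrence adds nothing.
    print(link(needed, [(LIBGCC, False), (LIBGCC, False)]))
    # Whole-archive block given twice: every member is added again.
    try:
        link(needed, [(LIBGCC, True), (LIBGCC, True)])
    except DuplicateSymbol as err:
        print("multiple definitions:", err)
```

In this model a single whole-archive block also pulls in `__muldi3` although nothing needs it, which corresponds to the size overhead of wrapping libgcc this way.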

nfeske commented 7 months ago

@ssumpf I also noticed that you removed the -nostdlib option. This is not good because without this option, a bunch of compiler heuristics kick in, which we don't want.

ssumpf commented 7 months ago

> @ssumpf I also noticed that you removed the -nostdlib option. This is not good because without this option, a bunch of compiler heuristics kick in, which we don't want.

As far as I understand it, this is covered by -nostartfiles -nodefaultlibs -static-libgcc in ldlibs_common. We could change that to -nostdlib and make it the same for everyone.

ssumpf commented 7 months ago

> Autoconf apparently also suffers. (experienced while attempting to port gforth on Linux/ARM64 on my MNT-Reform)
>
> With the common, the basic compile test fails because all symbols of libgcc end up in the binary twice. This is probably because ldlibs_common is passed to configure as both LDLIBS and LIBS. When just specifying -lgcc, this is no problem because one lib can appear any number of times using the -l argument w/o causing multiple symbol definitions. But the whole-archive option seems to force the linker to squeeze all symbols of the lib into the binary. If specified twice, the symbols are added twice, ending up at an "double defined symbols" error.
>
> I sense that the wrapping of -lgcc in a whole-archive block is not what we generally want.

Okay, this one is new to me. In this case we want -lgcc for all binaries and the whole-archive for shared libraries only.
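
As a rough sketch of that policy (the helper name `libgcc_flags` is hypothetical and not part of goa, and the flag spelling assumes the flags are passed through the GCC driver):

```python
def libgcc_flags(is_shared_library):
    """Return the libgcc-related link flags for one target.

    Assumption: executables get a plain -lgcc, while shared libraries get
    libgcc wrapped in a whole-archive block, as proposed in this comment.
    """
    if is_shared_library:
        return ["-Wl,--whole-archive", "-lgcc", "-Wl,--no-whole-archive"]
    return ["-lgcc"]


if __name__ == "__main__":
    print(" ".join(libgcc_flags(is_shared_library=False)))
    print(" ".join(libgcc_flags(is_shared_library=True)))
```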

nfeske commented 7 months ago

> As far as I understand it, this is covered by -nostartfiles -nodefaultlibs -static-libgcc

That's true - at least that was the rationale of 9dcadf757bc5f680457dc1c67ea1458cbee884d8. It seems that I missed adapting qmake.tcl in this respect. So it's good to remove this option. Could you do this in a separate commit?

ssumpf commented 7 months ago

I have added 39755ba to remove -nostdlib from qmake.tcl and made adjustments to use ldlibs_common for Qt5 apps as well (9f94761). With this, all the tests (including Rust), the Linphone-SDK (with a minor build-system check tweak for arm_v8a), my Qt5 scenarios, and wasmedge are working for me.

ssumpf commented 7 months ago

P.S. This also resolves the hello_make static constructor problem.

chelmuth commented 7 months ago

As we are again orbiting around whole-archive I took yesterday afternoon to get an idea of the actual situation and how several statements of the past fit into this picture.

  1. GCC's -static-libgcc is ineffective in our configuration as -nodefaultlibs disables the desired libgcc magic. From the manpage: "Only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored."
  2. The size of libgcc is significant. In examples/cmake_library, libforty_two.lib.so increases from 14928 to 347856 bytes with --whole-archive. This overhead applies to all binaries incl. shared libs.
  3. I could not find any local or hidden wizardry in the shared object link. From my investigation, it's just the ancient plain rule of linker command lines that applies here: missing symbols are resolved from the remainder of the arguments to the right (or inside --start-group .. --end-group). So, if -lgcc is forced to the end of the linker command line, it just works.

After the investigation I patched the examples for the test (ecf97e0b1ea14f2b2212b59c069a9d5cacb7d9db) and sketched solutions for the common flags (e3f855749541d520afd526001cef0e861351a64f) as well as cmake (459308b5413a53a45f4b04fdb5a7576da9afecd8). My question is now: Can we walk this road and, thus, wipe some myths and legends associated with this topic?
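
The left-to-right rule from point 3 can be sketched with a small toy model. It is a simplification under stated assumptions rather than a description of GNU ld internals: the function name `unresolved`, the object `plugin`, and the archive contents are hypothetical, and --start-group/--end-group is not modelled.

```python
def unresolved(command_line):
    """Return the symbols still undefined after a left-to-right link.

    Each item is ("object", (defines, needs)) or ("archive", members),
    where members is a list of (defines, needs) pairs. An archive member is
    pulled in only if it defines a symbol that is undefined at the moment
    the archive is visited.
    """
    undefined, defined = set(), set()
    for kind, payload in command_line:
        if kind == "object":
            defines, needs = payload
            defined |= defines
            undefined -= defines
            undefined |= needs - defined
        else:
            progress = True
            while progress:  # rescan, since a member may need another member
                progress = False
                for defines, needs in payload:
                    if defines & undefined:
                        defined |= defines
                        undefined -= defines
                        undefined |= needs - defined
                        progress = True
    return undefined


# Hypothetical inputs: one object that needs a libgcc helper.
plugin = ("object", ({"plugin_init"}, {"__aarch64_ldadd4_acq_rel"}))
libgcc = ("archive", [({"__aarch64_ldadd4_acq_rel"}, set())])

if __name__ == "__main__":
    print(unresolved([libgcc, plugin]))  # -lgcc before the object
    print(unresolved([plugin, libgcc]))  # -lgcc at the end
```

With -lgcc first, nothing is undefined when the archive is visited, so no member is extracted and the reference stays open. With -lgcc last, the same reference is resolved without any whole-archive block.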

chelmuth commented 7 months ago

> P.S. This also resolves the hello_make static constructor problem.

Could you please tell us the nature of the problem? Which constructor was not called?

Also, 9f947611cad764ceb791ccf6283ede727e35fe53 changes flags.tcl en passant, but the commit message suggests changes (and effects) to qmake only, while all build systems are affected.

ssumpf commented 7 months ago

> As we are again orbiting around whole-archive I took yesterday afternoon to get an idea of the actual situation and how several statements of the past fit into this picture.
>
> 1. GCC's `-static-libgcc` is ineffective in our configuration as `-nodefaultlibs` disables the desired libgcc magic. From the manpage: _Only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored._
>
> 2. The size of libgcc is significant. In _examples/cmake_library_, _libforty_two.lib.so_ increases from 14928 to 347856 bytes with `--whole-archive`. This overhead applies to _all_ binaries incl. shared libs.
>
> 3. I could not find any _local_ or _hidden_ wizardry in the shared object link. From my investigation, it's just the ancient plain rule of linker command lines that applies here: missing symbols are resolved from the remainder of the arguments to the right (or inside --start-group .. --end-group). So, if `-lgcc` is forced to the end of the linker command line, it just works.
>
> After the investigation I patched examples for the test ecf97e0 and sketched solutions for common flags e3f8557 as well as cmake 459308b. My question is now: Can we walk this road and, thus, wipe some myths and legends associated with this topic?

@chelmuth: I have tried your branch and it seems to work well in most cases. However, qt5_quicktest does not link for arm_v8a (undefined reference to __aarch64_ldadd4_acq_rel), and linphone-simple produces the same undefined reference for libservicecontrolplugin.lib.so. This can quickly be reproduced using my goa-projects (https://github.com/ssumpf/goa-projects, master branch). Note that this has always been a problem on arm_v8a only; I never saw it on x86.

ssumpf commented 7 months ago

@chelmuth: I will try to get -lgcc to the end of the linker command line for qmake next.

ssumpf commented 7 months ago

@chelmuth: Okay, -lgcc at the end of the linking command by hand works like a charm! Learned some ancient knowledge today ;) The only question that remains is how to convince the Qt5 build system to do so? @cproc: Do you have any suggestions?

cproc commented 7 months ago

It looks like GENODE_QMAKE_LIBS needs to be set as well, like in https://github.com/genodelabs/genode/blob/master/repos/libports/lib/import/import-qt5_qmake.mk.

ssumpf commented 7 months ago

> It looks like GENODE_QMAKE_LIBS needs to be set as well like in https://github.com/genodelabs/genode/blob/master/repos/libports/lib/import/import-qt5_qmake.mk.

@cproc: Yes this does the trick :+1:

ssumpf commented 7 months ago

@jschlatow: The commits above (my staging branch) are hopefully the last ones regarding this issue. Thanks to our combined knowledge I am pretty happy with this solution and everything works as expected.

jschlatow commented 7 months ago

Thanks for the collaborative effort. I'm also happy with the result.

I merged the commits and force-pushed to staging to eliminate commit 81ced02.