gnustep / libobjc2

Objective-C runtime library intended for use with Clang.
http://www.gnustep.org/
MIT License
426 stars 116 forks source link

PowerPC {32, 64}-bit Block Trampolines #272

Closed hmelder closed 4 months ago

hmelder commented 5 months ago

Status

Besides UnexpectedException failing (just like on AArch64), all unit tests are passing. However, I would like to incorporate a proper fix for the hard-coded page sizes (See #271) as marking the whole page RWX is not really great... (edit: This was due to mprotect failing to set PROT_EXEC on an unaligned page as ppc64el has a page size of 64k on Debian) This is why this PR is still marked as a draft.

It would also be good to have a CI workflow for PowerPC.

PowerPC 32-bit

System (QEMU):

Clang:

root@debian:~# clang --version
Debian clang version 16.0.6 (19)
Target: powerpc-unknown-linux-gnu

I needed to link objc to libatomic. Ideas on how to check this in CMake are welcome!

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 4e080d8..d2a3d3a 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -245,6 +245,7 @@ if (WIN32 AND NOT MINGW)
 endif()

 target_link_libraries(objc tsl::robin_map)
+target_link_libraries(objc atomic)

 set_target_properties(objc PROPERTIES
        LINKER_LANGUAGE C

Results:

100% tests passed, 0 tests failed out of 190

Total Test time (real) = 213.12 sec

The following tests did not run:
    115 - objc_msgSend (Skipped)
    116 - objc_msgSend_optimised (Skipped)
    117 - objc_msgSend_legacy (Skipped)
    118 - objc_msgSend_legacy_optimised (Skipped)
    151 - UnexpectedException (Skipped)
    152 - UnexpectedException_optimised (Skipped)
    153 - UnexpectedException_legacy (Skipped)
    154 - UnexpectedException_legacy_optimised (Skipped)
    171 - FastPathAlloc (Skipped)
    172 - FastPathAlloc_optimised (Skipped)

POWER9 (PowerPC 64-bit Little-Endian)

System (QEMU):

Clang:

root@debian:~/libobjc2# clang --version
Debian clang version 16.0.6 (19)
Target: powerpc64le-unknown-linux-gnu

Results:

100% tests passed, 0 tests failed out of 190

Total Test time (real) = 116.38 sec

The following tests did not run:
    115 - objc_msgSend (Skipped)
    116 - objc_msgSend_optimised (Skipped)
    117 - objc_msgSend_legacy (Skipped)
    118 - objc_msgSend_legacy_optimised (Skipped)
    151 - UnexpectedException (Skipped)
    152 - UnexpectedException_optimised (Skipped)
    153 - UnexpectedException_legacy (Skipped)
    154 - UnexpectedException_legacy_optimised (Skipped)
    171 - FastPathAlloc (Skipped)
    172 - FastPathAlloc_optimised (Skipped)
hmelder commented 5 months ago

I cannot reproduce the build error from the powerpc-linux-gnu cross builder. It seems like the GitHub Ubuntu image has some additional packages in /usr/lib32 which are incorrectly used instead.

The powerpc64le-linux-gnu cross builder works just fine.

hmelder commented 5 months ago

@rmottola do you want to try the changes out on real hardware, or perhaps have a look at the ASM?

davidchisnall commented 5 months ago

Take a look at what we do in snmalloc for the hacks required to make PowerPC run correctly. The Debian / Ubuntu binfmt packages set it up with the wrong page size. It's very annoying.

hmelder commented 5 months ago

Take a look at what we do in snmalloc for the hacks required to make PowerPC run correctly.

Alright. Thank you :)

The Debian / Ubuntu binfmt packages set it up with the wrong page size. It's very annoying.

The weird thing is that it works on a fresh 22.04 aarch64 rootfs with powerpc cross.

hmelder commented 5 months ago

Take a look at what we do in snmalloc for the hacks required to make PowerPC run correctly.

It seems that snalloc only has a powerpc64le configuration in the workflow, but the problem lies with PowerPC (32-bit, Big-Endian). We can skip the integration for now, as it is a legacy architecture.

rmottola commented 5 months ago

@rmottola do you want to try the changes out on real hardware, or perhaps have a look at the ASM?

Yes, I can test it on real hardware. Setting up thinks on my iBook G4 as a first test. Will report.

I can use:

Debian clang version 16.0.6 (19)
Target: powerpc-unknown-linux-gnu
Thread model: posix
processor       : 0
cpu             : 7447A, altivec supported
rmottola commented 5 months ago

Built completes of the branch ppc-block-trampoline.

I get a couple of warnings /home/multix/code/libobjc2/sarray2.h:55:8: warning: variable length array folded to constant array as an extension [-Wgnu-folding-constant]

`/home/multix/code/libobjc2/objc_msgSend.S:16:2: warning: objc_msgSend() not implemented for your architecture [-W#warnings]

warning objc_msgSend() not implemented for your architecture

`

rmottola commented 5 months ago

How should I run tests?

multix@iBookG4-14:~/code/libobjc2/Build$ make test
make: *** No rule to make target 'test'.  Stop
davidchisnall commented 5 months ago

Tests aren't built by default. Run ccmake and toggle the option. Then ctest will run them.

hmelder commented 5 months ago

I get a couple of warnings /home/multix/code/libobjc2/objc_msgSend.S:16:2: warning: objc_msgSend() not implemented for your architecture [-> W#warnings] #warning objc_msgSend() not implemented for your architecture

This is expected. We are still using the two-stage message dispatch mechanism, as I have not yet implemented objc_msgSend.powerpc.S.

How should I run tests?

You need to enable them with the TESTS flag. Building in Debug mode can be useful as well :)

cmake -DTESTS=ON -DCMAKE_BUILD_TYPE=Debug <OTHER_CMAKE_ARGS> -B build

Thank you for trying this out!

rmottola commented 5 months ago

I reconfigured TESTS=ON with ccmake as David suggested.

100% tests passed, 0 tests failed out of 190

Total Test time (real) = 154.75 sec

The following tests did not run:
        115 - objc_msgSend (Skipped)
        116 - objc_msgSend_optimised (Skipped)
        117 - objc_msgSend_legacy (Skipped)
        118 - objc_msgSend_legacy_optimised (Skipped)
        151 - UnexpectedException (Skipped)
        152 - UnexpectedException_optimised (Skipped)
        153 - UnexpectedException_legacy (Skipped)
        154 - UnexpectedException_legacy_optimised (Skipped)
        171 - FastPathAlloc (Skipped)
        172 - FastPathAlloc_optimised (Skipped)
hmelder commented 5 months ago

Thanks for testing Riccardo.

So the only thing left is to check whether the binfmt bug for ppc64el still exists.

davidchisnall commented 5 months ago

Needing libatomic is a bit worrying. I think that should be needed on 32-bit PowerPC if we do 64-bit atomics, but we shouldn’t be. Can you see where the call is in the generated assembly?

hmelder commented 5 months ago

Can you see where the call is in the generated assembly?

[389/431] Linking C executable Test/CXXExceptions
FAILED: Test/CXXExceptions
: && /usr/bin/clang-17 --target=powerpc-linux-gnu -O3 -DNDEBUG -L/usr/lib/llvm-17/lib/ -fuse-ld=lld-17 -Wl,--dynamic-linker=/usr/powerpc-linux-gnu/lib/ld.so.1,-rpath,/usr/powerpc-linux-gnu/lib Test/CMakeFiles/test_runtime.dir/Test.m.o Test/CMakeFiles/CXXExceptions.dir/CXXException.m.o Test/CMakeFiles/CXXExceptions.dir/CXXException.cc.o -o Test/CXXExceptions  -Wl,-rpath,/Users/hugo/Source/github.com/gnustep/libobjc2/build-powerpc  libobjc.so.4.6  -lstdc++  -lm && :
ld.lld-17: error: undefined reference due to --no-allow-shlib-undefined: __atomic_fetch_add_8
>> referenced by libobjc.so.4.6

__atomic_fetch_add_8 is used in objc_update_dtable_for_new_superclass:

llvm-objcdump snippet ``` 00023eec : 23eec: 7c 08 02 a6 mflr 0 23ef0: 94 21 ff e0 stwu 1, -32(1) 23ef4: 93 c1 00 18 stw 30, 24(1) 23ef8: 90 01 00 24 stw 0, 36(1) 23efc: 93 41 00 08 stw 26, 8(1) 23f00: 93 61 00 0c stw 27, 12(1) 23f04: 93 81 00 10 stw 28, 16(1) 23f08: 7c 9c 23 78 mr 28, 4 23f0c: 93 a1 00 14 stw 29, 20(1) 23f10: 48 00 00 05 bl 0x23f14 23f14: 7c 7d 1b 78 mr 29, 3 23f18: 7f c8 02 a6 mflr 30 23f1c: 80 7e ff d4 lwz 3, -44(30) 23f20: 7f c3 f2 14 add 30, 3, 30 23f24: 83 5d 00 20 lwz 26, 32(29) 23f28: 80 7e 80 00 lwz 3, -32768(30) 23f2c: 80 63 00 00 lwz 3, 0(3) 23f30: 7c 03 d0 40 cmplw 3, 26 23f34: 41 82 00 0c bt 2, 0x23f40 23f38: 7f 5b d3 78 mr 27, 26 23f3c: 48 00 00 88 b 0x23fc4 23f40: 80 7e 80 24 lwz 3, -32732(30) 23f44: 48 01 83 19 bl 0x3c25c <00000000.plt_pic32.pthread_mutex_lock> 23f48: 80 7e 80 00 lwz 3, -32768(30) 23f4c: 83 7d 00 20 lwz 27, 32(29) 23f50: 80 63 00 00 lwz 3, 0(3) 23f54: 7c 03 d8 40 cmplw 3, 27 23f58: 41 82 00 10 bt 2, 0x23f68 23f5c: 80 7e 80 24 lwz 3, -32732(30) 23f60: 48 01 83 0d bl 0x3c26c <00000000.plt_pic32.pthread_mutex_unlock> 23f64: 48 00 00 60 b 0x23fc4 23f68: 80 7e 80 2c lwz 3, -32724(30) 23f6c: 80 63 00 00 lwz 3, 0(3) 23f70: 28 03 00 00 cmplwi 3, 0 23f74: 41 82 00 1c bt 2, 0x23f90 23f78: 80 83 00 00 lwz 4, 0(3) 23f7c: 7c 04 e8 40 cmplw 4, 29 23f80: 41 82 00 18 bt 2, 0x23f98 23f84: 80 63 00 08 lwz 3, 8(3) 23f88: 28 03 00 00 cmplwi 3, 0 23f8c: 40 82 ff ec bf 2, 0x23f78 23f90: 7f 5b d3 78 mr 27, 26 23f94: 48 00 00 08 b 0x23f9c 23f98: 83 63 00 04 lwz 27, 4(3) 23f9c: 80 7e 80 24 lwz 3, -32732(30) 23fa0: 48 01 82 cd bl 0x3c26c <00000000.plt_pic32.pthread_mutex_unlock> 23fa4: 80 7e 80 00 lwz 3, -32768(30) 23fa8: 80 63 00 00 lwz 3, 0(3) 23fac: 7c 1b 18 40 cmplw 27, 3 23fb0: 41 82 00 14 bt 2, 0x23fc4 23fb4: 7f a3 eb 78 mr 3, 29 23fb8: 48 01 82 c5 bl 0x3c27c <00000000.plt_pic32.objc_sync_enter> 23fbc: 7f a3 eb 78 mr 3, 29 23fc0: 48 01 82 cd bl 0x3c28c <00000000.plt_pic32.objc_sync_exit> ...skipping... 23ff0: 38 c0 00 01 li 6, 1 23ff4: 38 e0 00 05 li 7, 5 23ff8: 48 01 82 d5 bl 0x3c2cc <00000000.plt_pic32.__atomic_fetch_add_8> 23ffc: 80 7e 80 30 lwz 3, -32720(30) 24000: 48 01 82 6d bl 0x3c26c <00000000.plt_pic32.pthread_mutex_unlock> 24004: 83 a1 00 14 lwz 29, 20(1) 24008: 83 81 00 10 lwz 28, 16(1) 2400c: 83 61 00 0c lwz 27, 12(1) 24010: 83 41 00 08 lwz 26, 8(1) 24014: 80 01 00 24 lwz 0, 36(1) 24018: 83 c1 00 18 lwz 30, 24(1) 2401c: 38 21 00 20 addi 1, 1, 32 24020: 7c 08 03 a6 mtlr 0 24024: 4e 80 00 20 blr 24028: 7c 7d 1b 78 mr 29, 3 2402c: 80 7e 80 30 lwz 3, -32720(30) 24030: 48 01 82 3d bl 0x3c26c <00000000.plt_pic32.pthread_mutex_unlock> 24034: 7f a3 eb 78 mr 3, 29 24038: 48 01 82 65 bl 0x3c29c <00000000.plt_pic32._Unwind_Resume> 2403c: 00 04 2b 70 ```
hmelder commented 5 months ago

objc_method_cache_version is defined as an _Atomic(uint64_t). Can we make this architecture dependent?

davidchisnall commented 5 months ago

Hmm, that’s needed for the safe caching, but it isn’t useful on platforms where 64-bit atomic reads are expensive. It needs to be 64-bit to avoid overflows. We can probably just not enable it on PowerPC32.

hmelder commented 5 months ago

Sorry my question was malformed.

It needs to be 64-bit to avoid overflows.

This answers it.

We can probably just not enable it on PowerPC32.

I am not familiar with the performance differences of non-native atomic reads, but is it that big of a deal? What would you use instead?

davidchisnall commented 5 months ago

I am not familiar with the performance differences of non-native atomic reads, but is it that big of a deal?

Non-native ones require acquiring a lock and then doing the read. This basically serialises them all and so will offset any potential benefit from caching.

What would you use instead?

Don't do lookup caching on PowerPC32. The compiler doesn't (yet?) do this automatically anyway.

hmelder commented 5 months ago

Don't do lookup caching on PowerPC32. The compiler doesn't (yet?) do this automatically anyway.

If I understand the lookup caching implementation correctly, the caller can opt in and cache a pointer to the current slot, checking whether the slot was invalidated before using it.

When you call objc_slot_lookup_version, the final parameter is used to return either the current value of objc_method_cache_version or 0 if the slot is uncacheable.

What about wrapping the atomic objc_method_cache_version operations in ifdefs (not including them on ppc32), and always write 0 to *version in objc_method_cache_version?

davidchisnall commented 5 months ago

What about wrapping the atomic objc_method_cache_version operations in ifdefs (not including them on ppc32), and always write 0 to *version in objc_method_cache_version?

Yup, I think the thing to do on PowerPC32 is to always write 0 to *version and to hide the declaration of objc_method_cache_version.

hmelder commented 5 months ago

And to hide the declaration of objc_method_cache_version.

But does the declaration of an _Atomic type pull in the libatomic library?

hmelder commented 5 months ago

I do not like conditionally hiding parts of the public API.

davidchisnall commented 5 months ago

Updates to it will pull in atomic things, we should simply remove it if we’re on a platform without 64-bit atomics.

hmelder commented 4 months ago

Caching is now disabled on ppc32. All tests are passing.

100% tests passed, 0 tests failed out of 190

Total Test time (real) = 201.64 sec

The following tests did not run:
    115 - objc_msgSend (Skipped)
    116 - objc_msgSend_optimised (Skipped)
    117 - objc_msgSend_legacy (Skipped)
    118 - objc_msgSend_legacy_optimised (Skipped)
    151 - UnexpectedException (Skipped)
    152 - UnexpectedException_optimised (Skipped)
    153 - UnexpectedException_legacy (Skipped)
    154 - UnexpectedException_legacy_optimised (Skipped)
    171 - FastPathAlloc (Skipped)
    172 - FastPathAlloc_optimised (Skipped)
root@debian:~/libobjc2/build-ppc# uname -a
Linux debian 6.6.8-powerpc-smp #1 SMP Debian 6.6.8-1 (2023-12-22) ppc GNU/Linux
davidchisnall commented 4 months ago

LGTM, please do some squashing before you merge!