animetosho / rapidyenc

SIMD accelerated yEnc en/decode C library

Mac build issues #1

Closed mnightingale closed 10 months ago

mnightingale commented 10 months ago

I'm having a couple of issues on macOS Sonoma 14.0 on an M2 that I'm hoping you can help with.

First, what seems to be a minor issue with:

https://github.com/animetosho/rapidyenc/blob/78d71c448b86729c21fc116c8d1b51920f8230b7/CMakeLists.txt#L113

In file included from /Users/mnightingale/personal/workspace/rapidyenc/rapidyenc/src/platform.cc:11:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/include/sys/sysctl.h:83:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk/usr/include/sys/ucred.h:101:2: error: unknown type name 'u_int'
        u_int   cr_version;             /* structure layout version */
...
(snip same for a few other files)

Removing the line seems to resolve it. At least it builds, but I don't know if there are any implications.


Second issue with Xcode 15

Since updating to Xcode 15 (Apple clang version 15.0.0 (clang-1500.0.40.1)) I get a crash when calling rapidyenc_decode_incremental from Go.

SIGILL: illegal instruction
PC=0x103cc5000 m=0 sigcode=2
signal arrived during cgo execution
instruction bytes: 0x53 0xd9 0x3 0x4f 0x34 0xe7 0x1 0x4f 0x95 0xe4 0x0 0x4f 0x16 0xe4 0x2 0x4f

I'm not sure, but I don't think those instructions are ARM code?
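AArch64 instructions are fixed-width 32-bit words stored little-endian, so one quick sanity check is to regroup the reported bytes into words that can be fed to a disassembler. A minimal Python sketch of that regrouping (the helper name `to_words` is made up for illustration, it is not part of rapidyenc or Go):

```python
import struct

# Instruction bytes copied from the SIGILL report above.
raw = bytes([0x53, 0xd9, 0x03, 0x4f, 0x34, 0xe7, 0x01, 0x4f,
             0x95, 0xe4, 0x00, 0x4f, 0x16, 0xe4, 0x02, 0x4f])

def to_words(data):
    """Split a byte string into little-endian 32-bit words."""
    return [w for (w,) in struct.iter_unpack("<I", data)]

words = to_words(raw)
# Print each word as 8 hex digits, the form a disassembler listing uses.
print([f"{w:#010x}" for w in words])
```

The first word comes out as 0x4f03d953, which is the value to look up when checking whether the instruction at the faulting PC is a valid encoding.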

Taking a look at the dylib in ghidra:

Error | Bad Instruction | Unable to resolve constructor at 00005000 (flow from 00004ffc) | 00005000 |   | ?? 53h    S

There is a chunk from 0x5000 - 0x56e0 which hasn't decompiled.

Looking at the disassembled code preceding it, which is the function do_decode_simd<>(uchar **param_1,uchar **param_2,ulong param_3,YencDecoderState *param_4)

[image: screenshot of the disassembly]

If I comment out the neon64 decode it works correctly, no bad instructions.

#if(IS_ARM64)
#   set(DECODER_NEON_FILE decoder_neon64.cc)
#else()
    set(DECODER_NEON_FILE decoder_neon.cc)
#endif()

At this point I'm kind of lost. I thought maybe there's a missing compiler flag, or maybe something has changed with Apple's new compiler.

Thanks, Mike

animetosho commented 10 months ago

Thanks for reporting.

Implemented a fix for the first issue.

The second issue is weird. Could you get the call stack when it raises the SIGILL?
Just shooting in the dark: could you try replacing _vld1q_u8_x4 on this line with vld4q_u8, and see if it still SIGILLs in the same place?

mnightingale commented 10 months ago

I've had another look at this. Building the shared library with -DCMAKE_CXX_FLAGS_RELEASE=-O1 worked, so I determined it must be a compiler/optimisation error.

-O1: works
-O2: some clang error with the linker
-O3 (I think this is the default): compiles but crashes / bad instructions

Since I started having this problem, 'Command Line Tools for Xcode 15.1 beta' was released and it's working again :)

Thanks for your help.

animetosho commented 10 months ago

Thanks for the investigation.
Is the original Xcode 15 a beta? It's interesting to note that it's using Clang 15.0.0 - I've understood Clang's x.0.0 designation to signify in-dev code (first stable release is typically x.0.1). I'm generally willing to make workarounds for "stable" release compiler bugs.

mnightingale commented 10 months ago

Xcode 15 is a stable release and the problem remained on 15.0.1. Unfortunately I'd updated macOS, so 15 is the minimum I can use. Luckily I had a backup of the working shared library, which I'd been using for the last few weeks.

For my use in Go I'm in the process of switching to a static library, which doesn't appear to have the problem when compiled with 15.0.1.

Shared library builds of 9c6f0b9, built with Xcode 15.0.1 and Xcode 15.1 Beta: librapidyenc.zip

Built with:

rm -rf rapidyenc/build
cmake -S rapidyenc -B rapidyenc/build
cmake --build rapidyenc/build --target rapidyenc_shared -j8

Maybe you'll find the difference interesting but I wouldn't worry about a workaround for it.

animetosho commented 10 months ago

That comparison was interesting. I initially thought it'd be useless, but it seems to be the same compiler in both, so it generates almost identical code:

-    5000:  4f03d953    .inst   0x4f03d953 ; undefined
+    5000:  4f00e553    movi    v19.16b, #0xa
-    5400:  3dc00008    ldr q8, [x0]
+    5400:  3dc3d808    ldr q8, [x0, #3936]

Given that the address is always a multiple of 1KB, and only some bits were changed, I'm thinking this is more a linker bug.
So yeah, not much I can do about it unfortunately.
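To make the "only some bits were changed" observation concrete, the instruction words from the two builds can be XORed. A rough Python sketch using only the values from the diff above (the `pairs` list and variable names are purely illustrative):

```python
# (address, word on the "-" line, word on the "+" line), taken from the diff above.
pairs = [
    (0x5000, 0x4f03d953, 0x4f00e553),
    (0x5400, 0x3dc00008, 0x3dc3d808),
]

diffs = []
for addr, removed, added in pairs:
    diff = removed ^ added  # a set bit marks a position where the two builds differ
    diffs.append(diff)
    print(f"{addr:#06x}: xor={diff:#010x} bits={diff:032b} addr%1024={addr % 1024}")
```

For these two pairs the XOR results are 0x00033c00 and 0x0003d800, so every differing bit lies in bits 10 to 17 of the word, and both addresses are exact multiples of 1024.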

Thanks for the help!