comex opened this issue 1 year ago
OK, I looked into this, and I think the way to handle it is to decode Swift's string pointer encoding in a workflow. That would effectively rewrite the referenced function to:
100003f54 08080090 adrp x8, 0x100103000
100003f58 08c13d91 add x8, x8, #0xf70 {data_100103f70, "this is a long string so it doesn't get small-string optimized"}
100003f5c 1f2003d5 nop
100003f60 e10308aa mov x1, x8 {data_100103f70, "this is a long string so it doesn't get small-string optimized"}
100003f64 c00780d2 mov x0, #0x3e
100003f68 0000faf2 movk x0, #0xd000, lsl #0x30 {0xd00000000000003e}
100003f6c c0035fd6 ret
The workflow would do this on LLIL or MLIL rather than operating on the assembly directly.
Native strings have tail-allocated storage, which begins at an offset of nativeBias from the storage object's address. String literals, which reside in the constant section, are encoded as their start address minus nativeBias, unifying code paths for both literals ("immortal native") and native strings. Native Strings are always managed by the Swift runtime.
- b61: isNativelyStored. Set for native stored strings.
  - largeAddressBits holds an instance of _StringStorage.
  - I.e. the start of the code units is at the stored address + nativeBias.
internal static var nativeBias: UInt {
#if _pointerBitWidth(_64)
return 32
#elseif _pointerBitWidth(_32)
return 20
#else
#error("Unknown platform")
#endif
}
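As a rough illustration of the bias arithmetic described above, here is a minimal Python sketch. The helper names are hypothetical (not part of Swift or the Binary Ninja API), and it only covers the address part of the encoding; the discriminator bits are a separate step.

```python
# Hypothetical helpers mirroring the nativeBias arithmetic quoted above.

def native_bias(pointer_bits: int) -> int:
    """Return nativeBias for a pointer width, as in the Swift source (32 on 64-bit, 20 on 32-bit)."""
    if pointer_bits == 64:
        return 32
    if pointer_bits == 32:
        return 20
    raise ValueError(f"unknown pointer width: {pointer_bits}")

def bias_literal_address(start: int, pointer_bits: int = 64) -> int:
    """Encode a literal's start address as stored in the object bits: start - nativeBias."""
    return start - native_bias(pointer_bits)

def unbias_literal_address(stored: int, pointer_bits: int = 64) -> int:
    """Recover the address of the first code unit: stored + nativeBias."""
    return stored + native_bias(pointer_bits)

# The literal from the bug report lives at 0x100103f70.
print(hex(bias_literal_address(0x100103f70)))    # 0x100103f50
print(hex(unbias_literal_address(0x100103f50)))  # 0x100103f70
```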
/// valueToBridgeObject(x) === (x << _swift_abi_ObjCReservedLowBits) |
/// _swift_BridgeObject_TaggedPointerBits
To add to the conversation: the 0x8000000000000000 discriminator in the most significant bits of the bridge object, which identifies these large immortal strings, is defined here:
// Discriminator for large, immortal, swift-native strings
@inlinable @inline(__always)
internal static func largeImmortal() -> UInt64 {
#if os(Android) && arch(arm64)
return 0x0080_0000_0000_0000
#else
return 0x8000_0000_0000_0000
#endif
}
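To make the discriminator concrete, here is a minimal Python sketch with hypothetical helper names. It assumes the default (non-Android) layout in which the top four bits of the object word are exactly 0b1000 for a large immortal string; the Android/arm64 value 0x0080_0000_0000_0000 from the #if branch above is not handled.

```python
# Assumed default layout: top nibble 0b1000 marks a large immortal string.
LARGE_IMMORTAL = 0x8000_0000_0000_0000  # value returned by largeImmortal() (non-Android)

def tag_large_immortal(biased_address: int) -> int:
    """Build an object word by OR-ing the discriminator into the biased literal address."""
    return biased_address | LARGE_IMMORTAL

def discriminator_nibble(object_bits: int) -> int:
    """Return bits 60-63 of a 64-bit object word."""
    return (object_bits >> 60) & 0xF

def is_large_immortal(object_bits: int) -> bool:
    """True only when the whole top nibble is 0b1000, not merely when bit 63 is set."""
    return discriminator_nibble(object_bits) == 0b1000

def untag_large_immortal(object_bits: int) -> int:
    """Strip the discriminator, leaving the biased address."""
    if not is_large_immortal(object_bits):
        raise ValueError(f"not a large immortal object word: {object_bits:#x}")
    return object_bits & ~LARGE_IMMORTAL

print(hex(tag_large_immortal(0x100103f50)))  # 0x8000000100103f50
```

The sketch compares the whole top nibble rather than testing bit 63 alone, because small strings also set bit 63 (the 0xe... bridge objects later in this thread are an example).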
An example from the decompilation of a sample I looked at recently:
100002fec // /Users/Shared/1.zip
100002fec URL.init(fileURLWithPath:)(0xd000000000000013, 0x8000000100003cc0)
I wrote a small script, starting with the snippet below, that parses these calls, adds the bias (+0x20), passes the resulting address to bv.get_string_at(), and writes a comment at the call site. Having this built into the workflow would be great, though.
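from binaryninja.highlevelil import HighLevelILCall

def find_calls(i):
    # Traversal callback: return HLIL call instructions, implicitly None for everything else
    match i:
        case HighLevelILCall():
            return i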
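The decoding that script performs can be sketched in plain Python, independent of the Binary Ninja API: check the discriminator, add the bias back, and take the length from the first argument. Assumptions: a 64-bit target (nativeBias = 0x20), the length stored in the low 48 bits of the first word, and a hypothetical function name. In a plugin, the returned address is what would be handed to bv.get_string_at().

```python
from typing import NamedTuple, Optional

NATIVE_BIAS = 0x20            # nativeBias on 64-bit targets
COUNT_MASK = (1 << 48) - 1    # assumed: count lives in bits 0-47 of the first word

class LargeLiteral(NamedTuple):
    address: int  # address of the first UTF-8 code unit
    length: int   # number of UTF-8 code units

def decode_large_literal(count_and_flags: int, object_bits: int) -> Optional[LargeLiteral]:
    """Decode the two-argument form of a large immortal String literal, or return None."""
    if (object_bits >> 60) != 0b1000:  # top nibble must be the large-immortal discriminator
        return None
    biased = object_bits & ((1 << 60) - 1)  # drop the discriminator nibble
    return LargeLiteral(address=biased + NATIVE_BIAS, length=count_and_flags & COUNT_MASK)

# Arguments from the decompiled URL.init(fileURLWithPath:) call above.
lit = decode_large_literal(0xd000000000000013, 0x8000000100003cc0)
print(hex(lit.address), lit.length)  # 0x100003ce0 19
```

The same function decodes the pair from the first listing in this thread: decode_large_literal(0xd00000000000003e, 0x8000000100103f50) gives address 0x100103f70 and length 62.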
Parsing small strings would be nice as well. Small immortal strings are passed like this:
1000032a4 String.append(_:)(0x65676465682e, 0xe600000000000000)
1000032bc String.append(_:)(0x6376676f68, 0xe500000000000000)
These are ASCII because the bridge object's top nibble is 0xe (0b1110), which matches this chart:
If the string is longer than 8 bytes, the remaining bytes spill over into the bridge object.
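Here is a minimal Python sketch of that small-string decoding. It assumes a 64-bit little-endian target where bytes 0-7 of the string are in the first argument, bytes 8-14 are in the low seven bytes of the bridge object, and the bridge object's top byte is the discriminator nibble followed by a 4-bit length (so 0xe6 above means small ASCII, 6 bytes). The function name is hypothetical.

```python
def decode_small_string(first_word: int, object_bits: int) -> str:
    """Decode a small immortal string passed as two 64-bit words (assumed layout, see above)."""
    top_nibble = object_bits >> 60
    # Require isImmortal (b63) and isSmall (b61); b62 (ASCII) may be either value.
    if (top_nibble & 0b1010) != 0b1010:
        raise ValueError(f"not a small string: {object_bits:#x}")
    length = (object_bits >> 56) & 0xF
    # Bytes 0-7 come from the first word, bytes 8-14 from the low 7 bytes of the object word.
    raw = first_word.to_bytes(8, "little") + (object_bits & ((1 << 56) - 1)).to_bytes(7, "little")
    return raw[:length].decode("utf-8")

print(decode_small_string(0x65676465682e, 0xe600000000000000))  # .hedge
print(decode_small_string(0x6376676f68, 0xe500000000000000))    # hogvc
```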
Version and Platform (required):
Bug Description: Swift (at least on arm64 macOS) has an odd way of referring to string literals. Here is the original assembly produced by swiftc:
Or as disassembled by Binary Ninja (the string ended up at 0x100103f70):
The problem is that Binary Ninja doesn't create an xref to 0x100103f70, presumably because it emulates the whole sequence of operations and ends up with 0x8000000100103f50.
Using the decompiler for xrefs is often helpful, but here it's counterproductive compared to the more naive approach of looking for adrp/add pairs. Ideally, Binary Ninja would be able to identify these references.
Steps To Reproduce:
Disassemble this test binary and go to the __cstring section. Note that there is no reference to the string. This corresponds to the following source code:
Note that I had to add a bunch of padding between the code and the string. Without it, the linker replaces the adrp/add pair with adr/nop, and Binary Ninja does identify the reference in that case.
Additional Information: There is nothing meaningful located 0x20 bytes before the string (the string is at the very start of the section), so the subtraction of 0x20 is just part of some pointer encoding scheme, along with the OR of 0x8000000000000000. Not sure about the details of this scheme.