apple / swift-distributed-actors

Peer-to-peer cluster implementation for Swift Distributed Actors
https://apple.github.io/swift-distributed-actors/
Apache License 2.0
587 stars 55 forks

EXC_BAD_ACCESS in OpLogDistributedReceptionist #1140

Closed: akbashev closed this issue 6 months ago

akbashev commented 11 months ago

Description

When playing with nodes and dynamically adding and removing them, the distributed system crashes with EXC_BAD_ACCESS in OpLogDistributedReceptionist (see screenshot and backtrace).

_Not sure if this is helpful or not, but if you change the logic by removing __secretlyKnownToBeLocal and assume it's not local, it stops crashing. So I assume it's a concurrency issue, though I'm not sure whether it's in Swift itself or in the logic._

Steps to reproduce

Run any simple project with two nodes (e.g. https://github.com/apple/swift-distributed-actors/pull/1139), then simultaneously initialise the nodes, join them, and create actors without waiting for the nodes to be up (not the best logic, of course, but at least it's a way to provoke the crash). For me it crashes almost every time (if it doesn't, try closing and re-running it several more times). I think the key point is to start firing multiple dead letters.

Environment

macOS 14, Xcode 15.0.1 (15A507)

Backtrace

* thread #80, queue = 'com.apple.root.default-qos.cooperative', stop reason = EXC_BAD_ACCESS (code=1, address=0x8000000000000008)
    frame #0: 0x000000019032290c libswiftCore.dylib`swift_isUniquelyReferenced_nonNull_native
    frame #1: 0x000000018ffe52c8 libswiftCore.dylib`Swift.Dictionary._Variant.removeValue(forKey: τ_0_0) -> Swift.Optional<τ_0_1> + 660
  * frame #2: 0x0000000100b68594 Server`$defer #1 (self=0x0000000145205320, timerTaskKey=-2452856007130675400) in closure #1 in OpLogDistributedReceptionist.ensureDelayedListingFlush(of:) at OperationLogDistributedReceptionist.swift:488:62
    frame #3: 0x0000000100b682f4 Server`closure #1 in OpLogDistributedReceptionist.ensureDelayedListingFlush(self=<no summary available>, timerTaskKey=-2452856007130675400, flushDelay=(_low = 250000000000000000, _high = 0), key=(id = "persistences", guestType = Any)) at OperationLogDistributedReceptionist.swift:494:38
    frame #4: 0x0000000100b6f628 Server`partial apply for closure #1 in OpLogDistributedReceptionist.ensureDelayedListingFlush(of:) at <compiler-generated>:0
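
The frames above show the crash inside `Dictionary.removeValue(forKey:)`, called from a `defer` in a task closure. As a rough illustration only (this is not the library's source; `FlushTimers`, `ensureDelayedFlush`, `pendingCount`, and the `Int` key are hypothetical names chosen for the sketch), here is how a timer-task dictionary can be kept isolated to a plain actor so that every mutation, including the `defer` cleanup, runs on that actor:

```swift
// Hypothetical sketch, NOT the OpLogDistributedReceptionist implementation.
// It assumes the dictionary is only ever touched from the actor's isolation.
actor FlushTimers {
    // Maps a (hypothetical) timer key to the task that will perform the flush.
    private var timerTasks: [Int: Task<Void, Never>] = [:]

    // Number of flushes that are scheduled but not yet finished.
    var pendingCount: Int { timerTasks.count }

    func ensureDelayedFlush(
        key: Int,
        delayNanos: UInt64,
        flush: @escaping @Sendable () async -> Void
    ) {
        // A flush for this key is already scheduled, nothing to do.
        guard timerTasks[key] == nil else { return }

        // The Task closure is formed in an actor-isolated method and captures
        // `self`, so it inherits the actor's isolation. The `defer` cleanup
        // therefore mutates the dictionary on the actor.
        timerTasks[key] = Task {
            defer { self.timerTasks.removeValue(forKey: key) }
            try? await Task.sleep(nanoseconds: delayNanos)
            await flush()
        }
    }
}
```

If the same dictionary were instead mutated from a context that is not actually isolated to its owner, concurrent `removeValue(forKey:)` calls could corrupt it, which would be consistent with (but is not confirmed as) the cause of the crash above.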

Screenshot

[Image: Screenshot 2023-10-20 at 14 20 20]

ktoso commented 11 months ago

Hmmm interesting, thanks for the reproducer -- I'll give this a look as soon as I can!

SeanXuCn commented 10 months ago

I may be experiencing the same problem.

*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[NSTaggedPointerString countByEnumeratingWithState:objects:count:]: unrecognized selector sent to instance 0x8000000000000000'
*** First throw call stack:
(
0 CoreFoundation 0x00000001899f2800 __exceptionPreprocess + 176
1 libobjc.A.dylib 0x00000001894e9eb4 objc_exception_throw + 60
2 CoreFoundation 0x0000000189aa43bc -[NSObject(NSObject) __retain_OA] + 0
3 CoreFoundation 0x000000018995ca84 ___forwarding___ + 1572
4 CoreFoundation 0x000000018995c3a0 _CF_forwarding_prep_0 + 96
5 libswiftCore.dylib 0x00000001992869ec $ss17__CocoaDictionaryV8IteratorC7nextKeyyXlSgyFTm + 84
6 libswiftCore.dylib 0x00000001991bf330 $sSD4KeysV8IteratorV4nextxSgyF + 248
7 xxxxxxxx 0x0000000100668510 $s18DistributedCluster05OpLogA12ReceptionistC15periodicAckTickyyF + 940
8 xxxxxxxx 0x0000000100667ef4 $s18DistributedCluster05OpLogA12ReceptionistC8settings6systemAcA0E8SettingsV_AA0B6SystemCtYacfcyyYaYbcfU0_TY2_ + 300
9 xxxxxxxx 0x0000000100668105 $s18DistributedCluster05OpLogA12ReceptionistC8settings6systemAcA0E8SettingsV_AA0B6SystemCtYacfcyyYaYbcfU0_TATQ0_ + 1
10 libswift_Concurrency.dylib 0x0000000228aa1c7d _ZL23completeTaskWithClosurePN5swift12AsyncContextEPNS_10SwiftErrorE + 1
)
libc++abi: terminating due to uncaught exception of type NSException