dropbox / djinni

A tool for generating cross-language type declarations and interface bindings.
Apache License 2.0
2.88k stars 488 forks source link

suspicious point of dead lock #404

Closed ruffianhy closed 5 years ago

ruffianhy commented 5 years ago

support-lib/proxy_cache_impl.hpp: OwningProxyPointer get(const std::type_index & tag, const OwningImplPointer & impl, AllocatorFunction * alloc) { std::unique_lock lock(m_mutex); UnowningImplPointer ptr = get_unowning(impl); auto existing_proxy_iter = m_mapping.find({tag, ptr}); if (existing_proxy_iter != m_mapping.end()) { OwningProxyPointer existing_proxy = upgrade_weak(existing_proxy_iter->second); if (existing_proxy) { return existing_proxy; } else { // The weak reference is expired, so prune it from the map eagerly. m_mapping.erase(existing_proxy_iter); } }

    auto alloc_result = alloc(impl);
    m_mapping.emplace(Key{tag, alloc_result.second}, alloc_result.first);
    return alloc_result.first;
}

void remove(const std::type_index & tag, const UnowningImplPointer & impl_unowning) { std::unique_lock lock(m_mutex); auto it = m_mapping.find({tag, impl_unowning}); if (it != m_mapping.end()) { // The entry in the map should already be expired: this is called from Handle's // destructor, so the proxy must already be gone. However, remove() does not // happen atomically with the proxy object becoming weakly reachable. It's // possible that during the window between when the weak-ref holding this proxy // expires and when we enter remove() and take m_mutex, another thread could have // created a new proxy for the same original object and added it to the map. In // that case, it->second will contain a live pointer to a different proxy object, // not an expired weak pointer to the Handle currently being destructed. We only // remove the map entry if its pointer is already expired. if (is_expired(it->second)) { m_mapping.erase(it); } } }

if "m_mapping.erase(existing_proxy_iter);"(in "get()") cause a destructor,“cleanup (in ObjcProxyCachetraits)” func will call the "remove" func, this may lead to a deadlock ???

since the "get" and "remove" use the same lock "std::unique_lockstd::mutex lock(m_mutex)"

ruffianhy commented 5 years ago

image

ruffianhy commented 5 years ago

i don‘t sure it is caused by my code or not . Wait for your help . thank you.

artwyman commented 5 years ago

That certainly does look suspicious. That stack trace suggests the proxy cache is trying to acquire its own lock reentrantly. It's unclear to me exactly how that's happening, though.

@ruffianhy before we go any further can you confirm that you're using the support-lib code from the latest Djinni master, so we're looking at the same code? Having done that, it would be most useful if you could create a simple test case which demonstrates the deadlock.

Failing that, can you say what the situation is where this happens? Can you get a more detailed stack trace with line numbers? In particular I'd like to know where the shared_ptr destructor in item 15 is getting triggered and why.

Skimming through the code of Pimpl::get() it seems to me the only time you should have a shared_ptr which owns a proxy (OwningProxyPointer) which is non-null is when you're going to return it, so I'm not sure how the last reference should ever be released inside of that function.

ruffianhy commented 5 years ago

image Just a update, i can reproduce it by annotating codes like above. The destruct is caused by "OwningProxyPointer existing_proxy" release. i will keep investigating the root cause.

artwyman commented 5 years ago

Is the end of that if the line number where the deadlock is happening in unmodified code, or is it different? The modification you made would definitely force a situation where you have a non-null owning pointer not being returned, but doesn't explain how that situation occurs on its own.

ruffianhy commented 5 years ago

yes, the deadlock happen in the same way, "get()" call "remove()". The "existing_proxy" destructed after "if(){ }". I just try to reproduce it.

artwyman commented 5 years ago

That's very puzzling then, unless I'm missing something. If existing_proxy is non-null, it's being returned, so it won't be releasing the last reference, and thus not deleting. If it's null, then it has nothing to delete. So I'm having trouble seeing how the could be a deletion occurring at that line in order to trigger the ~ObjcProxy destructor. There must be some detail I'm missing here.

ruffianhy commented 5 years ago

yes, u r right. still need more research.

ruffianhy commented 5 years ago

image image

ruffianhy commented 5 years ago

This time we use the latest version. And meet this dead lock without any change in codes.

artwyman commented 5 years ago

I don't think I understand the latest iOS and Xcode behavior enough to understand those screenshots. What are the "Enqueued from" blocks? Is the event handling system actually running unrelated events queued from other threads inside of an ObjC object's destructor? That seems super dangerous, and definitely the kind of thing that could cause reentrancy and deadlocks.