Zhuinden / simple-stack

[ACTIVE] Simple Stack, a backstack library / navigation framework for simpler navigation and state management (for fragments, views, or whatevers).
Apache License 2.0
1.36k stars 76 forks source link

NPE Crashes when goBack is called on the Backstack #256

Closed amadrian closed 2 years ago

amadrian commented 2 years ago

We are noticing several Crashlytics reports of an NPE that occurs when goBack is called on the backstack. I am not sure how to reproduce this though. Here is the crash log

Fatal Exception: java.lang.NullPointerException: Attempt to read from field 'java.util.LinkedHashMap$LinkedEntry java.util.LinkedHashMap$LinkedEntry.nxt' on a null object reference
       at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:350)
       at java.util.LinkedHashMap$KeyIterator.next(LinkedHashMap.java:366)
       at java.util.AbstractCollection.retainAll(AbstractCollection.java:318)
       at com.zhuinden.simplestack.ScopeManager.cleanupScopesBy(ScopeManager.java:545)
       at com.zhuinden.simplestack.Backstack$2.stateChangeCompleted(Backstack.java:180)
       at com.zhuinden.simplestack.NavigationCore.notifyCompletionListeners(NavigationCore.java:709)
       at com.zhuinden.simplestack.NavigationCore.completeStateChange(NavigationCore.java:662)
       at com.zhuinden.simplestack.NavigationCore.access$100(NavigationCore.java:39)
       at com.zhuinden.simplestack.NavigationCore$1.stateChangeComplete(NavigationCore.java:645)
       at com.zhuinden.simplestack.SimpleStateChanger.handleStateChange(SimpleStateChanger.java:45)
       at com.zhuinden.simplestack.Backstack$1.handleStateChange(Backstack.java:103)
       at com.zhuinden.simplestack.NavigationCore.changeState(NavigationCore.java:650)
       at com.zhuinden.simplestack.NavigationCore.beginStateChangeIfPossible(NavigationCore.java:614)
       at com.zhuinden.simplestack.NavigationCore.enqueueStateChange(NavigationCore.java:596)
       at com.zhuinden.simplestack.NavigationCore.executeOrConsumeNavigationOp(NavigationCore.java:482)
       at com.zhuinden.simplestack.NavigationCore.goBack(NavigationCore.java:437)
       at com.zhuinden.simplestack.Backstack.goBack(Backstack.java:1260)
       at com.zhuinden.simplestack.navigator.Navigator.onBackPressed(Navigator.java:310)

SimpleStack version used: 2.6.2

Zhuinden commented 2 years ago

On initial analysis, this would theoretically be caused in either .keySet().retainAll(), or just the trackedKeys.retainAll() calls (most likely the second based on the stack trace).

However, a fix for such a crash in Signal was done via Collections.synchronizedMap https://github.com/signalapp/Signal-Android/commit/edac0e85c76edb6c6632481611794cf13bb8456c , however Backstack ensures that no multi-threading is involved in this case, meaning this would be running on the UI thread throughout.

That begs the question, how is it possible that it finds an entry in the LinkedHashMap with an incorrect value?

According to the Android sources, this implementation that has nextEntry() in it was replaced with a new one that has nextNode() in it since most likely API 26.

https://android.googlesource.com/platform/libcore/+/9efb6d12ce4d2ffedb73d6e9887ea2c89f8ec129%5E%21/

So I'm curious what Android API version this crash happens on, and how common it is. Because at first glance, the only reliable way to fix it would be to ship my own LinkedHashMap. šŸ¤Ø

amadrian commented 2 years ago

Because at first glance, the only reliable way to fix it would be to ship my own LinkedHashMap

Yikes. That doesn't sound ideal. So far, this has happened around 750 times from a user base of 10+ million users. 100% of the crashes were happening on Android 6 / 6.0.1 devices. Would this help narrow down the issue?

Zhuinden commented 2 years ago

So far, this has happened around 750 times from a user base of 10+ million users.

100% of the crashes were happening on Android 6 / 6.0.1 devices.

So the fix came in in API 24, and the bug exists in API 23.

Well, it just means LinkedHashMap on Android 6 is not working as intended, so I'll need to ship a replacement. I'll look into this fairly soon, as it has no user-facing/API change other than the app no longer crashing sometimes on certain Android 6 devices. Unit tests guard all of this logic, so yeah šŸ˜…

Zhuinden commented 2 years ago

So I started looking into this but I'm a bit confused about it.

Namely, Line 545 is trackedKeys.retainAll()

https://github.com/Zhuinden/simple-stack/blob/40eca3d4a6b09c21fe35aa09db8dda6c2bd90353/simple-stack/src/main/java/com/zhuinden/simplestack/ScopeManager.java#L545

But trackedKeys isn't even a LinkedHashMap, it's a LinkedHashSet.

So I wrote this code, and all my tests pass even after


/**
 * This is a custom implementation of LinkedHashMap, because LinkedHashMap.keySet().retainAll() is broken on Android 6 and Android 6.1.
 * <p>
 * See related issue #256
 *
 * @param <K> key
 * @param <V> value
 */
class CustomLinkedHashMap<K, V> {
    private final LinkedHashSet<Map.Entry<K, V>> nodes = new LinkedHashSet<>();

    public void put(K serviceTag, V service) {
        nodes.add(new AbstractMap.SimpleEntry<K, V>(serviceTag, service));
    }

    @Nullable
    public V get(@Nullable K serviceTag) {
        for(Map.Entry<K, V> entry : nodes) {
            if((entry.getKey() == null && serviceTag == null) || (entry.getKey() != null && serviceTag != null && serviceTag.equals(
                    entry.getKey()))) {
                return entry.getValue();
            }
        }
        return null;
    }

    public boolean containsKey(@Nullable K objectTag) {
        for(Map.Entry<K, V> entry : nodes) {
            if((entry.getKey() == null && objectTag == null) || (entry.getKey() != null && objectTag != null && objectTag.equals(
                    entry.getKey()))) {
                return true;
            }
        }
        return false;
    }

    @Nullable
    public V remove(@Nullable K objectTag) {
        Iterator<Map.Entry<K, V>> iterator = nodes.iterator();
        while(iterator.hasNext()) {
            Map.Entry<K, V> entry = iterator.next();
            if((entry.getKey() == null && objectTag == null) || (entry.getKey() != null && objectTag != null && objectTag.equals(
                    entry.getKey()))) {
                iterator.remove();
                return entry.getValue();
            }
        }
        return null;
    }

    public Set<Map.Entry<K, V>> entrySet() {
        return nodes; // theoretically it would make more sense to expose an immutable copy, however this is not the contract of the original LinkedHashMap, therefore we keep this behavior.
    }

    public void putAll(@Nullable CustomLinkedHashMap<K, V> services) {
        if(services != null) {
            nodes.addAll(services.nodes);
        }
    }

    public boolean isEmpty() {
        return nodes.isEmpty();
    }

    @Nonnull
    public Set<K> keySet() {
        LinkedHashSet<K> keys = new LinkedHashSet<>(nodes.size());

        for(Map.Entry<K, V> entry : nodes) {
            keys.add(entry.getKey());
        }

        return Collections.unmodifiableSet(keys);
    }
}

But the issue is that I was trying to use LinkedHashSet, and if the error is correct, then it's the implementation of retainAll() inherited into LinkedHashSet from AbstractCollection is the culprit.


So this is a goose chase, what I'll do instead is reimplement retainAll() by hand, and replace usage of retainAll() with that. Not like it's particularly difficult.

Zhuinden commented 2 years ago

Having a bit of problems releasing 2.6.3 because maven-publish is behaving differently than before, but it should eventually happen šŸ¤”

Zhuinden commented 2 years ago

Released 2.6.4 with a replacement for the built-in retainAll() hopefully that fixes it (I do not have an Android 6 phone which means the fix is pre-emptive)

Zhuinden commented 2 years ago

As always, thanks for the report ā¤ļø

If you ever see it resurface (hopefully not), feel free to post an issue

I'm also excited to hear if the issue did not resurface for a significant amount of time

amadrian commented 2 years ago

Thank you so much for the super fast fix! I will be sure to update here if it resurfaces šŸ‘