bumptech / glide

An image loading and caching library for Android focused on smooth scrolling
https://bumptech.github.io/glide/
Other
34.68k stars 6.12k forks source link

Infinite error on fast scroll in recycler view. #2204

Open Tolriq opened 7 years ago

Tolriq commented 7 years ago

Glide Version: 4.0.0

Integration libraries: No

Device/Android Version: Android 7 Samsung.

Issue details / Repro steps / Use case background: Trying v4 before migrating. So very basic use case. GlideApp with a basic configuration and a custom fetcher migrated from v3.

Glide load line / GlideModule (if any) / list Adapter code (if any):

  RequestOptions requestOptions = new RequestOptions();
                requestOptions.placeholder(placeHolder);
GlideApp.with(appContext)
                        .load(getGlideImageRequest(url))
                        .apply(requestOptions)
.into(holder.fanart);

The getGlideImageRequest just builds a specific object for internal use.

All works perfectly, but when scrolling very fast with image downloading, I will trigger infinite stack trace shown later. Making it quite hard to find the root cause as everything is spammed to death and no real clue about possible cause :(

Layout XML:

<FrameLayout xmlns:android="...

Stack trace / LogCat:

Request threw uncaught throwable
                                                                                    java.lang.NullPointerException: Attempt to invoke interface method 'int java.lang.Comparable.compareTo(java.lang.Object)' on a null object reference
                                                                                        at java.util.concurrent.PriorityBlockingQueue.siftDownComparable(PriorityBlockingQueue.java:374)
                                                                                        at java.util.concurrent.PriorityBlockingQueue.dequeue(PriorityBlockingQueue.java:303)
                                                                                        at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:518)
                                                                                        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1058)
                                                                                        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1118)
                                                                                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
                                                                                        at java.lang.Thread.run(Thread.java:762)
                                                                                        at com.bumptech.glide.load.engine.executor.GlideExecutor$DefaultThreadFactory$1.run(GlideExecutor.java:347)

When scrolling slowly I sometimes get following error and image does not display even if the loader did load it correctly.


Request threw uncaught throwable
                                                                                     java.lang.IllegalStateException: Already released
                                                                                         at com.bumptech.glide.util.pool.StateVerifier$DefaultStateVerifier.throwIfRecycled(StateVerifier.java:44)
                                                                                         at com.bumptech.glide.load.engine.DecodeJob.setNotifiedOrThrow(DecodeJob.java:313)
                                                                                         at com.bumptech.glide.load.engine.DecodeJob.notifyFailed(DecodeJob.java:301)
                                                                                         at com.bumptech.glide.load.engine.DecodeJob.run(DecodeJob.java:231)
                                                                                         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
                                                                                         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
                                                                                         at java.lang.Thread.run(Thread.java:762)
                                                                                         at com.bumptech.glide.load.engine.executor.GlideExecutor$DefaultThreadFactory$1.run(GlideExecutor.java:347)

And was just able to trigger another version of the crash with a longer stacktrace:


Fatal Exception: java.lang.NullPointerException: Attempt to invoke direct method 'int com.bumptech.glide.load.engine.DecodeJob.getPriority()' on a null object reference
       at com.bumptech.glide.load.engine.DecodeJob.compareTo(DecodeJob.java:192)
       at com.bumptech.glide.load.engine.DecodeJob.compareTo(DecodeJob.java:35)
       at java.util.concurrent.PriorityBlockingQueue.siftUpComparable(PriorityBlockingQueue.java:331)
       at java.util.concurrent.PriorityBlockingQueue.offer(PriorityBlockingQueue.java:459)
       at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1352)
       at com.bumptech.glide.load.engine.executor.GlideExecutor.execute(GlideExecutor.java:195)
       at com.bumptech.glide.load.engine.EngineJob.start(EngineJob.java:93)
       at com.bumptech.glide.load.engine.Engine.load(Engine.java:212)
       at com.bumptech.glide.request.SingleRequest.onSizeReady(SingleRequest.java:416)
       at com.bumptech.glide.request.target.ViewTarget$SizeDeterminer.notifyCbs(ViewTarget.java:181)
       at com.bumptech.glide.request.target.ViewTarget$SizeDeterminer.checkCurrentDimens(ViewTarget.java:197)
       at com.bumptech.glide.request.target.ViewTarget$SizeDeterminer$SizeDeterminerLayoutListener.onPreDraw(ViewTarget.java:310)
       at android.view.ViewTreeObserver.dispatchOnPreDraw(ViewTreeObserver.java:1013)
       at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:2506)
       at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1515)
       at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:7091)
       at android.view.Choreographer$CallbackRecord.run(Choreographer.java:927)
       at android.view.Choreographer.doCallbacks(Choreographer.java:702)
       at android.view.Choreographer.doFrame(Choreographer.java:638)
       at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:913)
       at android.os.Handler.handleCallback(Handler.java:751)
       at android.os.Handler.dispatchMessage(Handler.java:95)
       at android.os.Looper.loop(Looper.java:154)
       at android.app.ActivityThread.main(ActivityThread.java:6682)
       at java.lang.reflect.Method.invoke(Method.java)
       at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:1520)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1410)
Tolriq commented 7 years ago

After some more tests

builder.setDiskCacheExecutor(GlideExecutor.newDiskCacheExecutor(4, "disk-cache", UncaughtThrowableStrategy.IGNORE));

Allows to stop the spam so it seems the error happens in disk cache handling. (Using an IGNORE for setResizeExecutor have no effect).

Not really sure how to narrow down the issue :(

Tolriq commented 7 years ago

One last after switching to UncaughtThrowableStrategy.THROW

FATAL EXCEPTION: glide-disk-cache-thread-0
                                                                                    Process: xxxx, PID: 3545
                                                                                    java.lang.RuntimeException: Request threw uncaught throwable
                                                                                        at com.bumptech.glide.load.engine.executor.GlideExecutor$UncaughtThrowableStrategy$2.handle(GlideExecutor.java:301)
                                                                                        at com.bumptech.glide.load.engine.executor.GlideExecutor$DefaultThreadFactory$1.run(GlideExecutor.java:349)
                                                                                     Caused by: java.lang.NullPointerException: Attempt to invoke virtual method 'int java.lang.Enum.ordinal()' on a null object reference
                                                                                        at com.bumptech.glide.load.engine.DecodeJob.getPriority(DecodeJob.java:200)
                                                                                        at com.bumptech.glide.load.engine.DecodeJob.compareTo(DecodeJob.java:192)
                                                                                        at com.bumptech.glide.load.engine.DecodeJob.compareTo(DecodeJob.java:35)
                                                                                        at java.util.concurrent.PriorityBlockingQueue.siftUpComparable(PriorityBlockingQueue.java:331)
                                                                                        at java.util.concurrent.PriorityBlockingQueue.offer(PriorityBlockingQueue.java:459)
                                                                                        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1352)
                                                                                        at com.bumptech.glide.load.engine.executor.GlideExecutor.execute(GlideExecutor.java:195)
                                                                                        at com.bumptech.glide.load.engine.EngineJob.reschedule(EngineJob.java:239)
                                                                                        at com.bumptech.glide.load.engine.DecodeJob.reschedule(DecodeJob.java:342)
                                                                                        at com.bumptech.glide.load.engine.DecodeJob.runGenerators(DecodeJob.java:287)
                                                                                        at com.bumptech.glide.load.engine.DecodeJob.runWrapped(DecodeJob.java:249)
                                                                                        at com.bumptech.glide.load.engine.DecodeJob.run(DecodeJob.java:222)
                                                                                        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
                                                                                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
                                                                                        at java.lang.Thread.run(Thread.java:762)
                                                                                        at com.bumptech.glide.load.engine.executor.GlideExecutor$DefaultThreadFactory$1.run(GlideExecutor.java:347)

Seems the DecodeJob is having a null priority, but I do no touch any priority anywhere in code, so that a call to releaseInternal() is made but the job is still reused / running.

I suppose a Glide 4 expert can understand the race condition.

Tolriq commented 7 years ago

So after a few more time on this.

The root cause (Error recovery problem + null priority excepted that should maybe also be addressed) is that onDataReady onLoadFailed from DataCallback can be called after ressources being freed.

Leading to not handled NPE at least in SourceGenerator / DataCacheGenerator but certainly at other places.

I have not yet found the race condition, but from my DataFetcher the problem was that cleanup was called before the end of loadData but without calling cancel leading to my code to still call dataCallback.onDataReady(imageStream); as the only reason to not call it would have been a cancel.

If this is wanted then it should be documented, but code should be error proof and check for this situation maybe.

For the DataCacheGenerator the crash occurred also because of a null loadData when onDataReady is called. But since it's not my code that trigger that, I guess the race condition also occurs in Glide internal code.

@sjudd don't know how you want to handle this :( Can PR security checks in those handler as well it should better handle dev error, but finding root cause for normal usage would be better but out of my scope / time for now.

The priority + infinite retries on errors blocking Glide threads is also out of my current knowledge.

Edit: Securing all loadData / onDataReady does properly avoid all crash and infinite loop, with no more compare on cancelled jobs. But will sometimes leads to empty / bad images being cached as valid so not good enough :( Since my fetcher use a dual cache system, I can confirm that my fetcher correctly fill it's internal cache with valid data and then just return a FileInputStream over it, so no network error possible, just a race condition in the cancelling during the caching inside Glide as for the rest.

sjudd commented 7 years ago

There are no retries of failed requests, it's probably just that you've got a bunch queued after fast scrolling.

Are you able to reproduce this reliably? Can you do so in a sample app (either one of Glide's, or one you create)?

Tolriq commented 7 years ago

Well no easy way to reproduce as my code is huge, but I can assure you there was infinite errors spamming logcat and a Glide thread running non stop.

See my second PR that prevent this from happening. It's more an internal wrong state that triggers this.

I suppose you can reproduce with a custom datafetcher that calls onDataReady after cleanup a few times. Currently out of time to build a full repro sorry.

sjudd commented 7 years ago

Do you know where the call to cleanup that happens without a call to cancel is occurring?

Tolriq commented 7 years ago

Nope I did so many things to try to understand the issue that I lost the story :(

I suppose it could have been caused by some wrong code on my side as I found out that I had some cases where onDataReady could be called with an onLoadFailed too.

The thing is that all the errors that triggers after are really complicated to trace :(

The PR prevent those strange error loops. Maybe throwing proper errors would be better, but as said in the PR crash there triggers strange things that I do not really understand :)

sjudd commented 7 years ago

Ok thanks. I'll try playing around with calling onDataReady more than once or onLoadFailed and onDataReady. I'd guess that has something to do with it since I haven't seen this before.

The race between cancellation and onDataReady or onLoadFailed is interesting though (where cancel ends up being called immediately prior to one of the callbacks). I don't think we have any explicit handling for that case and we we should.

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had activity in the last seven days. It will be closed if no further activity occurs within the next seven days. Thank you for your contributions.

shijiexug2016 commented 5 years ago

So I have experienced this at my work. The getDataSource is causing this issue. When returning LOCAL, REMOTE, RESOURCE_DISK_CACHE, the infinite NPE on the DecodeJob.getPriority() happens and no image is loaded. When returning DATA_DISK_CACHE, the infinite java.lang.IllegalStateException: Already released is thrown but images are loaded fine. For me seems only MEMORY_CACHE and I am not 100% understand why.

shijiexug2016 commented 5 years ago

I don't think MEMORY_CACHE is actually working. The try catch block in the customized DataFetcher still throws exception. Hope my comments can help investigation.