Open Tolriq opened 7 years ago
After some more tests
builder.setDiskCacheExecutor(GlideExecutor.newDiskCacheExecutor(4, "disk-cache", UncaughtThrowableStrategy.IGNORE));
This stops the spam, so the error seems to happen in disk cache handling. (Using IGNORE for setResizeExecutor has no effect.)
Not really sure how to narrow down the issue :(
One last trace, after switching to UncaughtThrowableStrategy.THROW:
FATAL EXCEPTION: glide-disk-cache-thread-0
Process: xxxx, PID: 3545
java.lang.RuntimeException: Request threw uncaught throwable
at com.bumptech.glide.load.engine.executor.GlideExecutor$UncaughtThrowableStrategy$2.handle(GlideExecutor.java:301)
at com.bumptech.glide.load.engine.executor.GlideExecutor$DefaultThreadFactory$1.run(GlideExecutor.java:349)
Caused by: java.lang.NullPointerException: Attempt to invoke virtual method 'int java.lang.Enum.ordinal()' on a null object reference
at com.bumptech.glide.load.engine.DecodeJob.getPriority(DecodeJob.java:200)
at com.bumptech.glide.load.engine.DecodeJob.compareTo(DecodeJob.java:192)
at com.bumptech.glide.load.engine.DecodeJob.compareTo(DecodeJob.java:35)
at java.util.concurrent.PriorityBlockingQueue.siftUpComparable(PriorityBlockingQueue.java:331)
at java.util.concurrent.PriorityBlockingQueue.offer(PriorityBlockingQueue.java:459)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1352)
at com.bumptech.glide.load.engine.executor.GlideExecutor.execute(GlideExecutor.java:195)
at com.bumptech.glide.load.engine.EngineJob.reschedule(EngineJob.java:239)
at com.bumptech.glide.load.engine.DecodeJob.reschedule(DecodeJob.java:342)
at com.bumptech.glide.load.engine.DecodeJob.runGenerators(DecodeJob.java:287)
at com.bumptech.glide.load.engine.DecodeJob.runWrapped(DecodeJob.java:249)
at com.bumptech.glide.load.engine.DecodeJob.run(DecodeJob.java:222)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
at java.lang.Thread.run(Thread.java:762)
at com.bumptech.glide.load.engine.executor.GlideExecutor$DefaultThreadFactory$1.run(GlideExecutor.java:347)
It seems the DecodeJob has a null priority, but I do not touch priorities anywhere in my code, so presumably releaseInternal() is called while the job is still being reused / running.
I suppose a Glide 4 expert can understand the race condition.
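To illustrate the failure mode in the trace above, here is a minimal, self-contained sketch. It does not use Glide: Job, Priority and release() are hypothetical stand-ins for a prioritized job whose priority field is nulled on release. It only shows that re-queuing such a job into a non-empty PriorityBlockingQueue fails inside compareTo, which matches the shape of the stack trace.

```java
import java.util.concurrent.PriorityBlockingQueue;

class NullPriorityDemo {
    enum Priority { LOW, NORMAL, HIGH }

    // Hypothetical stand-in for a prioritized job; this is NOT Glide's DecodeJob.
    static final class Job implements Comparable<Job> {
        private Priority priority;

        Job(Priority priority) {
            this.priority = priority;
        }

        // Assumption: releasing a job clears its priority, as the trace suggests.
        void release() {
            priority = null;
        }

        @Override
        public int compareTo(Job other) {
            // Throws NullPointerException if this job was already released.
            return priority.ordinal() - other.priority.ordinal();
        }
    }

    // Returns true if re-queuing a released job throws a NullPointerException.
    static boolean offerReleasedJobThrows() {
        PriorityBlockingQueue<Job> queue = new PriorityBlockingQueue<>();
        // The first element is inserted without any comparison.
        queue.offer(new Job(Priority.NORMAL));

        Job released = new Job(Priority.HIGH);
        released.release();
        try {
            // The queue must compare the new element with an existing one.
            queue.offer(released);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on re-queue: " + offerReleasedJobThrows());
    }
}
```

Note that the exception only appears once the queue already holds another element, which may be why the problem shows up under load (many queued jobs) rather than with a single request.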
So, after spending some more time on this:
The root cause (apart from the error recovery problem and the null priority, which should maybe also be addressed) is that onDataReady / onLoadFailed from DataCallback can be called after resources have been freed.
This leads to unhandled NPEs, at least in SourceGenerator / DataCacheGenerator but certainly in other places too.
I have not yet found the race condition, but in my DataFetcher the problem was that cleanup was called before loadData finished, without cancel being called first. My code therefore still called dataCallback.onDataReady(imageStream);, since the only reason not to call it would have been a cancel.
If this is intended then it should be documented, but the code should probably also be defensive and check for this situation.
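As a rough idea of a fetcher-side guard, here is a minimal sketch that runs without Glide. The DataCallback interface below is a hypothetical stand-in that only mirrors the two callback names mentioned above, and GuardedFetcher is an illustrative name, not a Glide class. The assumption is that treating cleanup the same as cancel is acceptable for the fetcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class GuardedFetcherDemo {
    // Hypothetical stand-in for the callback shape discussed in this thread.
    interface DataCallback<T> {
        void onDataReady(T data);
        void onLoadFailed(Exception e);
    }

    // Illustrative fetcher that never calls back after cancel() or cleanup().
    static final class GuardedFetcher {
        private final AtomicBoolean finished = new AtomicBoolean(false);

        void cancel() {
            finished.set(true);
        }

        void cleanup() {
            finished.set(true);
        }

        void loadData(String data, DataCallback<String> callback) {
            // Deliver only if neither cancel() nor cleanup() ran first,
            // and mark the fetcher finished so it delivers at most once.
            if (finished.compareAndSet(false, true)) {
                callback.onDataReady(data);
            }
        }
    }

    // Records what reached the callback, for inspection.
    static final class Recorder implements DataCallback<String> {
        final List<String> delivered = new ArrayList<>();

        @Override
        public void onDataReady(String data) {
            delivered.add(data);
        }

        @Override
        public void onLoadFailed(Exception e) {
            delivered.add("failed");
        }
    }

    public static void main(String[] args) {
        Recorder normal = new Recorder();
        new GuardedFetcher().loadData("image", normal);

        Recorder late = new Recorder();
        GuardedFetcher cleaned = new GuardedFetcher();
        cleaned.cleanup();
        cleaned.loadData("image", late);

        System.out.println(normal.delivered + " " + late.delivered);
    }
}
```

This only decides atomically whether to call back. A cleanup that starts while the callback is already running can still overlap with it, so a guard like this narrows the window rather than closing it.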
For the DataCacheGenerator, the crash also occurred because loadData was null when onDataReady was called. But since it's not my code that triggers that, I guess the race condition also occurs in Glide's internal code.
@sjudd I don't know how you want to handle this :( I can PR safety checks in those handlers, which would also handle developer errors better. Finding the root cause for normal usage would be better, but that is out of my scope / time for now.
The priority issue + infinite retries on errors blocking Glide threads are also beyond my current knowledge.
Edit: Guarding all loadData / onDataReady accesses does properly avoid all crashes and the infinite loop, with no more compares on cancelled jobs. But it sometimes leads to empty / bad images being cached as valid, so it is not good enough :( Since my fetcher uses a dual cache system, I can confirm that my fetcher correctly fills its internal cache with valid data and then just returns a FileInputStream over it, so no network error is possible; it is just a race condition in the cancelling during caching inside Glide, as for the rest.
There are no retries of failed requests, it's probably just that you've got a bunch queued after fast scrolling.
Are you able to reproduce this reliably? Can you do so in a sample app (either one of Glide's, or one you create)?
Well, there is no easy way to reproduce this since my code base is huge, but I can assure you there were infinite errors spamming logcat and a Glide thread running non-stop.
See my second PR, which prevents this from happening. It's more an invalid internal state that triggers this.
I suppose you can reproduce it with a custom DataFetcher that calls onDataReady after cleanup a few times. I'm currently out of time to build a full repro, sorry.
Do you know where the call to cleanup that happens without a call to cancel is occurring?
Nope, I did so many things to try to understand the issue that I lost track :(
I suppose it could have been caused by some wrong code on my side, as I found out that I had some cases where onDataReady could be called along with onLoadFailed.
The thing is that all the errors triggered afterwards are really complicated to trace :(
The PR prevents those strange error loops. Maybe throwing proper errors would be better, but as said in the PR, crashing there triggers strange things that I do not really understand :)
Ok thanks. I'll try playing around with calling onDataReady more than once or onLoadFailed and onDataReady. I'd guess that has something to do with it since I haven't seen this before.
The race between cancellation and onDataReady or onLoadFailed is interesting though (where cancel ends up being called immediately prior to one of the callbacks). I don't think we have any explicit handling for that case, and we should.
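One possible shape for such explicit handling, sketched without Glide: a wrapper that lets at most one of onDataReady / onLoadFailed through and drops anything that arrives after cancel. OneShotCallback and the DataCallback interface below are hypothetical stand-ins for illustration only; this is an assumption about what the handling could look like, not how Glide implements it.

```java
import java.util.concurrent.atomic.AtomicReference;

class OneShotCallbackDemo {
    // Hypothetical stand-in for the callback shape discussed in this thread.
    interface DataCallback<T> {
        void onDataReady(T data);
        void onLoadFailed(Exception e);
    }

    // Forwards at most one callback, and none once cancel() has won the race.
    static final class OneShotCallback<T> implements DataCallback<T> {
        private enum State { PENDING, DELIVERED, CANCELLED }

        private final AtomicReference<State> state =
                new AtomicReference<>(State.PENDING);
        private final DataCallback<T> delegate;

        OneShotCallback(DataCallback<T> delegate) {
            this.delegate = delegate;
        }

        // Returns true if the cancel happened before any callback was accepted.
        boolean cancel() {
            return state.compareAndSet(State.PENDING, State.CANCELLED);
        }

        @Override
        public void onDataReady(T data) {
            if (state.compareAndSet(State.PENDING, State.DELIVERED)) {
                delegate.onDataReady(data);
            }
        }

        @Override
        public void onLoadFailed(Exception e) {
            if (state.compareAndSet(State.PENDING, State.DELIVERED)) {
                delegate.onLoadFailed(e);
            }
        }
    }

    // Counts what reached the delegate, for inspection.
    static final class Recorder implements DataCallback<String> {
        int ready;
        int failed;

        @Override
        public void onDataReady(String data) {
            ready++;
        }

        @Override
        public void onLoadFailed(Exception e) {
            failed++;
        }
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        OneShotCallback<String> callback = new OneShotCallback<>(recorder);
        callback.onDataReady("first");
        callback.onLoadFailed(new Exception("late failure")); // dropped
        callback.onDataReady("second");                       // dropped
        System.out.println("ready=" + recorder.ready + " failed=" + recorder.failed);
    }
}
```

Silently dropping the extra calls hides fetcher bugs; logging or throwing on a rejected transition would make misbehaving fetchers easier to spot, at the cost of crashing on them.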
This issue has been automatically marked as stale because it has not had activity in the last seven days. It will be closed if no further activity occurs within the next seven days. Thank you for your contributions.
So I have experienced this at work. The return value of getDataSource is what triggers this issue. When it returns LOCAL, REMOTE, or RESOURCE_DISK_CACHE, the infinite NPE in DecodeJob.getPriority() happens and no image is loaded. When it returns DATA_DISK_CACHE, an infinite java.lang.IllegalStateException: Already released is thrown, but images load fine. For me, only MEMORY_CACHE seems to work, and I do not fully understand why.
I don't think MEMORY_CACHE is actually working. The try/catch block in the customized DataFetcher still throws an exception. I hope my comments can help the investigation.
Glide Version: 4.0.0
Integration libraries: No
Device/Android Version: Android 7 Samsung.
Issue details / Repro steps / Use case background: Trying v4 before migrating. So very basic use case. GlideApp with a basic configuration and a custom fetcher migrated from v3.
Glide load line / GlideModule (if any) / list Adapter code (if any): The getGlideImageRequest just builds a specific object for internal use.
All works perfectly, but when scrolling very fast while images are downloading, I trigger the infinite stack trace shown later. This makes it quite hard to find the root cause, as everything is spammed to death with no real clue about a possible cause :(
Layout XML:
Stack trace / LogCat:
When scrolling slowly I sometimes get the following error, and the image does not display even though the loader did load it correctly.
And I was just able to trigger another version of the crash with a longer stack trace: