Priority handling for identical data request

Tolriq commented 3 years ago

Is your feature request related to a problem? Please describe. At some point in my application an image can be loaded multiple times at different size, but I have no control over the order of the requests. Since I'm allowing inexact size it would be nice if there could be some way to optimize the request for the same data to be better ordered so that cache works better.

I'm often ending up with all request running more or less simultaneously or in smaller to higher order leading to no usage of the memory cache as each request are larger than the previous one.

Since I don't know in advance if the images will be displayed multiple times I don't want to use a tons of memory and always use Size(Original).

I've not found a way to do anything with interceptors or the current pipeline but I maybe missed something.

colinrtwhite commented 3 years ago

optimize the request for the same data to be better ordered so that cache works better

I'm not sure I understand - what's the ask/potential API you're looking for?

always use Size(Original)

I'd avoid doing this as you'll probably end up using more memory. If two URLs are loaded at different sizes, we only keep the larger one in the memory cache and discard the smaller one automatically.

request running more or less simultaneously

It's possible to add a "request cache" using an interceptor so requests with the same data are merged. Here's a quick example (untested):

class RequestCacheInterceptor : Interceptor {

    private val cache = ConcurrentHashMap<Any, Async<ImageResult>>()

    suspend fun intercept(chain: Interceptor.Chain): ImageResult {
        return cache.getOrPut(chain.request.data) {
            async {
                chain.proceed(chain.request)
            }
        }.join()
    }
}

With the above, the first request that's executed will dictate the size at which the image is loaded. Also be warned: it's possible other params different for the request other than just the data (e.g. bitmap config, transformations) that change the expected output. This is the main reason Coil can't add this internally since it makes assumptions that aren't always true.

Tolriq commented 3 years ago

always use Size(Original)

I'd avoid doing this as you'll probably end up using more memory. If two URLs are loaded at different sizes, we only keep the larger one in the memory cache and discard the smaller one automatically.

Yes that's why I do not want to do that.

About the API, there's 2 different things that Glide can support and could help here. 1) Priority on request queue, so that I can give the higher size image a bigger priority so that it's loaded first 2) In flight dedup so that simultaneous request of the same image are not started at the same time but wait for the previous one in case cache can be reused. Don't know if it's currently the case or not. But your interceptor idea can help build that the problem is that I think the interceptor works before checking the memory cache and it could delay those that would match no? Ideally the in flight call merge should happens when not in memory cache.

Edit: By in flight dedup, it's not like your merge with cache, it's to queue same data request to increase cache hit chance.

colinrtwhite commented 3 years ago

Priority on request queue, so that I can give the higher size image a bigger priority so that it's loaded first

How would this work in practice? Requests are launched as soon as they're executed and the Coroutines library handles thread fairness. We could pause smaller requests in flight, but that might not be best for performance since they could be 90% done when a larger image request comes in - delaying returning the smaller image.

In flight dedup

This is very, very complex to do and isn't likely to happen:

We can only de-dupe based on the hashCode of the ImageRequest and custom Targets and Listeners often make this much less useful in practice. De-duping only based on the data could end up returning images with the wrong bitmap config, transformations, custom parameters, etc.
We don't know ahead of time if the image pipeline is going to return a BitmapDrawable or another type of Drawable. Bitmaps can be shared whereas generic drawables can't. If we merge two requests then the pipeline returns a non-bitmap drawable, we'll have to launch the request again which is slower than if we had not waited.
If you want to de-dupe based on data, your best bet is to use the custom interceptor above. It does de-dupe before the call to the memory cache, but that's ok since chain.proceed will return synchronously if it's in the memory cache (it adds no delay).

Tolriq commented 3 years ago

I tried to edit to make the dedup thing more clear, as I'm not native English speaker.

Dedup is the wrong term it's more a a queue handling based on data uniqueness.

So that if you start 3 queries with the same data, the 3 queries does not start at the same time to not possible start 3 network / disk load at the same time when the fetcher would have returned the same source.

This does not change anything on the rest of the pipeline and params handling and everything. It just vastly increase the chance to actually be able to use the cache when loading those.

Like you start 3 request at original size at the same time first is running 2 other are queued, first finish it's cached 2 others are from cache if possible instead of 3 server / disk load.

The priority in that case could be applied to that queue for the that data and the next calls.

Priority concept also works when you use worker pools to control concurrency like I think you do in 2.0 to limit to 4 decodebitmap. Priority can optimize the order of the channel items handled by the pool.

colinrtwhite commented 3 years ago

Gotcha, we could do this, however it's not going to be a performance benefit in all cases. If we queue requests based on data uniqueness it's possible we could end up spending more time waiting than if we had just run the requests in parallel. For example, loading the same disk cached GIF into multiple ImageViews is much faster if we run the requests in parallel. If we queue the requests based on the uniqueness of the URL, we'll end up decoding the GIFs in serial instead of parallel without any improvement to our cache hit rate.

Overall, this is something I think is best implemented using a custom interceptor (which can delay incoming requests based on data uniqueness) since it can be tweaked for your specific use case.

EDIT: Video frames is another example where queuing based on data uniqueness would be much slower as decoding often takes a long time and it's common to launch many requests for different frame timecodes (which isn't part of the data) for the same video URI.

Tolriq commented 3 years ago

Maybe add a parameter to the image request to have this optional.

Interceptor is a little too early because of memory cache and trying to do something at fetcher level is too late.

One way would be to have an additional optional pipeline that could reorder / delay the query queue by tracking all in flight queries opening priority and such queuing for those who need to.

Edit: to be clear about memory cache the issue is that a running uncached unique query will lock possible cache hit during it's fetch so delaying cases that could be fast and reverting the efficiency of the whole thing.

colinrtwhite commented 3 years ago

@Tolriq You can check the memory cache manually in an interceptor to avoid executing the image pipeline if you want to avoid locking (imageLoader.memoryCache is public). The memory cache API is also improved in 2.0 so this should be much easier when that releases.

Making this an optional param (disabled by default) would mean most users won't use it so I don't think this should live in the core library and is best as a custom interceptor.

Tolriq commented 3 years ago

Ok I'll first check what I can achieve with interceptor and custom data. Do you think you can publish at least a snapshot of 2.0? I could not find one and while I really want to migrate to coil for my new apps I'd like to be sure I can hack my few needs in it.

Tolriq commented 3 years ago

@colinrtwhite I'm doing some quick tests for the memory cache thing and when at the interceptor level the chain.request.memoryCacheKey is null so as per doc will be computed later so can't be used to get from the memory cache.

Is there some documentation / way to generate it properly at that point in time?

colinrtwhite commented 3 years ago

@Tolriq I can't publish a snapshot since it's not finished yet. You'll have to set the memory cache key manually using MemoryCacheKey(url).

Tolriq commented 3 years ago

If I compute the key from just data/url then we face the same issue as previously of wrong cache returned without the params applied.

Would need computeMemoryCacheKey but internal and not sure all is already accessible at that point. Specially the size that is not yet resolved at that point no?

Or there's something on the pipeline I do not understand.

Maybe it should more be done at the fetcher level but since we can't really delegate to internal one it's not simple.

colinrtwhite commented 3 years ago

@Tolriq You can access the size from chain.size. You can copy the implementation of computeMemoryCacheKey locally if you want to compute the cache key the same way as it's done internally. You'll be able to delegate to other fetchers in 2.x; there isn't a good solution for this in 1.x.

Tolriq commented 3 years ago

I can't find a diagram of how the pipeline works precisely.

But the interceptor runs high in the hierarchy, size is probably still the requested one so possibly unknown and not yet the actual size that will be cached, Mappers are not yet applied, checking if the cache is valid is very complex and need duplication, basically redoing the EngineInterceptor.

After thinking a lot about this I think the need is more or less like Okhttp Application interceptors like what you propose and Network interceptors (So in your case a new Fetcher interceptors) that would run just before the fetcher.

And from checking quickly 2.X code it seems it add support for fetchers fallback not delegation. You still can't call a delegate fetcher from a fetcher and do something with it's result so does not allow a global priority / limitations system that applies to all internal Coil fetchers.

Something that a fetcher interceptor would allow.

Tolriq commented 3 years ago

@colinrtwhite

Some working code sample to show you how a fetcher interceptor brings the necessary things, for priority handling, concurrency handling, throttling on same data, and more things.

Unfortunately applying this at current interceptor level is not efficient at all due to memory cache.

@OptIn(ExperimentalCoilApi::class)
class CoilPriorityWorkerPool(
    dispatcher: CoroutineDispatcher,
    maxConcurrentTasks: Int
) : CoroutineScope {

    internal class Task constructor(owner: Job, val interceptorChain: Interceptor.Chain) {
        val response = CompletableDeferred<ImageResult>(owner)
    }

    private val immediateQueue: MutableList<Task> = mutableListOf()
    private val highQueue: MutableList<Task> = mutableListOf()
    private val defaultQueue: MutableList<Task> = mutableListOf()
    private val queueMutex = Mutex()

    private val runningQueue: MutableMap<ImagePath, ImageRequest.Priority> = mutableMapOf()
    private val runningQueueMutex = Mutex()

    private val tasks = Channel<Unit>(Channel.UNLIMITED)

    override val coroutineContext = dispatcher + SupervisorJob()

    init {
        repeat(maxConcurrentTasks) {
            launch {
                for (task in tasks) executeTask()
            }
        }
    }

    private suspend fun addToPriorityQueue(task: Task) {
        queueMutex.withLock {
            when ((task.interceptorChain.request.data as ImageRequest).priority) {
                ImageRequest.Priority.IMMEDIATE -> immediateQueue.add(task)
                ImageRequest.Priority.NORMAL -> defaultQueue.add(task)
                ImageRequest.Priority.HIGH -> highQueue.add(task)
            }
        }
        tasks.send(Unit)
    }

    suspend fun execute(request: Interceptor.Chain): ImageResult {
        val task = Task(kotlin.coroutines.coroutineContext.job, request)
        addToPriorityQueue(task)
        return task.response.await()
    }

    fun close(immediately: Boolean = false) {
        tasks.close()
        if (immediately) {
            coroutineContext.cancel()
        }
    }

    private suspend fun executeTask() {
        val task = queueMutex.withLock {
            immediateQueue.removeFirstOrNull() ?: highQueue.removeFirstOrNull() ?: defaultQueue.removeFirst()
        }
        val request = task.interceptorChain.request.data as ImageRequest
        if (request.throttle) {
            runningQueueMutex.withLock {
                val isRunningPriority = runningQueue[request.imagePath]
                if (isRunningPriority != null && isRunningPriority <= request.priority) {
                    addToPriorityQueue(task)
                    return
                }
                runningQueue[request.imagePath] = request.priority
            }
        }
        runCatching {
            task.response.complete(task.interceptorChain.proceed(task.interceptorChain.request))
        }.onFailure {
            task.response.completeExceptionally(it)
        }
        runningQueueMutex.withLock {
            runningQueue.remove(request.imagePath)
        }
    }
}

That workerPool can then easily be used like

@OptIn(ExperimentalCoilApi::class)
class RequestCacheInterceptor : Interceptor {
    private val workerPool = CoilPriorityWorkerPool(Dispatchers.IO, 5)

    override suspend fun intercept(chain: Interceptor.Chain): ImageResult {
        val data = chain.request.data
        return if (data is ImageRequest) {
            val result = workerPool.execute(chain)
            yield()
            result
        } else {
            chain.proceed(chain.request)
        }
    }
}

colinrtwhite commented 3 years ago

@Tolriq There are a number of constraints (check out EngineInterceptor) that make it very difficult to support intercepting after the memory cache but before fetching/decoding. It would be nice to be able to do this, however there are no plans or timeline to do the work to support it.

Unfortunately applying this at current interceptor level is not efficient at all due to memory cache.

I don't see how it's not efficient using a regular interceptor. The only case where it would be less efficient is if two requests with the same data don't resolve to the same memory cache key. This is only true if the requests use custom transformations or have different custom parameters. Standard requests to the same URL will wait for the first request to finish decoding after which it'll be returned instantly from the memory cache for the second and any subsequent requests.

2.X code it seems it add support for fetchers fallback not delegation

2.x does add support for delegation: imageLoader.components.newFetcher("https://www.example.com/image.jpg".toUri(), options, imageLoader).fetch()

Tolriq commented 3 years ago

There are a number of constraints (check out EngineInterceptor) that make it very difficult to support intercepting after the memory cache but before fetching/decoding.

Unless I miss something I don't really see why, we are talking about another kind of interceptors that can be added to the registry and can wrap https://github.com/coil-kt/coil/blob/2.0/coil-base/src/main/java/coil/intercept/EngineInterceptor.kt#L299 passing the fetcher and necessary data to be able to run the interceptors in queue as usual ordered interceptor stuff handling. It's a lot simpler than the EngineInterceptor because it occurs at a way lower level so have basically nothing to do.

I don't see how it's not efficient using a regular interceptor. The only case where it would be less efficient is if two requests with the same data don't resolve to the same memory cache key.

Because not only I do that, but there's not just the url / data memory key to be able to return early from cache we need to duplicate: getMemoryCacheKey / isCachedValueValid / isSizeValid that are quite big + hoping they do not use any other internal not accessible stuff. And since they are internal they can be changed at any point in time rendering the whole thing broken. Remember that the whole thing is to optimize order on inexact size to have memory cache hit and not load and decode 3 times the same image for not benefits. If I just compare size then I just bring throttling without any gain.

A patch like https://github.com/coil-kt/coil/pull/899 could totally be made in a 1.3.3, be totally unnoticed leading to complex to find issue later.

2.x does add support for delegation: imageLoader.components.newFetcher("https://www.example.com/image.jpg".toUri(), options, imageLoader).fetch()

I guess that will do it then, I'll just have to wrap everything in a custom type and a custom fetcher.

As a final note about concurrency handling like Glide can offer on the different levels for performance optimisation, you may want to look at the WorkerPool pattern like what I show earlier to have proper concurrency limits without having to switch dispatchers or even threadpools so avoiding tons of costly context switch changes.

colinrtwhite commented 3 years ago

@Tolriq If you want to intercept all fetchers in 2.x, use a fetcher that implements Fetcher.Factory<Any> then call imageLoader.components.newFetcher(data, options, imageLoader).fetch() inside your custom fetcher. You can pool requests, don't need to wrap everything in a custom type, don't need to copy any internal code, and don't need to query the memory cache.

I don't think it's a good idea to add a custom interceptor type that only intercepts the fetch response and I'm not seeing a strong use case to add a new interceptor type as this is covered by a custom fetcher that delegates.

switch dispatchers or even threadpools so avoiding tons of costly context switch changes

Coil uses the same dispatcher to execute the request by default, doesn't switch context, and doesn't use thread pools.

Going to close this out as this will soon be possible in 2.x.

Tolriq commented 3 years ago

Coil uses the same dispatcher to execute the request by default, doesn't switch context, and doesn't use thread pools.

Added with no concurrency limits it was the reason why I could not even envisage using Coil before 2.0 and the addition of some Semaphore to BitmapFactoryDecoder to try to workaround the lack of concurrency that was forcing to use a custom dispatcher.

Semaphore that still does not prevent to have 100 disk reads started and locked opened while the decoder decodes. While it's still not 100 full disk read it's still 100 opening and every cost associated like when dealing with content:// uris. Added with 0 priority handling of that queue. This does not suits a few use cases performance / UX wise. So I still think Coil concurrency handling is missing what is needed to be the perfect OOTB library specially with all the tools that coroutines offer out of the box for that.

I really wish my English skill was higher.

Anyway yes 2.0 delegation will allow to build most of the missing piece and yes Fetcher that delegate is the same as the new interceptor if this interceptor runs first or can be configured to run first thanks for pointing this subtilty.

coil-kt / coil

Priority handling for identical data request #916