bigdataviewer / bigdataviewer-core

ImgLib2-based viewer for registered SPIM stacks and more
BSD 2-Clause "Simplified" License

data fetching strategy #81

Open tischi opened 4 years ago

tischi commented 4 years ago

@axtimwalde @tpietzsch @constantinpape

I was wondering about the data fetching (repainting) strategy of bdv. Let's say I start at resolution level 0 and then use the up arrow key to zoom in until I reach level 5 of that data set.

I think my questions are:

  • Will bdv also load all the intermediate resolution levels (1-4), when zooming in?

  • And will it load the data not only for the region that one finally looks at, but also for the whole cone that one travels when zooming in?

  • Or is there some logic that, e.g.

    • only starts to repaint when one releases the arrow up key?

    • only starts loading data for a region that has been visible in the ViewerPanel for more than x ms?

The reason I am asking is that my CPUs become very busy when I zoom in (busier, it feels, than when I just rotate without zooming in).

tpietzsch commented 4 years ago

I was wondering about the data fetching (repainting) strategy of bdv. Let's say I start at resolution level 0 and then use the up arrow key to zoom in until I reach level 5 of that data set.

Just to avoid confusion: 0 is the highest resolution. So you would start zoomed out at level 5 and then zoom in to level 0.

The following is off the top of my head; I would have to look into the code for exact details.

  • Will bdv also load all the intermediate resolution levels (1-4), when zooming in?

Yes.

  • And will it load the data not only for the region that one finally looks at, but also for the whole cone that one travels when zooming in?

That depends...

For every frame, the blocks needed to render that frame are requested and enqueued for loading. That queue is ordered by priority. Generally speaking, I think low-resolution blocks have higher priority.

In a prefetch step, all visible blocks of the ideal resolution level for the current frame are put into the queue (and thus basically already start loading before anything is rendered). Then, when a rendering thread hits a missing block, it looks for lower resolution versions of that data until either reaching a level with valid data or the lowest-resolution level. All blocks of all resolutions that are touched by this search are put in the queue.

At the start of the next frame, the queue is cleared. It probably still contains a lot of blocks from the last frame that should be loaded; these outstanding requests are moved to a "prefetch" queue of limited size. When the loader threads have nothing else to do (i.e., the "real" queue is empty), they start loading blocks from this prefetch queue.

  • Or is there some logic that, e.g.

    • only starts to repaint when one releases the arrow up key?

No.

  • only starts loading data for a region that has been visible in the ViewerPanel for more than x ms?

No.

The reason I am asking is that my CPUs become very busy when I zoom in (busier, it feels, than when I just rotate without zooming in).

Given the above description, it's very possible that you are loading more blocks when zooming in (or out) than when just rotating.

You could attach VisualVM and see whether you can see any differences in how busy the loader threads are.
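
The per-frame queueing behavior described above can be sketched roughly as follows. This is a simplified toy model, not BDV's actual BlockingFetchQueues implementation; the class name, the string-keyed blocks, and the drop-oldest eviction policy are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the per-frame strategy: requests are bucketed by priority
// (index 0 is served first, e.g. low-resolution blocks), and at the start
// of each frame the leftover requests are not discarded but demoted to a
// bounded prefetch queue that fetcher threads drain only when idle.
class FetchQueuesSketch {
    private final List<ArrayDeque<String>> queues; // one bucket per priority
    private final ArrayDeque<String> prefetch;
    private final int prefetchCapacity;

    FetchQueuesSketch(final int numPriorities, final int prefetchCapacity) {
        queues = new ArrayList<>();
        for (int i = 0; i < numPriorities; i++)
            queues.add(new ArrayDeque<>());
        this.prefetch = new ArrayDeque<>();
        this.prefetchCapacity = prefetchCapacity;
    }

    // Enqueue a block request for the current frame at the given priority.
    void put(final String block, final int priority) {
        queues.get(priority).addLast(block);
    }

    // Called at the start of the next frame: outstanding requests are moved
    // into the bounded prefetch queue, dropping the oldest entries when full.
    void clearToPrefetch() {
        for (final ArrayDeque<String> q : queues) {
            while (!q.isEmpty()) {
                if (prefetch.size() >= prefetchCapacity)
                    prefetch.removeFirst();
                prefetch.addLast(q.removeFirst());
            }
        }
    }

    // Fetcher threads take the highest-priority pending request first and
    // fall back to the prefetch queue when the "real" queues are empty.
    String take() {
        for (final ArrayDeque<String> q : queues)
            if (!q.isEmpty())
                return q.removeFirst();
        return prefetch.pollFirst(); // null when nothing is left
    }
}
```

Shrinking `prefetchCapacity` in a model like this is exactly what limits how much work from past frames survives into the idle periods of later frames.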

tischi commented 4 years ago

Given the above description, it's very possible that you are loading more blocks when zooming in (or out) than when just rotating.

Thanks! Yes, that's how I understood it from your description.

Are there parameters accessible to play with it? For example, is there something like a frameRate? From what you wrote it sounds like decreasing such a frameRate might be a way to skip loading some blocks when browsing very fast through the image? A bit like my suggestion of only considering a view as worthwhile when the user spends a certain amount of time looking at it.

Also, in BdvOptions I found three things that sound related:

Would you recommend playing with any of those?

tpietzsch commented 4 years ago

Are there parameters accessible to play with it?

Not really. You can try to increase targetRenderNanos, which will decrease frame rate and may help as a side effect...

An easy option to try something in the source code would be to change in the VolatileGlobalCellCache constructor

    public VolatileGlobalCellCache( final int maxNumLevels, final int numFetcherThreads )
    {
        queue = new BlockingFetchQueues<>( maxNumLevels, numFetcherThreads );
        new FetcherThreads( queue, numFetcherThreads );
        backingCache = new SoftRefLoaderCache<>();
    }

the line

    queue = new BlockingFetchQueues<>( maxNumLevels, numFetcherThreads );

to

    queue = new BlockingFetchQueues<>( maxNumLevels, numFetcherThreads, XXX );

where XXX is the size of the prefetch queue (the default is 16384). Turn that down, and it should limit loading of blocks from past frames.

If that's helpful, it could be exposed through BdvOptions.

tischi commented 4 years ago

I played around with both the BlockingFetchQueues size and targetRenderNanos, but I could not find a clear effect. Increasing targetRenderNanos seemed to help a bit, but then browsing also feels choppy. I think the main issue may be that the block sizes in our stored data are too large.

axtimwalde commented 4 years ago

Relatedly: the current block fetching strategy is optimized for low-latency situations such as local file systems; in particular, the limited number of fetcher threads can artificially slow down loading for high-latency back ends such as cloud storage. In a high-latency world, it is usually better to submit as many requests as you can in parallel and process or reject them as they slowly drop in. I am unsure if simply increasing the number of fetcher threads can remedy this without unintended side effects. @tischi, as you're currently on it, can you conduct this experiment?

tischi commented 4 years ago

I am unsure if simply increasing the number of fetcher threads can remedy this without unintended side effects. @tischi, as you're currently on it, can you conduct this experiment?

I will do it, but I feel our current block sizes are a bit too large. @axtimwalde, in terms of NxNxN pixels per block, do you have a suggestion for N for the n5-aws-s3 scenario?

constantinpape commented 4 years ago

@tischi I am setting up several test datasets for you with different chunk sizes atm, will let you know when this is there.

axtimwalde commented 4 years ago

With more threads, I believe smaller is better. How about 64^3?
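
As a rough back-of-the-envelope comparison of what each chunk request costs on the wire (assuming 16-bit voxels, which is an assumption for illustration, not from this thread):

```java
// Bytes transferred per chunk request for cubic blocks of 16-bit voxels.
// Smaller chunks complete sooner and waste less off-screen data per
// request, at the price of issuing more requests overall.
public class ChunkSizeEstimate {
    public static void main(final String[] args) {
        final int bytesPerVoxel = 2; // uint16 (assumed for illustration)
        for (final int n : new int[] { 32, 64, 128 }) {
            final long kib = (long) n * n * n * bytesPerVoxel / 1024;
            System.out.println(n + "^3 -> " + kib + " KiB per chunk");
        }
    }
}
```

So 64^3 means roughly 512 KiB per request versus roughly 4 MiB for 128^3, which is consistent with 64^3 feeling more responsive when many small requests can run in parallel.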

tischi commented 4 years ago

We tried different chunk sizes, [64, 64, 64] and [128, 128, 128], and 64^3 seemed more responsive.

I also tried different numRenderingThreads: 3 (the default) and 20 and maybe 20 felt faster. I am not sure this makes sense, since I only have 4 cores on my computer?

I did these "tests" by visual inspection, browsing around in bdv and trying to repeat the same maneuvering for the different cases. I guess if one really wanted to know, one could add some code to bdv-core measuring how long it takes until the highest resolution level of a new view has been completely loaded. For me that is currently out of scope, because of too many other things on my plate.

constantinpape commented 4 years ago

I also tried different numRenderingThreads: 3 (the default) and 20 and maybe 20 felt faster. I am not sure this makes sense, since I only have 4 cores on my computer?

Yes, I think in our setting this makes sense. The operations are not limited by CPU but by fetching the chunks from S3, so increasing the number of threads past the number of cores can bring a speed-up.
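
A quick way to see why: if fetching is latency-bound rather than CPU-bound, sustained throughput scales with the number of in-flight requests, not with core count. The ~50 ms round trip below is a made-up S3 figure for illustration:

```java
// Latency-bound throughput model: each thread completes roughly one
// request per round trip, so throughput ~= threads / latency,
// independent of the number of CPU cores.
public class FetcherThroughputModel {
    public static void main(final String[] args) {
        final double roundTripSeconds = 0.05; // assumed per-chunk latency
        for (final int threads : new int[] { 3, 20 }) {
            System.out.printf("%2d fetcher threads -> ~%.0f chunks/s%n",
                    threads, threads / roundTripSeconds);
        }
    }
}
```

Under that assumption, 3 threads sustain about 60 chunks/s while 20 threads sustain about 400 chunks/s, even on a 4-core machine, as long as the threads spend their time waiting on the network rather than computing.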

tischi commented 4 years ago

Yes, I think in our setting this makes sense. The operations are not limited by CPU but by fetching the chunks from S3, so increasing the number of threads past the number of cores can bring a speed-up.

Again, this is purely by visual feeling, but changing from 3 to 20 seemed to matter mainly in the 64^3 case and less so in the 128^3 case.