Closed NicoKiaru closed 1 year ago
I cannot comment deeply, because I have not looked into this in detail yet myself. But, one thing I can share is that in MoBIE we have a function like this:
public void closeSourceAndConverter( SourceAndConverter< ? > sourceAndConverter, boolean closeImgLoader )
{
SourceAndConverterServices.getBdvDisplayService().removeFromAllBdvs( sourceAndConverter );
String sourceName = sourceAndConverter.getSpimSource().getName();
if ( closeImgLoader )
{
final ImgLoader imgLoader = sourceNameToImgLoader.get( sourceName );
if ( imgLoader instanceof N5ImageLoader )
{
( ( N5ImageLoader ) imgLoader ).close();
}
else if ( imgLoader instanceof N5OMEZarrImageLoader )
{
( ( N5OMEZarrImageLoader ) imgLoader ).close();
}
}
sourceNameToImgLoader.remove( sourceName );
sourceNameToSourceAndConverter.remove( sourceName );
SourceAndConverterServices.getSourceAndConverterService().remove( sourceAndConverter );
}
We found this necessary to clear up the memory.
Regarding the close
calls to imageloader, in theory it could be taken care of in
https://github.com/bigdataviewer/bigdataviewer-playground/blob/da133beb859b2ac6333c2445bf417129f8d9a2d7/src/main/java/sc/fiji/bdvpg/scijava/services/SourceAndConverterService.java#L475-L481
However the imageloaders need to implement closeable, and this does not seem to be the case... I'll make a PR.
Where does the Closeable
interface come from? I did not know about it...
We would need to add this in mobie-io
.
PR done here:
https://github.com/bigdataviewer/bigdataviewer-core/pull/142
(Closeable is java.io.Closeable
. But it's not yet included in the Image Loader, that's why I've made the PR)
Note that several SAC
can use the same ImgLoader
, e.g. when we construct a new SAC
by transforming a SAC
. So I think the SACService
should only close an ImgLoader
if it is sure that there is no SAC
left that would need it. Maybe we need a Map< ImgLoader, Set< SAC > >
to keep track of this?
Note that several
SAC
can use the sameImgLoader
, e.g. when we construct a newSAC
by transforming aSAC
. So I think theSACService
should only close anImgLoader
if it is sure that there is noSAC
left that would need it. Maybe we need aMap< ImgLoader, Set< SAC > >
to keep track of this?
Yes, it's already taken care of in the lines before (bdv - pg stores which sources are attached to a spimdataset):
OK, how would I use this in MoBIE? There we open the SACs ourselves, not via bdv-pl...
Do I have to somehow register a SpimData
object with a SAC
?
How?
Do I have to somehow register a
SpimData
object with aSAC
?
You can with this function:
There's now a global cache and globalcacheloader objects which are used by resampled sources and by every image loader which contains a cache
field. The cache field is replaced thanks to reflection.
All relevant classes are in the package https://github.com/bigdataviewer/bigdataviewer-playground/tree/master/src/main/java/sc/fiji/bdvpg/cache
.
Also : the cache is weighted by the memory thanks to this function: https://github.com/bigdataviewer/bigdataviewer-playground/blob/ddac2e15f51581a7f1b612bc385495ff9e4b7648/src/main/java/sc/fiji/bdvpg/cache/AbstractGlobalCache.java#L119-L146
@NicoKiaru Does this work now?
What do I need to do to clear all caches related to one SourceAndConverter
?
What's your imageloader ? I can tell you if it will work or not ?
If, in your image loader there is a field named cache
of type VolatileGlobalCellCache
, the caching should work pretty nicely.
If your image loader implements Closeable
, the close function will be called if all source and converters from the spimdata object are removed from the source and converter service
Thanks @NicoKiaru !
That would be the most relevant right now: https://github.com/mobie/mobie-io/blob/main/src/main/java/org/embl/mobie/io/ome/zarr/loaders/N5OMEZarrImageLoader.java
But long term we may want to switch to opening images like this: https://github.com/mobie/mobie-io/blob/3fd0180fa013141dc5d8370dd365794904bfaf27/src/main/java/org/embl/mobie/io/ome/zarr/hackathon/MultiscaleImage.java#L154
Looks ok for the cache. But it does not implement Closeable. Since there already is a close function, just add implement Closeable
.
This can directly use the GlobalCache I've implemented (see below)
The issue I see with MultiscaleImage is that you use N5Utils.openVolatile
:
Here the cache mechanism is hidden into N5Utils. In its current form it won't be optimal because the cache size is handled at a per source level. Let's say you have 15 Gb of RAM, and a hundred sources of 10 Gb each:
You can use unbounded soft refs everywhere:
You can use a bounded number of refs:
What you would want ideally, is to set a total amount of memory that can be used by the cache of all sources, and when the max is reached, you want to remove the least useful data first, to let some space for new things. And if the limit is global for all sources, you do not need to care whether one source occupies all the RAM, or if many sources are each using a fraction of the total memory.
That's the problem I solved here:
I'm replacing the cache field (through reflection) of SpimData with a GlobalLoaderCache
which caches cells and has a policy based on max memory, taking into account all spimdata objects, and potentially other sources that are using a GlobalLoaderCache
.
The global loader cache is here:
And it can use a BoundedHashMap or a Caffeine cache backend.
Also importantly, it measures the memory size of each cell which is added in the cache, assuming all objects are instances of Cell
, which is always the case so far:
So if you have a source with little cells (32x32x32) and other sources with big cells (1024x1024x32), the cache will be know which memory size is occupied.
There's one limitation though: it only works with read-only cache.
If there was a way to specify the LoaderCache to use in N5Utils, then the problem could be solved this way.
The goal
I'd like to improve Bdv-Playground and make it capable of handling really big data with low RAM.
Why it's not ideal currently
It's doing ok, but it sometimes fails because some cached sources have a
SoftRef
policy which is not clearing the memory early enough. When the memory is nearly full, the JVM then spends most of its time in GC calls, and this makes the GUI not responsive and me pretty annoyed. Soft references do not seem to be cleared fast enough in this case, and also, part of the problem comes from the absence of order guarantee when removing cached data : the 'cell' which has been accessed the latest is not necessarily the first removed (see extract from https://www.baeldung.com/java-soft-references):Where soft references are generated in bdv playground ?
I think the main issue in bdv playground comes from the class RAIHelper which has a softref policy. RAIHelper is used by default in bdv-playground for all cached sources which are not coming from a SpimDataset (Resampled sources, Fused sources).
How to solve the issue ?
1. Quick fix
I could probably just put a
BoundSoftRef
policy with a fixed amount of cells in RAIHelper, but one of the design problem I have is that I make a cache for each 3D RAI (one cache per resolution level and per timepoint). So if I have 50 timepoints and 5 resolution levels, I create 250 different caches. If I put a bound of 10 cells for each cache, I have in fact a max bound of 2500 cells.2. Better fix
2.a. A cache for each source
It would be better to have at least a single cache for each Source. Like in the AbstractSpimSource implementation, a key of the cache should contain all the information to access a cached cell (timepoint, level, gridlocation). But I'm a bit lost in all the cache API (cache loader, cache, access flags...)
2.b. A global cache for all SourceAndConverter registered in bdv-playground
I was wondering whether it was possible to implement a single cache that would contain all the cached cells from all bdv-playground sources. Basically a key of this cache would identify:
This would allow to invalidate all caches at once if necessary and programmatically. Also, a bounded soft ref policy could be applied all at once and act upon all sources.
There is a potential issue memory leak issue, but I think it should be ok: if a SourceAndConverter object is kept in memory because of its remaining cached cells, the order guarantee on BoundedSoftRef should sooner or later manage to get rid of the remaining cells. Also, when a Source is removed from Bdv-Playground, we could make sure to invalidate its cached cells from the global cache.
So : do you think 2.b. is a good option ? If yes, I'll probably need a bit of help to start it, because as I wrote, I'm a bit lost in the API.
Optionally:
ping @tischi @tpietzsch