mzur closed this issue 6 years ago.
Here is what I found:
libvips looks really great for working with huge images and images in scientific formats. It can also replace GD (or the planned Imagick support, #70) while being more memory efficient! It performs tremendously better than GD for extracting image patches. It may be hard to install on Solaris, though. Maybe on our new Linux machine? There are a libvips PHP extension and bindings.
I took a 43952x98748 px tissue slide scan as TIFF to experiment with vips. It's trivial to generate tiles from the TIFF that can be displayed by OpenLayers. Command to generate tiles in Zoomify format:
vips dzsave source.tif target_dir --layout zoomify
Minimal example to display the tiles with OpenLayers:
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Zoomify</title>
    <link rel="stylesheet" href="https://openlayers.org/en/v4.1.0/css/ol.css" type="text/css">
    <script src="https://openlayers.org/en/v4.1.0/build/ol.js"></script>
  </head>
  <body>
    <div id="map" class="map"></div>
    <script>
      var imgWidth = 43952;
      var imgHeight = 98748;
      var source = new ol.source.Zoomify({
        url: 'http://localhost:8000/target_dir/',
        size: [imgWidth, imgHeight],
        crossOrigin: 'anonymous'
      });
      var extent = [0, -imgHeight, imgWidth, 0];
      var map = new ol.Map({
        layers: [
          new ol.layer.Tile({
            source: source
          })
        ],
        target: 'map',
        view: new ol.View({
          // adjust zoom levels to those provided by the source
          resolutions: source.getTileGrid().getResolutions(),
          // constrain the center: center cannot be set outside this extent
          extent: extent
        })
      });
      map.getView().fit(extent);
    </script>
  </body>
</html>
The Zoomify format includes an ImageProperties.xml file which looks like this:
<IMAGE_PROPERTIES WIDTH="43952" HEIGHT="98748" NUMTILES="88624" NUMIMAGES="1" VERSION="1.8" TILESIZE="256"/>
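For reference, the NUMTILES value can be reproduced from the width, height, and tile size alone. A minimal pure-Python sketch (no libvips required), using one common formulation of the Zoomify pyramid where dimensions are repeatedly halved with round-up until one tile covers the image:

```python
import math

def zoomify_levels(width, height, tile_size=256):
    """Return the (width, height) of each zoom level, largest first,
    halving with round-up until the image fits into a single tile."""
    levels = [(width, height)]
    while levels[-1][0] > tile_size or levels[-1][1] > tile_size:
        w, h = levels[-1]
        levels.append((math.ceil(w / 2), math.ceil(h / 2)))
    return levels

def num_tiles(width, height, tile_size=256):
    """Total tile count across all zoom levels of the pyramid."""
    return sum(
        math.ceil(w / tile_size) * math.ceil(h / tile_size)
        for w, h in zoomify_levels(width, height, tile_size)
    )

# The tissue slide scan from above:
print(num_tiles(43952, 98748))  # 88624, matching NUMTILES in the XML
```

This matches the NUMTILES="88624" that vips wrote for the slide scan, so the halving scheme above agrees with what dzsave produces for this image.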
If an image file exceeds a certain size, Biigle can automatically generate Zoomify tiles for it. If the image file is requested through the /file API endpoint, Biigle can deliver the XML instead of the image. The annotation tool then switches its rendering to "tile mode". Alternatively, we could do this for every image regardless of its size. That way we would have a single implementation that works all the time, but it inflates the required storage space because every image is effectively duplicated.
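The dispatch described here could look roughly like the following sketch. The threshold value and function names are made up for illustration; only the idea of switching the /file response on image size comes from the text above:

```python
# Hypothetical threshold: tile anything above ~25 megapixels.
TILE_THRESHOLD_PX = 25_000_000

def should_tile(width, height, threshold=TILE_THRESHOLD_PX):
    """Decide whether an image gets Zoomify tiles instead of being
    served as a single file."""
    return width * height > threshold

def serve(width, height):
    """Return which representation the /file endpoint would deliver."""
    if should_tile(width, height):
        return "ImageProperties.xml"
    return "original image"

print(serve(43952, 98748))  # the tissue slide scan -> ImageProperties.xml
print(serve(1920, 1080))    # a regular photo -> original image
```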
Comments to the thoughts from above:
The rendering pipeline in the annotation tool doesn't work with tiled images yet.
It's easy to make OpenLayers work with a Zoomify source. It just complicates things if we have to dynamically switch between tiled sources and single files.
Some features in the annotation tool don't work with tiled images (color adjustment, magic wand).
The color adjustment needs to be reimplemented for tiled sources. If done correctly this might even speed it up.
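One way such a reimplementation could indeed get faster: precompute a 256-entry lookup table once per adjustment and apply it to each tile as it is rendered, instead of recomputing the formula per pixel. A pure-Python sketch of the idea (the real tool would do this with typed arrays or on the GPU; the formula here is a generic brightness/contrast mapping, not Biigle's actual one):

```python
def brightness_contrast_lut(brightness=0, contrast=1.0):
    """Precompute an 8-bit lookup table for a brightness/contrast
    adjustment, clamped to the valid 0..255 range."""
    return [
        max(0, min(255, round((v - 128) * contrast + 128 + brightness)))
        for v in range(256)
    ]

def adjust_tile(pixels, lut):
    """Apply the precomputed table to one tile's pixel values."""
    return [lut[p] for p in pixels]

lut = brightness_contrast_lut(brightness=20, contrast=1.2)
print(adjust_tile([0, 64, 128, 255], lut))  # [0, 71, 148, 255]
```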
(Largo) Annotation patch extraction doesn't work with tiled images.
If we have the source of the large image as a single file, it's very easy and efficient to extract patches from it.
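The only fiddly part of patch extraction is keeping the crop rectangle inside the image when an annotation sits near a border. A sketch of that geometry (the helper name and fixed patch size are hypothetical; the crop itself would then be a single vips crop on the source file):

```python
def patch_rect(cx, cy, size, img_w, img_h):
    """Compute the crop rectangle (left, top, width, height) for a
    square patch centered on an annotation at (cx, cy), shifted so
    it stays inside the image bounds. Hypothetical helper."""
    half = size // 2
    left = max(0, min(cx - half, img_w - size))
    top = max(0, min(cy - half, img_h - size))
    return left, top, min(size, img_w), min(size, img_h)

# An annotation near the top-left corner of the slide scan:
print(patch_rect(50, 30, 512, 43952, 98748))  # (0, 0, 512, 512)
```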
How to do the pre-loading of previous/next images?
Pre-loading might not be necessary as loading only the required tiles is very fast. Switching images may not be immediate, though.
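A quick back-of-envelope supports this: the number of tiles a viewport ever needs depends only on the viewport size, not on the image size. A sketch of the upper bound (one extra row and column when the view is not tile-aligned):

```python
import math

def tiles_in_viewport(view_w, view_h, tile_size=256):
    """Upper bound on the tiles needed to cover a viewport of the
    given pixel size at any single zoom level."""
    cols = math.ceil(view_w / tile_size) + 1
    rows = math.ceil(view_h / tile_size) + 1
    return cols * rows

print(tiles_in_viewport(1920, 1080))  # 54 tiles at most, regardless of image size
```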
Thumbnail generation doesn't work.
Vips has thumbnail generation built-in.
The color sort module doesn't work.
Maybe this can be implemented with libvips arithmetic operations.
The laser point detection doesn't work.
Either we have to invest some serious work into making the LP detection more efficient or we disable it for large images. For mosaics it probably wouldn't make much sense anyway. What would make sense is reading the px to m ratio from a file.
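Reading the px-to-m ratio from a file could be as simple as parsing a small sidecar file uploaded alongside the mosaics. Nothing defines such a format yet, so everything below (file layout, key names, values) is an assumption for illustration:

```python
# Hypothetical sidecar format: one "filename: ratio" pair per line,
# giving pixels per meter for each mosaic.
def parse_px_per_m(text):
    """Parse the sidecar text into a {filename: ratio} mapping,
    skipping blank lines and '#' comments."""
    ratios = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        ratios[name.strip()] = float(value)
    return ratios

sidecar = """
# px per meter for each mosaic
mosaic_01.tif: 5120.0
mosaic_02.tif: 4873.5
"""
print(parse_px_per_m(sidecar)["mosaic_02.tif"])  # 4873.5
```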
Some thoughts from talking with the people at Geomar:
Preliminary plan of action:
- Check the file size of new images with HEAD requests (use Guzzle). If a file is too big, don't create the volume. The allowed file size is configured in config/image.php. 0.5 days
- Add a tiled attribute to the image model. This is set to true if the image is not remote and larger than certain dimensions. The dimensions are configured in config/image.php. 0.5 days
- Generate tiles for each image with tiled set to true. The tiles are created in Zoomify format with Vips and stored to the storage/tiles/{uuid} directory. The uuid is the UUID of the image. The path can be configured in config/image.php. The directory should be publicly accessible through /tiles so the tiles can be loaded fast. 1 day
- Serve the ImageProperties.xml instead of the image for tiled images. Update the annotation tool so it can display a tiled image. Disable all features in the annotation tool that don't work out of the box for a tiled image. Cache the XML instead of the file for the image. 2 days
- Remove biigle/copria and biigle/copria-thumbnails because vips is so fast that we no longer need them.

In total: ~9 days of work (which currently is about 4 weeks).
I'm now using the docker branch as a base for this. I configured the worker container to have vips available so I don't have to install it on my machine. I can run the tests with:
docker run --rm -t -v $(pwd):/app --entrypoint="" -w="/app" biigle/worker-dev php -d memory_limit=1G vendor/bin/phpunit
This is quite nice, actually, as the tests run in the same environment the app will eventually run in. Instead of the biigle/worker-dev image of my local Docker Compose build, we could use a biigle/worker production image later.
I've now implemented a more generic image caching solution that will work even for very large images. This is used during thumbnail and annotation patch generation. It took half a day longer than planned but I think it's worth it.
Instead of serving the ImageProperties.xml, I'm now storing the image dimensions in the image's attrs JSON attribute if it is a tiled image. The /file endpoint will then serve a JSON containing the dimensions and UUID of the image.
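The response payload for a tiled image could then look like the following sketch. The text only says "dimensions and UUID", so the exact field names here are an assumption:

```python
import json
import uuid

# Sketch of the JSON the /file endpoint could return for a tiled
# image; field names are illustrative, not Biigle's actual schema.
image_uuid = str(uuid.uuid4())
payload = json.dumps({
    "uuid": image_uuid,
    "width": 43952,
    "height": 98748,
    "tiled": True,
})
print(payload)
```

The annotation tool can then detect the JSON content type (instead of an image) and switch to tile mode using these dimensions.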
We want to support exploration and annotation of large images. These can be huge tissue slide scans or stitched-together mosaics of a transect. The usual way to do this is to extract tiles from the image at different zoom levels. Depending on the viewport and zoom, only a subset of these tiles is loaded and displayed at any given time.
As we are using OpenLayers to display the images, this functionality is already built in. But we have to evaluate how to implement it in the server-side application. Do we want to extract the tiles from the image ourselves? Or do users have to provide already correctly tiled images? How do we distinguish between regular images and tiled images (flag, new DB table, etc.)?
Think of a strategy to implement this and evaluate the amount of work we would have to invest.
Thoughts: