mzur closed this issue 6 years ago.
Here is what I found:
libvips looks really great for working with huge images and images in scientific formats. It can also replace GD (or the planned Imagick support, #70) while being more memory efficient! It performs tremendously better than GD for extracting image patches. It may be hard to install on Solaris, though. Maybe on our new Linux machine? There are a libvips PHP extension and bindings.
I took a 43952x98748 px tissue slide scan as TIFF to experiment with vips. It's trivial to generate tiles from the TIFF that can be displayed by OpenLayers. Command to generate tiles in Zoomify format:
vips dzsave source.tif target_dir --layout zoomify
Minimal example to display the tiles with OpenLayers:
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Zoomify</title>
    <link rel="stylesheet" href="https://openlayers.org/en/v4.1.0/css/ol.css" type="text/css">
    <script src="https://openlayers.org/en/v4.1.0/build/ol.js"></script>
  </head>
  <body>
    <div id="map" class="map"></div>
    <script>
      var imgWidth = 43952;
      var imgHeight = 98748;
      var source = new ol.source.Zoomify({
        url: 'http://localhost:8000/target_dir/',
        size: [imgWidth, imgHeight],
        crossOrigin: 'anonymous'
      });
      var extent = [0, -imgHeight, imgWidth, 0];
      var map = new ol.Map({
        layers: [
          new ol.layer.Tile({
            source: source
          })
        ],
        target: 'map',
        view: new ol.View({
          // adjust zoom levels to those provided by the source
          resolutions: source.getTileGrid().getResolutions(),
          // constrain the center: center cannot be set outside this extent
          extent: extent
        })
      });
      map.getView().fit(extent);
    </script>
  </body>
</html>
The Zoomify format includes an ImageProperties.xml file which looks like this:
<IMAGE_PROPERTIES WIDTH="43952" HEIGHT="98748" NUMTILES="88624" NUMIMAGES="1" VERSION="1.8" TILESIZE="256"/>
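For reference, the NUMTILES value can be reproduced from the width, height, and tile size alone. A minimal pure-Python sketch (no libvips required), using one common formulation of the Zoomify pyramid where dimensions are repeatedly halved with round-up until one tile covers the image:

```python
import math

def zoomify_levels(width, height, tile_size=256):
    """Return the (width, height) of each zoom level, largest first,
    halving with round-up until the image fits into a single tile."""
    levels = [(width, height)]
    while levels[-1][0] > tile_size or levels[-1][1] > tile_size:
        w, h = levels[-1]
        levels.append((math.ceil(w / 2), math.ceil(h / 2)))
    return levels

def num_tiles(width, height, tile_size=256):
    """Total tile count across all zoom levels of the pyramid."""
    return sum(
        math.ceil(w / tile_size) * math.ceil(h / tile_size)
        for w, h in zoomify_levels(width, height, tile_size)
    )

# The tissue slide scan from above:
print(num_tiles(43952, 98748))  # 88624, matching NUMTILES in the XML
```

This matches the NUMTILES="88624" that vips wrote for the slide scan, so the halving scheme above agrees with what dzsave produces for this image.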
If an image file exceeds a certain size, Biigle can automatically generate Zoomify tiles for it. If the image file is requested through the /file API endpoint, Biigle can deliver the XML instead of the image. The annotation tool then switches its rendering to "tile mode". Alternatively, we could do this for every image regardless of its size. That way we would have a single implementation that works all the time, but it inflates the required storage space because every image is effectively duplicated.
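The dispatch described here could look roughly like the following sketch. The threshold value and function names are made up for illustration; only the idea of switching the /file response on image size comes from the text above:

```python
# Hypothetical threshold: tile anything above ~25 megapixels.
TILE_THRESHOLD_PX = 25_000_000

def should_tile(width, height, threshold=TILE_THRESHOLD_PX):
    """Decide whether an image gets Zoomify tiles instead of being
    served as a single file."""
    return width * height > threshold

def serve(width, height):
    """Return which representation the /file endpoint would deliver."""
    if should_tile(width, height):
        return "ImageProperties.xml"
    return "original image"

print(serve(43952, 98748))  # the tissue slide scan -> ImageProperties.xml
print(serve(1920, 1080))    # a regular photo -> original image
```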
Comments to the thoughts from above:
The rendering pipeline in the annotation tool doesn't work with tiled images yet.
It's easy to make OpenLayers work with a Zoomify source. It just complicates things if we have to dynamically switch between tiled sources and single files.
Some features in the annotation tool don't work with tiled images (color adjustment, magic wand).
The color adjustment needs to be reimplemented for tiled sources. If done correctly this might even speed it up.
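One way such a reimplementation could indeed get faster: precompute a 256-entry lookup table once per adjustment and apply it to each tile as it is rendered, instead of recomputing the formula per pixel. A pure-Python sketch of the idea (the real tool would do this with typed arrays or on the GPU; the formula here is a generic brightness/contrast mapping, not Biigle's actual one):

```python
def brightness_contrast_lut(brightness=0, contrast=1.0):
    """Precompute an 8-bit lookup table for a brightness/contrast
    adjustment, clamped to the valid 0..255 range."""
    return [
        max(0, min(255, round((v - 128) * contrast + 128 + brightness)))
        for v in range(256)
    ]

def adjust_tile(pixels, lut):
    """Apply the precomputed table to one tile's pixel values."""
    return [lut[p] for p in pixels]

lut = brightness_contrast_lut(brightness=20, contrast=1.2)
print(adjust_tile([0, 64, 128, 255], lut))  # [0, 71, 148, 255]
```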
(Largo) Annotation patch extraction doesn't work with tiled images.
If we have the source of the large image as a single file, it's very easy and efficient to extract patches from it.
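The only fiddly part of patch extraction is keeping the crop rectangle inside the image when an annotation sits near a border. A sketch of that geometry (the helper name and fixed patch size are hypothetical; the crop itself would then be a single vips crop on the source file):

```python
def patch_rect(cx, cy, size, img_w, img_h):
    """Compute the crop rectangle (left, top, width, height) for a
    square patch centered on an annotation at (cx, cy), shifted so
    it stays inside the image bounds. Hypothetical helper."""
    half = size // 2
    left = max(0, min(cx - half, img_w - size))
    top = max(0, min(cy - half, img_h - size))
    return left, top, min(size, img_w), min(size, img_h)

# An annotation near the top-left corner of the slide scan:
print(patch_rect(50, 30, 512, 43952, 98748))  # (0, 0, 512, 512)
```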
How to do the pre-loading of previous/next images?
Pre-loading might not be necessary as loading only the required tiles is very fast. Switching images may not be immediate, though.
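A quick back-of-envelope supports this: the number of tiles a viewport ever needs depends only on the viewport size, not on the image size. A sketch of the upper bound (one extra row and column when the view is not tile-aligned):

```python
import math

def tiles_in_viewport(view_w, view_h, tile_size=256):
    """Upper bound on the tiles needed to cover a viewport of the
    given pixel size at any single zoom level."""
    cols = math.ceil(view_w / tile_size) + 1
    rows = math.ceil(view_h / tile_size) + 1
    return cols * rows

print(tiles_in_viewport(1920, 1080))  # 54 tiles at most, regardless of image size
```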
Thumbnail generation doesn't work.
Vips has thumbnail generation built-in.
The color sort module doesn't work.
Maybe this can be implemented with libvips arithmetic operations.
The laser point detection doesn't work.
Either we have to invest some serious work into making the LP detection more efficient or we disable it for large images. For mosaics it probably wouldn't make much sense anyway. What would make sense is reading the px to m ratio from a file.
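Reading the px-to-m ratio from a file could be as simple as parsing a small sidecar file uploaded alongside the mosaics. Nothing defines such a format yet, so everything below (file layout, key names, values) is an assumption for illustration:

```python
# Hypothetical sidecar format: one "filename: ratio" pair per line,
# giving pixels per meter for each mosaic.
def parse_px_per_m(text):
    """Parse the sidecar text into a {filename: ratio} mapping,
    skipping blank lines and '#' comments."""
    ratios = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        ratios[name.strip()] = float(value)
    return ratios

sidecar = """
# px per meter for each mosaic
mosaic_01.tif: 5120.0
mosaic_02.tif: 4873.5
"""
print(parse_px_per_m(sidecar)["mosaic_02.tif"])  # 4873.5
```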
Some thoughts from talking with the people at Geomar:
Preliminary plan of action:
- Check the file size of new images with HEAD requests (use Guzzle). If a file is too big, don't create the volume. The allowed file size is configured in config/image.php. 0.5 days
- Add a tiled attribute to the image model. This is set to true if the image is not remote and larger than certain dimensions. The dimensions are configured in config/image.php. 0.5 days
- Generate tiles for each image with tiled set to true. The tiles are created in Zoomify format with Vips and stored to the storage/tiles/{uuid} directory. The uuid is the UUID of the image. The path can be configured in config/image.php. The directory should be publicly accessible through /tiles so the tiles can be loaded fast. 1 day
- Serve the ImageProperties.xml instead of the image for tiled images. Update the annotation tool so it can display a tiled image. Disable all features in the annotation tool that don't work out of the box for a tiled image. Cache the XML instead of the file for the image. 2 days
- Remove biigle/copria and biigle/copria-thumbnails because vips is so fast that we no longer need them.

In total: ~9 days of work (which currently is about 4 weeks).
I'm now using the docker branch as a base for this. I configured the worker container to have vips available so I don't have to install it on my machine. I can run the tests with:
docker run --rm -t -v $(pwd):/app --entrypoint="" -w="/app" biigle/worker-dev php -d memory_limit=1G vendor/bin/phpunit
This is quite nice, actually, as the tests run in the same environment the app will eventually run in. Instead of the biigle/worker-dev image of my local Docker Compose build, we could use a biigle/worker production image later.
I've now implemented a more generic image caching solution that will work even for very large images. This is used during thumbnail and annotation patch generation. It took half a day longer than planned but I think it's worth it.
Instead of serving the ImageProperties.xml, I'm now storing the image dimensions in the image's attrs JSON attribute if it is a tiled image. The /file endpoint will then serve a JSON containing the dimensions and UUID of the image.
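The response payload for a tiled image could then look like the following sketch. The text only says "dimensions and UUID", so the exact field names here are an assumption:

```python
import json
import uuid

# Sketch of the JSON the /file endpoint could return for a tiled
# image; field names are illustrative, not Biigle's actual schema.
image_uuid = str(uuid.uuid4())
payload = json.dumps({
    "uuid": image_uuid,
    "width": 43952,
    "height": 98748,
    "tiled": True,
})
print(payload)
```

The annotation tool can then detect the JSON content type (instead of an image) and switch to tile mode using these dimensions.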
We want to support exploration and annotation of large images. These can be huge tissue slide scans or stitched-together mosaics of a transect. The usual way to do this is to extract tiles from the image at different zoom levels. Depending on the viewport and zoom, only a subset of these tiles is loaded and displayed at any given time.
As we are using OpenLayers to display the images, this functionality is already built in. But we have to evaluate how to implement it in the server-side application. Do we want to extract the tiles from the image ourselves? Or do users have to provide already correctly tiled images? How do we distinguish between regular images and tiled images (flag, new DB table, etc.)?
Think of a strategy to implement this and evaluate the amount of work we would have to invest.
Thoughts: