Improve LOCI stitching quality of life

ctrueden commented 5 years ago

LOCI performs multi-tile image acquisitions from multiple types of systems:

OWS (WiscScan) OME-TIFFs
OpenScan OME-TIFFs
PrairieView TIFFs/OME-TIFFs

Our scientists are having difficulty stitching these mosaics post-acquisition:

Over the last few weeks, I have been collecting mosaic tiles to create a stitched image of a piece of whole brain tissue. I've been using both ultima (PrairieView, writes to .tif) and OWS (WiscScan, writes to .ome.tiff) and have faced some challenges. I am able to stitch using Grid/Collection stitching as normal with positions from file/metadata fine on my PC for files that are 3GB up to 8 GB on data that is from the Ultima (.tif). Michael and I have been working together to try and get the BigStitcher plug-in to work for my larger files but we're having some trouble.

First issue: BigStitcher is incredibly slow to load or won't load at all. If it does actually generate a stitched image, the image is very noisy and weird looking (pasted below). Michael has been in correspondence with the developers through GitHub, but it's being too slow/not working/using up more RAM than we think it should. It not only won't work on our computers, but it will not work on the 192 GB computer downstairs.

Second issue: The .ome.tiff files from WiscScan. FIJI is having trouble reading these files for stitching and I thus far have not been able to stitch an image from the data I've collected on OWS. We can't get it to work with BigStitcher, but we need to go in and manually declare grid size, pixel size, etc. Whether using BigStitcher or just regular Grid/Collection stitching, FIJI is struggling with the .ome.tiff file type. I have some files that are 17-35 GB that neither my PC nor the 192 GB can process through either tool.

@mpinkert adds:

1. The WiscScan data's problem is an issue where grid/collection stitching outputs a single image tile, and cannot read the metadata or stitch. You get the following warning:

   ```
   [WARN] Could not transform version 2012-06 OME-XML.
   ```

2. The real speed problem with BigStitcher is image fusion; loading the image files is slow, but manageable. On the other hand, fusing an image that takes 5 minutes to stitch in Grid/Collection took me 7.5 hours to do in BigStitcher. In general BigStitcher is very slow for large numbers of tiles, and it makes it almost unusable for our grid applications.

This issue is a catch-all while I work to troubleshoot these problems and improve the experience of ImageJ-based stitching with LOCI-collected data. Will follow up in the comments with specifics, subtasks and action items.

ctrueden commented 5 years ago

My initial investigations on Thursday discovered the following problems:

1. OME-XML in the OWS/WiscScan OME-TIFFs does not validate due to use of an InProgress (development/intermediate) 2012-06 OME schema. Changing the schema declaration to the 2012-06 release version instead results in successful validation of the OME-XML. And XSLT transformation of the InProgress 2012-06 XML to 2013-06 works (both from xsltproc on the CLI and from Java via Bio-Formats's XMLTools.transformXML). Furthermore, WiscScan is on its way out in favor of OpenScan. Therefore, fixing this issue (either by updating the version of Bio-Formats that WiscScan embeds, or by hacking the XML post-acquisition) is not a priority.
2. OME-XML in the OWS/WiscScan OME-TIFFs does not successfully forward-transform from 2012-06 into 2016-06. It fails when transforming 2015-01 to 2016-06, but only from Java, not with xsltproc. I pushed a preliminary fix to ctrueden/bioformats@14a1c30b4989e16df7ac59d7982efca0a987dd6d but I want to boil it down to an MCVE before filing a Bio-Formats issue or PR.
3. Once (2) is addressed: the grid/collection stitching of the OWS/WiscScan OME-TIFFs has some addressable performance bottlenecks. Preliminary work is in ctrueden/bioformats@d2e9cfe95e0b52bbdaddc52ea8f5c313ac3a262e, but the fixes need to be split, finalized, and filed as PRs or issues.
4. The grid/collection stitching plugin uses the BF.openImagePlus routine to open every tile as an ij.ImagePlus. Some of the performance issues in (3) are related to this. But beyond performance, there is likely to be an irreproducibility/correctness issue, since BF.openImagePlus will reuse the last-used user settings from the Bio-Formats Importer dialog box. In particular, if "Group files with similar names" was checked, it may still be enabled during a later stitching operation, which will affect both behavior and performance. As a workaround, we should consider adding options.setGroupFiles( false ); (and maybe lock down other options similarly) within the Stitching_Grid code.
5. The BigStitcher fusion implementation is quite elegant and maintainable, but it's slow because the fusion algorithm has quadratic time complexity w.r.t. the number of tiles in the acquisition. Since LOCI is now acquiring mosaics with >500-3000 tiles, this is starting to kill us. We will need to consider implementing a linear-time fusion algorithm.
6. The existing quadratic BigStitcher fusion algorithm has some places where minor speedups may be possible; first cut at PreibischLab/multiview-reconstruction@c84dc6db2d19d6865898c3b973dbb506dd7f5eef. But these fixes are likely to provide a 10-20% boost at best, which is not enough to counter the quadratic time growth described in (5).
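
To make the quadratic-versus-linear argument above concrete, here is a minimal sketch in plain Java. It is a simplified illustration, not BigStitcher's actual code: it assumes translation-only tiles at integer offsets, plain averaging where tiles overlap, and the names `FusionSketch`, `Tile`, `fusePerPixel`, and `fuseTileByTile` are hypothetical. The per-pixel variant scans every tile for every output pixel, so when the output area grows with the tile count the cost grows roughly quadratically. The tile-by-tile variant touches each tile pixel once.

```java
import java.util.Arrays;
import java.util.List;

final class FusionSketch {
    /** Hypothetical tile: row-major pixels plus an integer offset in the fused image. */
    static final class Tile {
        final int x0, y0, w, h;
        final float[] data;

        Tile(int x0, int y0, int w, int h, float[] data) {
            this.x0 = x0; this.y0 = y0; this.w = w; this.h = h; this.data = data;
        }

        boolean contains(int x, int y) {
            return x >= x0 && x < x0 + w && y >= y0 && y < y0 + h;
        }

        float get(int x, int y) {
            return data[(y - y0) * w + (x - x0)];
        }
    }

    /** For every output pixel, scan all tiles: O(outputPixels * tiles). */
    static float[] fusePerPixel(List<Tile> tiles, int width, int height) {
        float[] out = new float[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                float sum = 0;
                int n = 0;
                for (Tile t : tiles) {
                    if (t.contains(x, y)) { sum += t.get(x, y); n++; }
                }
                out[y * width + x] = n == 0 ? 0 : sum / n;
            }
        }
        return out;
    }

    /** Visit each tile once and accumulate into the output: O(total tile pixels). */
    static float[] fuseTileByTile(List<Tile> tiles, int width, int height) {
        float[] sum = new float[width * height];
        int[] count = new int[width * height];
        for (Tile t : tiles) {
            // Clip the tile to the output bounds, then add its pixels.
            for (int y = Math.max(t.y0, 0); y < Math.min(t.y0 + t.h, height); y++) {
                for (int x = Math.max(t.x0, 0); x < Math.min(t.x0 + t.w, width); x++) {
                    int i = y * width + x;
                    sum[i] += t.get(x, y);
                    count[i]++;
                }
            }
        }
        // Normalize by the number of contributing tiles (simple average blend).
        for (int i = 0; i < sum.length; i++) {
            if (count[i] > 0) sum[i] /= count[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two 2x2 tiles overlapping by one column in a 3x2 output.
        List<Tile> tiles = Arrays.asList(
            new Tile(0, 0, 2, 2, new float[] {1, 2, 3, 4}),
            new Tile(1, 0, 2, 2, new float[] {10, 20, 30, 40}));
        System.out.println(Arrays.toString(fusePerPixel(tiles, 3, 2)));
        System.out.println(Arrays.toString(fuseTileByTile(tiles, 3, 2)));
    }
}
```

Both methods produce the same fused pixels; only the loop order differs. The accumulate-then-normalize form needs two extra buffers the size of the output, which is the memory cost of avoiding the per-pixel scan.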

ctrueden commented 5 years ago

After meeting with @mpinkert, we settled on the following action items to resolve this issue:

Bio-Formats / Stitching

[ ] Develop a Stitching benchmarking script. Benchmark Stitching performance fixes so far. This is a prerequisite to filing PRs for Bio-Formats, to defend proposed changes.
[ ] Create an MCVE of the XSLT transformation failure of OME-TIFF metadata from Java. Follow up on the Bio-Formats issue tracker with a PR, or at least an issue (mention or improve ctrueden/bioformats@14a1c30b4989e16df7ac59d7982efca0a987dd6d).
[ ] Clean up my Bio-Formats performance fixes (ctrueden/bioformats@d2e9cfe95e0b52bbdaddc52ea8f5c313ac3a262e) and submit a PR.
[ ] Consider filing a PR to Stitching with setGroupFiles(false) (and maybe other settings locked down) all the time—but first evaluate whether doing so is truly correct.
[ ] Analyze performance and correctness of Stitching with the above fixes in place. Follow up on any additional bottlenecks and bugs.

BigStitcher

[x] Develop a BigStitcher benchmarking script. Benchmark BigStitcher performance fixes thus far. This is a prerequisite to filing PRs in BigStitcher, to defend proposed changes.
[x] Scope what it would take to implement a faster algorithm for BigStitcher fusion. If easy, finish it; if more involved, delegate to another LOCI programmer, or file an issue in multiview-reconstruction.
[ ] Analyze performance and correctness of BigStitcher with the above fixes in place. Follow up on any additional bottlenecks and bugs.

Miscellaneous

[x] Ask Alex whether there are any current roadblocks with data collected from PrairieView, and if so, to provide more information about that, with MCVE.
[x] Analyze why Jayne's problematic OWS dataset (multiple filter wheels) does not import all image planes into ImageJ by default. This is a separate issue from the stitching woes, now tracked separately as #13.

We decided not to worry about upgrading Bio-Formats in WiscScan to a release version, to avoid InProgress schema usage when writing OME-TIFF, since the XSLT transformation does not appear to be getting hung up on that.

ctrueden commented 5 years ago

Discussion of wrapped image iteration performance on Gitter today:

ctrueden: Any time wrapped images are passed around and iterated more than once, it causes that pain.
axtimwalde: i have a solution for this
axtimwalde: that I am using regularly
ctrueden: My solution thus far has been "make a copy"
axtimwalde: kind of but transparently
ctrueden: Cache sample calculations?
axtimwalde: i use imglib2-cache to cache a randomaccessibleinterval
ctrueden: Nice.
axtimwalde: that sounds pointless but is exactly what you want
ctrueden: It's cool. As long as the calculation isn't dynamic enough to change later.
axtimwalde: making the RAI can be expensive if a lot of transforms are involved
axtimwalde: but you also may not want to copy the whole thing or don't even have enough memory to do so

See:

If we find that fused images are being iterated more than once, this technique could improve performance.
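
The idea behind that technique can be sketched in plain Java. This is not the imglib2-cache API; it only illustrates the concept of caching a lazily computed view in blocks, so that a second iteration reads stored samples instead of recomputing them. The names `CachedViewSketch` and `BlockCache` are hypothetical, and the sketch assumes non-negative indices, a source defined for every index in a touched block, and values that do not change after first access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

final class CachedViewSketch {
    /** Caches an expensive per-sample function in fixed-size blocks, computed on first access. */
    static final class BlockCache implements IntToDoubleFunction {
        private final IntToDoubleFunction source;
        private final int blockSize;
        // Unbounded for simplicity; a real cache would evict blocks to limit memory use.
        private final Map<Integer, double[]> blocks = new HashMap<>();

        BlockCache(IntToDoubleFunction source, int blockSize) {
            this.source = source;
            this.blockSize = blockSize;
        }

        @Override
        public double applyAsDouble(int index) {
            int b = index / blockSize;
            double[] block = blocks.get(b);
            if (block == null) {
                // First touch of this block: evaluate the source once for each sample in it.
                block = new double[blockSize];
                for (int i = 0; i < blockSize; i++) {
                    block[i] = source.applyAsDouble(b * blockSize + i);
                }
                blocks.put(b, block);
            }
            return block[index % blockSize];
        }
    }

    public static void main(String[] args) {
        int[] evaluations = {0};
        // Stand-in for a costly wrapped view (for example, a chain of transforms).
        IntToDoubleFunction expensive = i -> { evaluations[0]++; return Math.sqrt(i); };
        IntToDoubleFunction cached = new BlockCache(expensive, 64);

        double total = 0;
        for (int pass = 0; pass < 3; pass++) {
            for (int i = 0; i < 256; i++) total += cached.applyAsDouble(i);
        }
        System.out.println("source evaluations: " + evaluations[0] + ", total: " + total);
    }
}
```

Unlike a full copy, only the blocks that are actually visited get materialized, which matches the point made in the chat about not always having enough memory to copy the whole image.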

StephanPreibisch commented 5 years ago

I am already using the same kind of caching in BigStitcher ... but maybe there are more layers where we could add caching. I'll try ...

ctrueden commented 5 years ago

@mpinkert and I have completed a proof-of-concept tile-by-tile fusion implementation on the tile-by-tile-fusion branch of multiview-reconstruction. Single-threaded for now, so slower on small numbers of tiles. But we are hoping it scales better when the number of tiles is large.

Steps to test:

1. Build the code. Check out the tile-by-tile-fusion branch, then run mvn clean install with Java 11 active.
2. Install the code. Copy target/multiview-reconstruction-0.3.4-SNAPSHOT.jar into your Fiji.app/plugins folder, renaming it to multiview_reconstruction-0.3.4-SNAPSHOT.jar and deleting the old multiview_reconstruction-0.3.3.jar.
3. Compare performance using the macro below, changing xmlFile to a many-tiled dataset.

// Path to the BigStitcher dataset XML; change this to a many-tiled dataset.
xmlFile = "/Users/curtis/data/3x3/3x3.xml";

// New algorithm.
print("======= BEGINNING NEW ALGORITHM =======");
startN = getTime();
run("Fuse dataset ...", "select=" + xmlFile + " process_angle=[All angles] process_channel=[All channels] process_illumination=[All illuminations] process_tile=[All tiles] process_timepoint=[All Timepoints] bounding_box=[Currently Selected Views] downsampling=1 pixel_type=[32-bit floating point] interpolation=[Linear Interpolation] image=Virtual interest_points_for_non_rigid=[-= Disable Non-Rigid =-] blend produce=[Each timepoint & channel] fused_image=[Tile-by-tile O(n) proof of concept]");
endN = getTime();
print("======= NEW ALGORITHM RESULT =======");
print("New algorithm: " + (endN - startN) + " ms elapsed");

// Old algorithm.
print("======= BEGINNING OLD ALGORITHM =======");
startO = getTime();
run("Fuse dataset ...", "select=" + xmlFile + " process_angle=[All angles] process_channel=[All channels] process_illumination=[All illuminations] process_tile=[All tiles] process_timepoint=[All Timepoints] bounding_box=[Currently Selected Views] downsampling=1 pixel_type=[32-bit floating point] interpolation=[Linear Interpolation] image=Virtual interest_points_for_non_rigid=[-= Disable Non-Rigid =-] blend produce=[Each timepoint & channel] fused_image=[Display using ImageJ]");
endO = getTime();
print("======= OLD ALGORITHM RESULT =======");
print("Old algorithm: " + (endO - startO) + " ms elapsed");

print("======= COMPARISON =======");
print("New algorithm: " + (endN - startN) + " ms elapsed");
print("Old algorithm: " + (endO - startO) + " ms elapsed");

StephanPreibisch commented 5 years ago

Hi @ctrueden, thanks for contributing! As mentioned, @hoerldavid tried something quite similar, which is on master: https://github.com/PreibischLab/BigStitcher/blob/master/src/main/java/net/preibisch/stitcher/plugin/Fast_Translation_Fusion.java We should compare ... I am currently implementing a more efficient affine fusion that scales beyond thousands of tiles. Update on this coming soon.

mpinkert commented 5 years ago

I have run a few tests using our new script. I will attempt a benchmark using @hoerldavid's algorithm soon, but here are some results so far:

5 MB / 9 512x512 tiles dataset
New algorithm: 1,068 ms elapsed
Old algorithm: 761 ms elapsed
Grid/Collection algorithm: 5,310 ms elapsed

350 MB / 186 512x512x3 tiles dataset
New algorithm: 31,688 ms elapsed
Old algorithm: 134,619 ms elapsed
Grid/Collection algorithm: 15,921 ms elapsed

11 GB / 396 1024x1024x11 tiles dataset
New algorithm: 1,321,961 ms elapsed
Old algorithm: 3,076,533 ms elapsed
Grid/Collection algorithm: 27,908 ms elapsed
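
For reading these numbers, the old-to-new ratio is easier to compare across datasets than raw milliseconds. The small Java sketch below computes it from the timings reported above; the class and method names (`SpeedupSketch`, `speedup`) are hypothetical, and the ratio says nothing about variance because each timing is a single run.

```java
final class SpeedupSketch {
    /** Ratio of baseline to candidate elapsed time; values above 1 mean the candidate is faster. */
    static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        // Timings in ms copied from the results above: {tiles, new algorithm, old algorithm}.
        long[][] runs = {
            {9, 1_068, 761},
            {186, 31_688, 134_619},
            {396, 1_321_961, 3_076_533},
        };
        for (long[] r : runs) {
            System.out.printf("%d tiles: old/new = %.2f%n", r[0], speedup(r[2], r[1]));
        }
    }
}
```

The ratios come out at roughly 0.71, 4.25, and 2.33 for the 9-, 186-, and 396-tile datasets, so the new algorithm loses on the smallest dataset and wins on the two larger ones.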

mpinkert commented 5 years ago

Here are the comparisons with fast fuse included. I had to downsample the 11 GB example by 2 because the fast fuse algorithm ran out of memory. The fast fusion seems much faster than the original BigStitcher algorithm or our rough try, so if the memory issue can be solved it may work well for our purposes.

I did also notice that the old algorithm (standard BigStitcher) is very CPU intensive - I was running at 99% CPU the whole time, so that's likely a big limiting factor.

350 MB / 186 512x512x3 tiles dataset
New algorithm: 34,079 ms elapsed
Old algorithm: 148,059 ms elapsed
Fast fuse algorithm: 11,290 ms elapsed
Grid/Collection algorithm: 10,173 ms elapsed

11 GB / 396 1024x1024x11 tiles dataset - Downsampled 2x
New algorithm: 187,306 ms elapsed
Old algorithm: 724,467 ms elapsed
Fast fuse algorithm: 45,838 ms elapsed
Grid/Collection algorithm: 8,253 ms elapsed

ctrueden commented 5 years ago

@mpinkert Glad to hear that the fast fusion method is faster. How did you enable it? Is it accessible via the GUI? Or did you write code?

One other update: (2) happens in ImageJ because Fiji does not include the Xalan library, which Bio-Formats uses for the XSLT transformations. The standard Java library supports the same API, so in theory Xalan should not be necessary, but apparently the standard Java implementation has a bug that Xalan does not have, which causes the problem here. Adding xalan-2.7.2.jar and serializer-2.7.2.jar to Fiji.app/jars is highly likely to make the "Could not transform version 2012-06 OME-XML" errors go away.
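
As a quick way to see which XSLT implementation a given Java environment picks up, the sketch below uses only the standard `javax.xml.transform` API. It prints the class name of the `TransformerFactory` that the default lookup returns and runs a toy stylesheet through it. The class name `XsltCheckSketch` and the toy stylesheet are hypothetical, the stylesheet is not one of the OME transforms, and whether Bio-Formats obtains its factory through this same default lookup is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltCheckSketch {
    /** Toy stylesheet (not an OME transform): rewrites a root <old> element as <new>. */
    static final String TOY_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/old\"><new><xsl:value-of select=\".\"/></new></xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML string using the default JAXP factory lookup. */
    static String transform(String xml, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers do not need to handle a checked exception.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Shows which implementation the default lookup selected in this environment.
        System.out.println(TransformerFactory.newInstance().getClass().getName());
        System.out.println(transform("<old>tile</old>", TOY_XSLT));
    }
}
```

Running this once before and once after adding the Xalan jars would show whether the selected factory class changes. A toy stylesheet passing does not prove the OME transforms will, so the real check is still reopening an affected OME-TIFF.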

mpinkert commented 5 years ago

@ctrueden I enabled it as a macro command and separate plugin menu option by adding the line Plugins>BigStitcher>Batch Processing, "Fast fuse dataset ...", net.preibisch.stitcher.plugin.Fast_Translation_Fusion to plugins.config for BigStitcher.

ctrueden commented 3 years ago

@MichaelSNelson and I just checked whether the Fast_Translation_Fusion is available via the BigStitcher UI yet, and it appears still no, based on my inspection of the source. But you can run it as a one-liner Groovy script as follows:

ij.IJ.runPlugIn("net.preibisch.stitcher.plugin.Fast_Translation_Fusion", "");
