abria / TeraStitcher

A tool for fast automatic 3D-stitching of teravoxel-sized microscopy images
http://abria.github.io/TeraStitcher/

Issue with ParaConverter (divide by 0) #49

Closed JeffreyStirman closed 5 years ago

JeffreyStirman commented 5 years ago

Hi. Thanks for the amazing code. I was running a comparison of TeraConverter and ParaConverter as outlined here: https://github.com/abria/TeraStitcher/wiki/Multi-CPU-parallelization-using-MPI-and-Python-scripts

First, running it the 'regular' way:

c:\terastitcher\teraconverter -s="C:\Users\User\Downloads\tostitch\tostitch\xml_merging.xml" -d="C:\Users\User\Downloads\tostitch\tostitch\stitched\par" --sfmt="TIFF (unstitched, 3D)" --dfmt="TIFF (tiled, 2D)"

Everything ran fine.

When I ran the parallel version:

mpiexec -n 10 python "C:\Users\User\Downloads\Paraconverter2.3.2.pyc" -s="C:\Users\User\Downloads\tostitch\tostitch\xml_merging.xml" -d="C:\Users\User\Downloads\tostitch\tostitch\stitched\par" --sfmt="TIFF (unstitched, 3D)" --dfmt="TIFF (tiled, 2D)"

I got this:

('The value for ', '--info', ' was not declared. It will be set to', 'no_info', 'by default.')
('2019-05-26 00:23:37.899000', ' -- Calculation started on ', 10, '- 1 cores.')
('The value for ', '--depth=', ' was not declared. It will be set to', 0, 'by default.')
('The value for ', '--height=', ' was not declared. It will be set to', 0, 'by default.')
('The value for ', '--width=', ' was not declared. It will be set to', 0, 'by default.')
('The value for ', '--resolutions=', ' was not declared. It will be set to', '0', 'by default.')
('The value for ', '--isotropic', ' was not declared. It will be set to', 'False', 'by default.')
teraconverter --sfmt="TIFF (unstitched, 3D)" --dfmt="TIFF (tiled, 2D)" -s="C:\Users\User\Downloads\tostitch\tostitch\xml_merging.xml" -d=/ --info="c:\Python27/dims.txt"
('The value for ', '--origin=', ' was not declared. It will be set to', 'c:\Python27/dims.txt', 'by default.')
('Origin file is: ', 'c:\Python27/dims.txt')
('vxl_V, vxl_H, vxl_D, isotropic, h, max_res, max_res_D :', 0.35, 0.35, 5.0, False, 0, 0, 0)
################################################################################
('Input file = ', 'C:\Users\User\Downloads\tostitch\tostitch\xml_merging.xml')
('Output directory', 'C:\Users\User\Downloads\tostitch\tostitch\stitched\par')
('Rough depth for the tiles in width direction = ', 0)
('Rough depth for the tiles in height direction = ', 0)
('Rough depth for the tiles in depth direction = ', 0)
('Source Format = ', '"TIFF (unstitched, 3D)"')
('Destination Format = ', '"TIFF (tiled, 2D)"')
('Resolutions = ', [0])
('Max Resolutions', 0)
('Width (in voxel) of the immage = ', 13676)
('Height (in voxel) of the immage = ', 8927)
('Depth (in voxel) of the immage = ', 10)
[]
('Last input elements of the original string = ', '')
################################################################################
Traceback (most recent call last):
  File "paraconverterX.py", line 1064, in <module>
  File "paraconverterX.py", line 978, in create_commands
  File "paraconverterX.py", line 882, in create_sizes
  File "paraconverterX.py", line 810, in opt_algo
ZeroDivisionError: float division by zero

Nothing else happened. Any ideas what is wrong? This is on a Windows machine.

iannellog commented 5 years ago

Dear Jeffrey, in the current version of the scripts the command-line options --width, --height and --depth are mandatory when you launch ParaConverter. The error is caused by these missing options. I apologize; we have not handled this case well in the current version.
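The log shows the mechanism: with no --width, --height or --depth on the command line, ParaConverter sets all three to 0, and the run then stops with a ZeroDivisionError in opt_algo. If you drive ParaConverter from your own Python script, a pre-flight check along these lines can catch the problem before mpiexec is started. It is only a sketch: `check_tile_options` is a hypothetical helper, not part of ParaConverter.

```python
def check_tile_options(width, height, depth):
    """Hypothetical pre-flight check (not part of ParaConverter).

    --width, --height and --depth are tile sizes in pixels; if any of them
    is missing (None) or not positive, refuse to start instead of letting
    the partitioning step divide by zero.
    """
    options = (("--width", width), ("--height", height), ("--depth", depth))
    bad = [name for name, value in options if value is None or value <= 0]
    if bad:
        raise ValueError("missing or non-positive option(s): " + ", ".join(bad))
    return width, height, depth
```

Calling `check_tile_options(0, 0, 0)`, which mirrors the defaults reported in the log, raises a ValueError naming all three options.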

Consider that, in general, the degree of parallelism that can be effectively exploited depends on the size of the dataset: it is not worth assigning too little data to each MPI processor, so a tradeoff has to be found. You should therefore set --width, --height and --depth so that the dataset is partitioned into at least as many regions as the MPI processors requested (better still, twice that number, to allow some degree of load balancing).
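Why twice as many regions helps can be illustrated with a small simulation. It assumes (this is not taken from the ParaConverter code) that the dispatcher hands the next region to whichever worker becomes free first, and it uses made-up region costs. Splitting the same total amount of work into more, smaller regions shortens the time until the last worker finishes.

```python
import heapq


def makespan(region_costs, workers):
    """Simulate a dispatcher that gives each region, in order, to the
    worker that becomes free first; return the time the last one ends."""
    free_at = [0.0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    for cost in region_costs:
        start = heapq.heappop(free_at)          # earliest available worker
        heapq.heappush(free_at, start + cost)   # busy until start + cost
    return max(free_at)


# Same total work (8 units) on 4 workers:
uneven = [3, 1, 2, 2]                          # one region per worker
halved = [1.5, 0.5, 1, 1, 1.5, 0.5, 1, 1]      # each region split in two
print(makespan(uneven, 4), makespan(halved, 4))
```

With one region per worker the run takes 3 units, because everyone waits for the largest region; with the regions halved it takes 2.5 units, closer to the ideal of 2.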

I also noticed that you set the number of MPI processors to 10. Remember that if you specify -np 10, the parallelization degree actually used is 9 since one processor is reserved for dispatching only. If you want a parallelization degree of 10, you have to specify -np 11.
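In other words, the value passed to mpiexec is always one more than the number of converting processes you want. The helper below is a sketch for illustration: the function name and the placeholder paths are hypothetical, while the option names are the ones used in this thread. It assembles a command line in the same form as the one in the original report.

```python
def paraconverter_command(workers, script, src, dst, sfmt, dfmt,
                          width, height, depth):
    """Return a command line (to paste into a shell) that launches
    ParaConverter with `workers` converting processes.

    One extra MPI process only dispatches work, hence -n workers + 1.
    """
    return ('mpiexec -n {n} python "{script}" -s="{src}" -d="{dst}" '
            '--sfmt="{sfmt}" --dfmt="{dfmt}" '
            '--width={w} --height={h} --depth={d}').format(
                n=workers + 1, script=script, src=src, dst=dst,
                sfmt=sfmt, dfmt=dfmt, w=width, h=height, d=depth)


print(paraconverter_command(10, "paraconverter.py", "xml_merging.xml",
                            "stitched", "TIFF (unstitched, 3D)",
                            "TIFF (series, 2D)", 100000, 100000, 100))
```

Passing `workers=10` produces `mpiexec -n 11`, i.e. a degree of parallelism of 10. On Windows, pass paths as raw strings (r"C:\...") so the backslashes are kept literally.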

I hope this can help.

I would also like to let you know that a new paper about TeraStitcher, whose subject is precisely the parallelization of stitching, is going to be published in Frontiers in Neuroinformatics (abstract: https://www.frontiersin.org/articles/10.3389/fninf.2019.00041/abstract). Please cite this paper if you use our tools in the future.

Best regards.

-- Giulio

JeffreyStirman commented 5 years ago

Thanks! And what do --width, --height and --depth refer to? Pixel size or partitions? What would you suggest for 4x6 tiles of 2048x2048, 2600 slices, on a 48-core, 96-thread workstation?

Also, in the alignment step, is there any way to ignore the Z offset (alignment) calculations and assume it is 0? I have good reason to do this.

Thanks!

iannellog commented 5 years ago

When you execute TeraConverter (or the 'merge' step of TeraStitcher), multiple write operations have to be performed in parallel. Unless you use a parallel file system interface (e.g. MPI-IO), different processes cannot write data to the same file. For this reason, parallel execution of TeraConverter is supported only if the output format is one of the following: "TIFF (series, 2D)", "TIFF (tiled, 2D)", "TIFF (tiled, 3D)", "TIFF (tiled, 4D)". Indeed, in these cases the final image is stored in multiple files that can be assigned to different instances of TeraConverter.
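As a quick sanity check before launching a parallel run, the list of formats above can be encoded directly. This is a hypothetical helper, not part of TeraStitcher. It also strips surrounding double quotes, since the log in this issue shows the format names arriving with quotes attached ('"TIFF (tiled, 2D)"').

```python
# Output formats that, according to this thread, store the final image in
# multiple files and can therefore be written by parallel TeraConverter runs.
PARALLEL_OUTPUT_FORMATS = {
    "TIFF (series, 2D)",
    "TIFF (tiled, 2D)",
    "TIFF (tiled, 3D)",
    "TIFF (tiled, 4D)",
}


def supports_parallel_output(dfmt):
    """Return True if the destination format allows parallel writing."""
    return dfmt.strip().strip('"') in PARALLEL_OUTPUT_FORMATS


print(supports_parallel_output('"TIFF (tiled, 2D)"'))
```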

The --width, --height and --depth options are specified in pixels and declare the desired maximum size of the tiles. Our software uses these values to compute an optimal partition of the dataset for efficient parallelization. Note that when "TIFF (series, 2D)" is chosen, the partition can be performed only along the Z axis (each slice is a single 2D file that cannot be written in parallel). The "tiled" formats, conversely, are partitioned along all axes. In other words, in the former case only the --depth option is actually used to partition the dataset, whereas in the latter case all options may be used.
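The difference between the two cases can be sketched as a count of independent substacks. This assumes plain ceiling division along each axis, whereas ParaConverter computes its own optimal partition, so the numbers are only indicative. The function name is hypothetical and the image and tile sizes below are illustrative.

```python
def ceil_div(n, t):
    """Integer ceiling of n / t."""
    return (n + t - 1) // t


def count_substacks(dfmt, image_size, tile_size):
    """Rough number of independent substacks for a destination format.

    image_size and tile_size are (width, height, depth) in pixels.
    "TIFF (series, 2D)": only --depth splits the volume (one file per slice).
    Tiled formats: every axis may be split.
    """
    width, height, depth = image_size
    tile_w, tile_h, tile_d = tile_size
    if dfmt == "TIFF (series, 2D)":
        return ceil_div(depth, tile_d)
    return ceil_div(width, tile_w) * ceil_div(height, tile_h) * ceil_div(depth, tile_d)


image, tiles = (12000, 8000, 2600), (1024, 1024, 100)
print(count_substacks("TIFF (series, 2D)", image, tiles))
print(count_substacks("TIFF (tiled, 2D)", image, tiles))
```

With these sizes the series format yields 26 substacks (2600 / 100 along Z), while the tiled format yields 12 x 8 x 26 = 2496.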

Consider, however, that the "tiled" formats can be further processed only with either Vaa3D-TeraFly (for visualization and annotation) or a library derived from our code (for processing), which is not publicly available yet.

For this reason I think you are probably interested in the "TIFF (series, 2D)" format. In this case you can specify large values for --width and --height (larger than the final X-Y sizes), since no tiling has to be performed in X-Y. For --depth you should specify a value small enough that the dataset is partitioned into at least as many substacks as the processors you are using.
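Following this advice, the options for a "TIFF (series, 2D)" run can be derived from the image size and the number of worker processes. In this minimal sketch, the helper name and the factor of 2 (borrowed from the load-balancing advice earlier in the thread) are assumptions. Width and height are set just above the image size so X-Y is never split, and depth is floor(D / (2 x workers)), which guarantees at least 2 x workers substacks along Z.

```python
def series_2d_options(image_size, workers, factor=2):
    """Hypothetical helper: suggest --width/--height/--depth values for
    "TIFF (series, 2D)" output.

    image_size is (width, height, depth) in pixels of the final image.
    """
    width, height, depth = image_size
    target = factor * workers              # desired minimum number of substacks
    if depth < target:
        raise ValueError("only %d slices, cannot make %d substacks" % (depth, target))
    # depth // target slices per substack => depth / slab >= target substacks
    slab = depth // target
    return {"--width": width + 1, "--height": height + 1, "--depth": slab}


print(series_2d_options((12000, 8000, 2600), workers=12))
```

For an illustrative 12000 x 8000 x 2600 volume and 12 workers this suggests --depth=108, which gives 25 substacks.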

Consider, however, that the whole task is highly I/O bound, and that it is likely useless to use too many processes because you could saturate the transfer rate of your I/O subsystem. In our experience, with a conventional I/O subsystem it is useless to exploit more than 10-15 processors (although this estimate may change from machine to machine). For these reasons I suggest trying -np 13 (1 dispatcher and 12 workers) and a value of about 100 (or even less) for --depth. This way the dataset is partitioned into at least 26 substacks (2600 slices / 100), which are distributed over 12 processors with some load-balancing effect.
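The numbers in this suggestion can be checked with a few lines of arithmetic, assuming the 2600-slice stack is partitioned only along Z ("TIFF (series, 2D)"). `plan` is a hypothetical helper used only to show the calculation.

```python
def plan(slices, depth, np_procs):
    """Workers, Z-substacks and substacks per worker for a series-2D run."""
    workers = np_procs - 1                       # one MPI process only dispatches
    substacks = (slices + depth - 1) // depth    # ceiling division along Z
    return workers, substacks, substacks / float(workers)


print(plan(2600, 100, 13))
print(plan(2600, 50, 13))
```

With -np 13 and --depth=100 this gives 12 workers and 26 substacks, a little over 2 substacks per worker; halving --depth to 50 doubles the count to 52.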

I hope this helps. I also suggest reading our paper recently published in Frontiers in Neuroinformatics, which can give you other useful information.

Best regards.

-- Giulio

JeffreyStirman commented 5 years ago

Thanks and great explanation!