elgw / deconwolf

Deconvolution of widefield microscopy images and generation of point spread functions
GNU General Public License v3.0

Tiling does not work on Windows11 #77

Open mflamand opened 2 weeks ago

mflamand commented 2 weeks ago

Hi,

First, let me say that I have been quite happy with the performance of deconwolf. I have tested it on sets of RNA FISH images with great results, and I think it works better (and faster) than the blind deconvolution algorithms I was using before. Congrats!

I believe I may have found a bug. When using the latest release (0.4.3) on Windows 11, I am unable to use the tiling option to process images that are too large for my GPU memory. For example, if I launch a mock run (3 iterations, --verbose 2), I get the following:

dw --iter 3 --tilesize 1024 --prefix tiling --gpu --verbose 2 .\CamK2a_AAV15_06_CY3.tif .\PSF.tif
outFile: .\tiling_CamK2a_AAV15_06_CY3.tif, outFolder: .\
Settings:
image: .\CamK2a_AAV15_06_CY3.tif
psf: .\PSF.tif
output: .\tiling_CamK2a_AAV15_06_CY3.tif
log file: .\tiling_CamK2a_AAV15_06_CY3.tif.log.txt
nIter: 3
nThreads for FFT: 16
nThreads for OMP: 16
verbosity: 2
background level: auto
method: Scaled Heavy Ball + OpenCL (SHBCL2)
metric: Idiv
Stopping after 3 iterations
overwrite: NO
tiling, maxSize: 1024
tiling, padding: 20
XY crop factor: 0.001000
Offset: 5.000000
Output Format: 16 bit integer
Scaling: Automatic
Border Quality: 2
Minimal boundary artifacts
FFT lookahead: 0
FFTW3 plan: FFTW_MEASURE
Initial guess: Flat average

deconwolf: '0.4.3'
BUILD_DATE: 'Jun 22 2024'
TIFF Backend: 'LIBTIFF, Version 4.6.0 Copyright (c) 1988-1996 Sam Leffler Copyright (c) 1991-1996 Silicon Graphics, Inc.'
OpenMP: YES
OpenCL: YES
VkFFT: YES
sizeof(int) = 4
sizeof(float) = 4
sizeof(double) = 8
sizeof(size_t) = 8

Image dimensions: 2048 x 2048 x 39
Reading .\PSF.tif
PSF Z-crop [181 x 181 x 265] -> [181 x 181 x 77]
PSF XY-crop [181 x 181 x 77] -> [161 x 161 x 77]
Output: .\tiling_CamK2a_AAV15_06_CY3.tif(.log.txt)
-> Divided the [2048 x 2048 x 39] image into 4 tiles
Initializing .\tiling_CamK2a_AAV15_06_CY3.tif.raw to 0
Dumping .\CamK2a_AAV15_06_CY3.tif to .\CamK2a_AAV15_06_CY3.tif.raw (for quicker io)

-> Processing tile 1 / 4
PSF X-crop: Not cropping
Deconvolving using shbcl2 (using inplace)
Setting the background level to 0.010000
image: [1044x1044x39], psf: [161x161x77], job: [1204x1204x115]
Found 2 CL platforms
Found 1 CL devices
Will use device 0 (first = 0)
CL device #0
CL_DEVICE_TYPE=CL_DEVICE_TYPE_GPU
CL_DEVICE_GLOBAL_MEM_SIZE = 17175150592 (17175 MiB)
CL_DEVICE_NAME = NVIDIA RTX 2000 Ada Generation
CL_DEVICE_VENDOR = NVIDIA Corporation
CL_DRIVER_VERSION = 553.24
CL_DEVICE_EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_nv_kernel_attribute cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS=4099
Using VkFFT version 10304
Preparing for convolutions of size 1204 x 1204 x 115
Warning: Will write the VkFFT configuration in the current folder. Reason: Can not determine a suitable folder under Windows.
vkFFT cache file: VkFFT_kernelCache_1204x1204x115.binary
Initializing VkFFT for size 1204 x 1204 x 115
fimcl_fft_inplace VkFFTAppend (for in-place forward transform) .Creating weight map for boundary handling
fimcl_fft_inplace VkFFTAppend (for in-place forward transform) fimcl_convolve fimcl_copy fimcl_ifft_inplace
Downloading real data 1204 x 1204 x 115 (166705840 floats)
Start guess: FLAT
fimcl_copy
Iterating .fimcl_copy fimcl_fft_inplace VkFFTAppend (for in-place forward transform) fimcl_convolve fimcl_copy fimcl_ifft_inplace ...fimcl_fft_inplace VkFFTAppend (for in-place forward transform) fimcl_convolve fimcl_copy fimcl_ifft_inplace Iteration 1/ 3, Idiv=0.000e+00
.fimcl_copy fimcl_fft_inplace VkFFTAppend (for in-place forward transform) fimcl_convolve fimcl_copy fimcl_ifft_inplace ...fimcl_fft_inplace VkFFTAppend (for in-place forward transform) fimcl_convolve fimcl_copy fimcl_ifft_inplace Iteration 2/ 3, Idiv=0.000e+00
.fimcl_copy fimcl_fft_inplace VkFFTAppend (for in-place forward transform) fimcl_convolve fimcl_copy fimcl_ifft_inplace ...fimcl_fft_inplace VkFFTAppend (for in-place forward transform) fimcl_convolve fimcl_copy fimcl_ifft_inplace Iteration 3/ 3, Idiv=0.000e+00
Downloading real data 1204 x 1204 x 115 (166705840 floats)
Closing the OpenCL environment
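The sizes printed for tile 1 are mutually consistent, which suggests the tile geometry itself is being computed as expected. The sketch below reproduces that arithmetic under assumptions that are NOT taken from deconwolf's source: the XY plane is split into ceil(size / tilesize) tiles per axis, a corner tile receives the 20-pixel padding only on its interior side, and the job size is the full linear-convolution size (image + PSF - 1) along each axis. The function name tiling_sizes is hypothetical.

```python
import math


def tiling_sizes(img_xy, img_z, tile_size, padding, psf_xy, psf_z):
    """Hypothetical reconstruction of the sizes dw prints for tile 1.

    Assumptions (not taken from deconwolf's source):
    - the XY plane is split into ceil(img_xy / tile_size) tiles per axis,
    - a corner tile is padded only towards the image interior,
    - the job size is the full linear-convolution size N + M - 1.
    """
    tiles_per_axis = math.ceil(img_xy / tile_size)
    n_tiles = tiles_per_axis ** 2
    # Corner tile: tile_size pixels plus padding on the interior side only.
    tile_xy = min(tile_size + padding, img_xy)
    # Linear convolution of an N-sample signal with an M-sample kernel
    # has N + M - 1 samples along that axis.
    job_xy = tile_xy + psf_xy - 1
    job_z = img_z + psf_z - 1
    n_floats = job_xy * job_xy * job_z
    return n_tiles, tile_xy, job_xy, job_z, n_floats


if __name__ == "__main__":
    # Values from the log: 2048 x 2048 x 39 image, --tilesize 1024,
    # padding 20, PSF cropped to 161 x 161 x 77.
    n_tiles, tile_xy, job_xy, job_z, n_floats = tiling_sizes(2048, 39, 1024, 20, 161, 77)
    print(n_tiles, tile_xy, job_xy, job_z, n_floats)
    # Size of one float32 array of the job size, in MB.
    print(n_floats * 4 / 1e6)
```

The first line printed is 4 1044 1204 115 166705840, matching "Divided ... into 4 tiles", image: [1044x1044x39], job: [1204x1204x115] and the 166705840 floats in "Downloading real data". Each float32 buffer of that size is roughly 667 MB.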

The same is happening when processing using the CPU:

dw --iter 3 --tilesize 1024 --prefix tiling_cpu --verbose 2 .\CamK2a_AAV15_06_CY3.tif .\PSF.tif
outFile: .\tiling_cpu_CamK2a_AAV15_06_CY3.tif, outFolder: .\
Settings:
image: .\CamK2a_AAV15_06_CY3.tif
psf: .\PSF.tif
output: .\tiling_cpu_CamK2a_AAV15_06_CY3.tif
log file: .\tiling_cpu_CamK2a_AAV15_06_CY3.tif.log.txt
nIter: 3
nThreads for FFT: 16
nThreads for OMP: 16
verbosity: 2
background level: auto
method: Scaled Heavy Ball (SHB)
metric: Idiv
Stopping after 3 iterations
overwrite: NO
tiling, maxSize: 1024
tiling, padding: 20
XY crop factor: 0.001000
Offset: 5.000000
Output Format: 16 bit integer
Scaling: Automatic
Border Quality: 2
Minimal boundary artifacts
FFT lookahead: 0
FFTW3 plan: FFTW_MEASURE
Initial guess: Flat average

deconwolf: '0.4.3'
BUILD_DATE: 'Jun 22 2024'
TIFF Backend: 'LIBTIFF, Version 4.6.0 Copyright (c) 1988-1996 Sam Leffler Copyright (c) 1991-1996 Silicon Graphics, Inc.'
OpenMP: YES
OpenCL: YES
VkFFT: YES
sizeof(int) = 4
sizeof(float) = 4
sizeof(double) = 8
sizeof(size_t) = 8

Image dimensions: 2048 x 2048 x 39
Reading .\PSF.tif
PSF Z-crop [181 x 181 x 265] -> [181 x 181 x 77]
PSF XY-crop [181 x 181 x 77] -> [161 x 161 x 77]
Output: .\tiling_cpu_CamK2a_AAV15_06_CY3.tif(.log.txt)
-> Divided the [2048 x 2048 x 39] image into 4 tiles
Initializing .\tiling_cpu_CamK2a_AAV15_06_CY3.tif.raw to 0
Dumping .\CamK2a_AAV15_06_CY3.tif to .\CamK2a_AAV15_06_CY3.tif.raw (for quicker io)

-> Processing tile 1 / 4
PSF X-crop: Not cropping
Deconvolving
Setting the background level to 0.010000
image: [1044x1044x39], psf: [161x161x77], job: [1204x1204x115]
Estimated peak memory usage: 5.8 GB
creating fftw3 plans ...
c2r plan ...
c2r inplace plan ...
r2c plan ...
r2c inplace plan ...
Exported fftw wisdom to fftw_wisdom_float_inplace_threads_16.dat
Iteration 3/ 3, Idiv=0.000e+00

It seems that the program always exits after the first tile is processed. The Idiv value stays at 0.000e+00 (no background signal?), so my guess is that it fails to read in the image properly.
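For context on the Idiv=0.000e+00 values: assuming the metric is the standard (Csiszár) I-divergence between the observed data y and the current model x, sum(y log(y/x) - y + x), it is zero when the model reproduces the data exactly, including the degenerate case where both are all zeros, and positive otherwise. An exact zero on every iteration with real FISH data would therefore be unexpected, which fits the suspicion that the tile data is not being read. deconwolf's exact normalization and background handling are not confirmed here; the helper name idiv is hypothetical and NumPy is assumed.

```python
import numpy as np


def idiv(y, x, eps=1e-12):
    """Csiszar I-divergence: sum(y * log(y / x) - y + x).

    y: observed data, x: model; non-negative arrays of the same shape.
    Entries with y == 0 contribute only x (convention 0 * log(0 / x) = 0).
    """
    y = np.asarray(y, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    log_term = np.zeros_like(y)
    mask = y > 0
    # Clamp x so a zero model under positive data gives a large but finite value.
    log_term[mask] = y[mask] * np.log(y[mask] / np.maximum(x[mask], eps))
    return float(np.sum(log_term - y + x))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.uniform(1.0, 100.0, size=(8, 8))
    print(idiv(data, data))                            # model equals data
    print(idiv(np.zeros((8, 8)), np.zeros((8, 8))))    # all-zero data and model
    print(idiv(data, np.full((8, 8), data.mean())))    # flat guess vs. real data
```

The first two calls return 0.0 and the third returns a positive value, so a flat starting guess on a real, non-empty tile should not report zero under this definition.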

I get the same issue on two systems (#1: Intel 14900K, RTX 2000 Ada 16 GB, 64 GB RAM; #2: AMD 5900X, RTX 3080 10 GB, 64 GB RAM). I can use tiling on both systems with Ubuntu 24.04 (in CPU or GPU mode), but not with Windows 11. Tiling also works under WSL-Ubuntu and macOS 15.1 (Apple M3 Pro, 18 GB) in CPU mode. GPU mode on macOS does not work for me (it hangs at "fimcl_convolve"), but I wasn't planning to use GPU mode on my MacBook anyway.

By the way, related to issue #75: I am able to use GPU mode under Windows 11 without any problem when the image is cropped.

I have no problem using dw under Ubuntu for now. For convenience (the workstation also runs Windows-exclusive software), it would be great if the issue could be looked at and fixed in the future. I am happy to do some testing if needed.

Best, Mathieu

elgw commented 1 week ago

Hi!

I'm glad that you find the software useful :)

Thank you for taking the time to report these issues and findings.

At the moment I can't say when I will have time to look at the Windows-specific issues, but they won't be forgotten.

Unfortunately, there is less chance that I will get deconwolf to run smoothly on macOS in the near future (I have no access to the hardware, and OpenCL is not the best backend). I may revisit that when/if deconwolf switches to, or adds, a Vulkan backend for the GPU computations.

Cheers, Erik