Open GenevieveBuckley opened 2 years ago
Link to the data: https://drive.google.com/drive/folders/13mpIfqspKTIINkfoWbFsVtFF8D7jbTqJ (linked to from this earlier blogpost about image loading)
Link to the PSF: https://drive.google.com/drive/folders/13udO-h9epItG5MNWBp0VxBkKCllYBLQF (discussed here)
We will use image data generously provided by Gokul Upadhyayula at the Advanced Bioimaging Center at UC Berkeley and discussed in this paper (preprint), though the workloads presented here should work for any kind of imaging data, or array data generally.
Nope, it doesn't work. Previously, we had hoped that the simpler application of itk's richardson lucy deconvolution would now be possible with map_blocks. Unfortunately, it looks like there is still some kind of problem blocking it from happening.
deconvolved = da.map_blocks(itk.richardson_lucy_deconvolution_image_filter, imgs, kernel=psf, iterations=1, dtype=np.float32)
I wish I had a good error message or something to add here. However, whatever the problem is only happens during the compute call, so I'm not getting anything useful out to help with debugging.
I'm pretty sure the problem is not related to the leading dimensions, because I can use np.expand_dims on regular numpy array input to itk.richardson_lucy_deconvolution
and that runs ok.
I don't think I have more time to play with this, sadly.
cc @thewtex
@GenevieveBuckley yes, absolutely, great plan!
Will be you at SciPy US this year to hash this out? :walking_woman: :handshake:
@PranjalSahu built on your work, made a few fixes:
https://www.youtube.com/watch?v=CMmoa8pP_eo
These are now published in itk-5.3rc4.post1
(published yesterday). TBD if the issue you mentioned still occurs.
@GenevieveBuckley yes, absolutely, great plan!
Well, I don't know if there is a plan anymore, since the current situation for me is "it doesn't work and I can't summarize why". I guess having this issue thread discussion is better than nothing, though.
Will be you at SciPy US this year to hash this out? πΆββοΈ π€
Nope! (I was recently in the US, but didn't realize at the time that would have meant a better opportunity to videochat while in the same timezone. Oh well, we could still videochat, it's just a lot more inconvenient)
@PranjalSahu built on your work, made a few fixes:
Very cool! I will have to check the talk out π
These are now published in
itk-5.3rc4.post1
(published yesterday). TBD if the issue you mentioned still occurs.
We'll have to check, this could be promising!
These are now published in
itk-5.3rc4.post1
(published yesterday). TBD if the issue you mentioned still occurs.We'll have to check, this could be promising!
Update: Unfortunately it looks like there are still problems. I see this error when I try to compute the deconvolution with itk
import itk
import numpy as np
from dask.array.image import imread
imgs = imread('raw-488nm/*.tif').astype(np.float32)
psf = imread('psfs_z0p1/PSF_488nm_dz100nm.tif').astype(np.float32)
deconvolved = da.map_blocks(itk.richardson_lucy_deconvolution_image_filter,
imgs,
kernel_image=psf,
number_of_iterations=1,
dtype=np.float32)
output = deconvolved[100, ...].compute()
Traceback:
2022-07-12 17:55:14,883 - distributed.nanny - WARNING - Restarting worker
2022-07-12 17:55:16,364 - distributed.nanny - WARNING - Restarting worker
2022-07-12 17:55:17,415 - distributed.nanny - WARNING - Restarting worker
2022-07-12 17:55:20,493 - distributed.nanny - WARNING - Restarting worker
---------------------------------------------------------------------------
KilledWorker Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 output = deconvolved[100, ...].compute()
2 print(output.shape)
File ~\.conda\envs\itk-dask\lib\site-packages\dask\base.py:314, in DaskMethodsMixin.compute(self, **kwargs)
290 def compute(self, **kwargs):
291 """Compute this dask collection
292
293 This turns a lazy Dask collection into its in-memory equivalent.
(...)
312 dask.base.compute
313 """
--> 314 (result,) = compute(self, traverse=False, **kwargs)
315 return result
File ~\.conda\envs\itk-dask\lib\site-packages\dask\base.py:602, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
599 keys.append(x.__dask_keys__())
600 postcomputes.append(x.__dask_postcompute__())
--> 602 results = schedule(dsk, keys, **kwargs)
603 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File ~\.conda\envs\itk-dask\lib\site-packages\distributed\client.py:3000, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
2998 should_rejoin = False
2999 try:
-> 3000 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
3001 finally:
3002 for f in futures.values():
File ~\.conda\envs\itk-dask\lib\site-packages\distributed\client.py:2174, in Client.gather(self, futures, errors, direct, asynchronous)
2172 else:
2173 local_worker = None
-> 2174 return self.sync(
2175 self._gather,
2176 futures,
2177 errors=errors,
2178 direct=direct,
2179 local_worker=local_worker,
2180 asynchronous=asynchronous,
2181 )
File ~\.conda\envs\itk-dask\lib\site-packages\distributed\utils.py:336, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
334 return future
335 else:
--> 336 return sync(
337 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
338 )
File ~\.conda\envs\itk-dask\lib\site-packages\distributed\utils.py:403, in sync(loop, func, callback_timeout, *args, **kwargs)
401 if error:
402 typ, exc, tb = error
--> 403 raise exc.with_traceback(tb)
404 else:
405 return result
File ~\.conda\envs\itk-dask\lib\site-packages\distributed\utils.py:376, in sync.<locals>.f()
374 future = asyncio.wait_for(future, callback_timeout)
375 future = asyncio.ensure_future(future)
--> 376 result = yield future
377 except Exception:
378 error = sys.exc_info()
File ~\.conda\envs\itk-dask\lib\site-packages\tornado\gen.py:769, in Runner.run(self)
766 exc_info = None
768 try:
--> 769 value = future.result()
770 except Exception:
771 exc_info = sys.exc_info()
File ~\.conda\envs\itk-dask\lib\site-packages\distributed\client.py:2037, in Client._gather(self, futures, errors, direct, local_worker)
2035 exc = CancelledError(key)
2036 else:
-> 2037 raise exception.with_traceback(traceback)
2038 raise exc
2039 if errors == "skip":
KilledWorker: ("('richardson_lucy_deconvolution_image_filter-getitem-21573412bfdc1b613ca3e9d5e79d94bb', 0, 0, 0)", <WorkerState 'tcp://127.0.0.1:58059', name: 1, status: closed, memory: 0, processing: 1>)
I will also test it on my system.
I tested it on Linux and the computation could finish for this code:
output = deconvolved[5, ...].compute()
I tested it locally. Did you perform computation on some remote server?
Ok I can reproduce the error when using LocalCluster with Dask. It is the same issue of import.
ImportError: PyCapsule_Import could not import module "_ITKCommonPython" 2022-07-12 14:18:39,986 - distributed.core - ERROR - PyCapsule_Import could not import module "_ITKCommonPython"
@GenevieveBuckley Ok I got it to work. I still don't know the root cause but I did something similar before. Please try doing this.
def mymethod(*args, **kwargs):
import itk
output = itk.richardson_lucy_deconvolution_image_filter(*args, **kwargs)
return output
deconvolved = da.map_blocks(mymethod,
imgs,
kernel_image=psf,
number_of_iterations=1,
dtype=np.float32)
output = deconvolved[10, ...].compute()
print(output.shape)
Oh, that's weird - if you import itk within the function it works for you, but not if itk is imported outside of it? That's very odd. I'll give that a go (tomorrow, probably)
I have tested it on entire dataset using 2 workers each with 32 GB limit. The workers are restarting when I perform compute on all 50 images (probably memory leak or in-sufficient memory). I am able to run it with 25 images like this
output = deconvolved[10:35, ...].compute()
Also I see some "ConnectionResetError: [Errno 104] Connection reset by peer" errors. But no computation fails.
This is the dask dashboard for this computation.
@GenevieveBuckley Ok I got it to work. I still don't know the root cause but I did something similar before. Please try doing this.
def mymethod(*args, **kwargs): import itk output = itk.richardson_lucy_deconvolution_image_filter(*args, **kwargs) return output
deconvolved = da.map_blocks(mymethod, imgs, kernel_image=psf, number_of_iterations=1, dtype=np.float32) output = deconvolved[10, ...].compute() print(output.shape)
Confirming this, I also see it working if itk is imported inside the function passed to dask's map_blocks, but not for the same thing with the import located outside the function.
This PR by Matt McCormick could fix the problem: https://github.com/InsightSoftwareConsortium/ITK/pull/3494
Status: ITK PR #3494 has fixed the problem! We will wait for the next ITK release candidate (higher than the current v5.3rc03) to become available, and then write the blogpost.
It'll be much better to release the blogpost when readers can install the ITK pre-release and try things themselves (plus it'll be easier to write then).
Ideally, the blogpost should ideally have two examples:
import dask
import itk
from dask import delayed
if __name__== "__main__":
from dask.distributed import Client, LocalCluster
cluster = LocalCluster()
client = Client(cluster)
@delayed
def performsum(m):
return m.GetNumberOfPoints()
m = itk.Mesh[itk.F, 3].New()
a = performsum(m).compute()
print(a)
@thewtex emailed me recently, and I'm copying my reply into this thread so the information is shared publically here too :)
Hi Matt,
Thanks for sending that feedback. Sorry for my slightly delayed reply, I saw your email and needed to dig into my old files to answer properly.
I did do a little bit more work for the ITK + Dask blogpost example. It seems the last time I touched this was in August last year, so I'm a little hazy on the details now. It mostly worked, but with some problems, and I found it very time consuming to work on so I put it down again.
Github issue link: https://github.com/dask/dask-blog/issues/138
NOTEBOOKS I'm attaching three jupyter notebooks here. I can put them in a public gist and link to it from the github issue, if that might be helpful to share with others too.
EDIT: I've uploaded the notebooks to this public gist: https://gist.github.com/GenevieveBuckley/4ad4282038a9ec49e548898c78d3b590
The first notebook 'test-itk.ipynb' works perfectly using randomly generated data.
The second notebook "test-itk-realdata.ipynb" should work exactly the same as the first notebook, this time using the example data used in the last blogpost. Unfortunately, I saw an error: "RuntimeError: No suitable template parameter can be found." Something must be different between the real example data and the randomly generated stuff. I investigated this more, but can't quite remember what I found out about it.
The third notebook "itk-dask-blogpost,ipynb" was one where I was going to run the example with real data and then add explanatory text around it. I clearly didn't get very far. Oddly, the error in this notebook is different "AttributeError: 'itkRichardsonLucyDeconvolutionImageFilterIUS3IUS3' object has no attribute 'SetKernel'".
EXAMPLE DATA Github issue comment: https://github.com/dask/dask-blog/issues/138#issuecomment-1174626291
Link to the data: https://drive.google.com/drive/folders/13mpIfqspKTIINkfoWbFsVtFF8D7jbTqJ (linked to from this earlier blogpost about image loading)
Link to the PSF: https://drive.google.com/drive/folders/13udO-h9epItG5MNWBp0VxBkKCllYBLQF (discussed here) I still have the data sitting next to the notebooks, so I can re-run some things if you want to make changes. I'd only downloaded part of the data, that was one thing that took a long time. It was also the thing that made me think it would take a really long time to re-do the example with all the z-slices and all the laser wavelengths (right now I've just looked at the 488nm wavelength).
Here's a tree view of what data files I have on disk next to the notebooks:
.
βββ test-itk.ipynb
βββ test-itk-realdata.ipynb
βββ itk-dask-blogpost.ipynb
βββ psf/
β βββ PSF_560nm_dz100nm.tif
βββ psfs_z0p1/
β βββ PSF_488nm_dz100nm.tif
β βββ PSF_560nm_dz100nm.tif
β βββ PSF_642nm_dz100nm.tif
βββ raw-488nm/
β βββ ex6-2_CamB_ch0_CAM1_stack0000_488nm_0000000msec_0001291795msecAbs_000x_000y_000z_0000t.tif
β βββ ex6-2_CamB_ch0_CAM1_stack0001_488nm_0043748msec_0001335543msecAbs_000x_000y_000z_0000t.tif
β βββ
... etc.
@GenevieveBuckley amazing work!! I am excited to share our lessons learned and progress on the blog!
Regarding, the error on the second notebook:
an error: "RuntimeError: No suitable template parameter can be found."
There is a call:
itk.array_from_image
But the input turned out to not be an itk.Image
, which is why we receive this error. We can just remove the itk.array_from_image
or we can use np.asarray
to ensure a np.ndarray
and copy as needed.
Regarding the second / third notebook:
"AttributeError: 'itkRichardsonLucyDeconvolutionImageFilterIUS3IUS3' object has no attribute 'SetKernel'".
This is because kernel argument needs to be kernel_image=psf
instead of kernel=psf
.
I also found that we needed to explicitly extract the time dimension for the function in map_blocks
.
import dask.array as da
import itk
import numpy as np
iterations = 1
def deconvolve_timepoint(img, kernel_image, number_of_iterations):
timepoint = np.squeeze(img, axis=0)
deconvolved = itk.richardson_lucy_deconvolution_image_filter(timepoint,
kernel_image=kernel_image,
number_of_iterations=number_of_iterations)
return np.expand_dims(deconvolved, 0)
deconvolved = da.map_blocks(
deconvolve_timepoint,
imgs[:4,:,:,:],
kernel_image=psf,
number_of_iterations=iterations,
dtype=np.float32,
)
deconvolved
However, if compute()
is called on the entire time series, we will run out of memory. And, we have no way to effectively visualization the result.
So, maybe we should write a post or two as intermediaries to demonstrate / discuss writing out the original data as OME-Zarr? And the processed result as OME-Zarr? This would facilitate processing, memory usage, and visualization. What do you think?
a) Thank you! I did not realize I made a mistake and needed kernel_image=psf
(instead of kernel=psf
).
b) We still have a problem.
Yes, relying on np.array()
to convert the data from ITK back to numpy does work well in both cases. So that's a good workaround for itk.array_from_image()
not working for itkPyBufferPython
inputs. (Off topic: should there be an issue raised in ITK about this?)
No, I can't run .compute()
on the dask result array, because now there's a similar error just in the execution/runtime of the dask tasks: TypeError: Expecting argument of type itkImageD4 or itkImageSourceID4.
Do you have any idea why the ITK class of the deconvolved
variable result changed?
test-itk.ipynb
notebook with randomly generated data, print(type(deconvolved))
returned <class 'itk.'>
test-itk-realdata.ipynb
, the example code is the same except we load int the real image data and suddenly print(type(deconvolved))
returns <class 'itk.itkPyBufferPython.NDArrayITKBase'>
insteadc) I'm supportive, but not entirely sure what you have in mind when you say this:
So, maybe we should write a post or two as intermediaries to demonstrate / discuss writing out the original data as OME-Zarr? And the processed result as OME-Zarr? This would facilitate processing, memory usage, and visualization. What do you think?
(Off topic: should there be an issue raised in ITK about this?)
suddenly print(type(deconvolved)) returns <class 'itk.itkPyBufferPython.NDArrayITKBase'> instead
I should dig deeper, but this may be expected.
dask tasks: TypeError: Expecting argument of type itkImageD4 or itkImageSourceID4.
This can be addressed with the function that operates on a timepoint explicitly:
def deconvolve_timepoint(img, kernel_image, number_of_iterations):
timepoint = np.squeeze(img, axis=0)
deconvolved = itk.richardson_lucy_deconvolution_image_filter(timepoint,
kernel_image=kernel_image,
number_of_iterations=number_of_iterations)
return np.expand_dims(deconvolved, 0)
deconvolved = da.map_blocks(
deconvolve_timepoint,
imgs[:4,:,:,:],
kernel_image=psf,
number_of_iterations=iterations,
dtype=np.float32,
)
deconvolved
I'm supportive, but not entirely sure what you have in mind when you say this:
:+1: I'll work on more details.
Hi there, I know this is bit off topic and I'm sorry for that, but there has been a topic on Dask discourse asking for advice about using ITK with Dask. I thought you might be able to give some insights there too.
In this topic, there is a link to an ITK blog post announcing Dask compatibility, but I've not been able to really find what that meant.
@guillaumeeb thanks for the note. I followed up on the Dask Discourse thread.
Now that [release v5.3rc03 of ITK is available (which should include this PR), it would be good to do a follow up blogpost to this one about using Dask + ITK together.
The purpose of this would be:
The first step is re-running the code from the earlier blogpost with ITK v5.3rc03 or above and seeing whether that works or not. Then we write up whatever we find.
Here are the comments specifically discussing what should be included in a followup blogpost:
Links: