coiled / dask-community

Issue tracker for the Dask community team
MIT License

Dask integration with napari plugins #178

Open scharlottej13 opened 2 years ago

scharlottej13 commented 2 years ago

Background

Napari, an open-source tool for browsing, annotating, and analyzing large multi-dimensional images, is already well integrated with Dask and was one of the projects @GenevieveBuckley focused on during her year as a Dask life science fellow (more background on that here). One area for improvement Genevieve noted was better integration with Dask early in project development, which was the impetus for my joining CZI’s napari Plugin Accelerator Kickoff in December 2021. Here are a few projects I think could be relevant for integrating with Dask.

Juan Nunez-Iglesias @ Monash University

  1. zarpaint: manually edit larger-than-RAM segmentations directly on disk. Solves the problem that zarr lacked fancy indexing (which is how napari paints to an array). Could integrate well with Dask because Dask is good at solving larger-than-memory problems. It seems relatively far along in development and there is proof of it already being used "in the wild". Genevieve is a contributor to the repo and could possibly do an intro.
  2. skan: automated skeleton generation and analysis, first used to compare images of the cytoskeleton from malaria-infected vs. healthy red blood cells. Extra exciting because it has also been used in nuclear materials research and can potentially be used for other problems (e.g. roads, rivers, cracks in materials, etc.). One of their project goals is explicitly to add support for Dask arrays.
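
For anyone unfamiliar with the term, here is a minimal sketch of what painting through "fancy indexing" looks like. It uses a small in-memory NumPy array as a stand-in for the on-disk zarr array; the array shape, the brush coordinates, and the label value are all made up for illustration, and this is not zarpaint's or napari's actual code.

```python
import numpy as np

# Stand-in for a label image (assumption: zarpaint works on on-disk zarr arrays instead).
labels = np.zeros((6, 6), dtype=np.uint8)

# Hypothetical brush stroke: row and column coordinates of the painted pixels.
rows = np.array([1, 2, 2, 3])
cols = np.array([4, 3, 4, 4])

# Fancy (integer-array) indexing: assign one label to scattered pixels in a single step.
labels[rows, cols] = 7

# Count how many pixels now carry the painted label.
painted = int((labels == 7).sum())
print(painted)
```

The point is only that a paint stroke touches scattered pixels rather than a contiguous slice, which is why the array backend needs to support this kind of assignment.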

Chris Havlin @ University of Illinois

  1. yt-napari plugin: yt is used for analysis and visualization of volumetric datasets (mostly for astrophysical simulation). The plugin hopes to not only support interactive visualization, but also handle validation and ingestion of complex datasets. A prototype exists, with a stable release targeted for June 1st, 2022 (more details here).

Virginie Uhlmann @ European Bioinformatics Institute

  1. splineit: allows interactive spline shaping in napari (think moving a squiggly line in paint), "allows curating segmentation results and prepare training sets in a 'vector graphics' manner" (more details here). I think there is potential for integration with Dask because one current issue they're facing is scalability: with more than 200 layers, things get very slow and a better data structure is needed for interactive layers.

GenevieveBuckley commented 2 years ago

Brief comments:

Note: the napari plugin accelerator grant program runs for 6 months, which is why many projects have release dates targeted for June 2022. I guess it's possible CZI might do some no-cost grant extensions, but interacting with these groups in the first half of this year would be most productive.

jni commented 2 years ago

Hi @scharlottej13, thanks for this writeup and pleased to meet you! I agree with @GenevieveBuckley's comments; zarpaint in particular relies on writing to bigger-than-RAM arrays, so it's not well suited to dask at the moment. @abigailmcgovern is working on using the painted arrays to train pytorch networks, and that could work very well with dask.

Re skan, a lot of the work that @GenevieveBuckley and I did might be obsolete soon thanks to a brand new, NumPy-only way to create graphs, which is in skan's main branch and which is copied wholesale from this PR to scikit-image. I haven't yet tested whether dask works out of the box with this approach, but it seems to me like it should be easier than the numba-based approach from earlier.

GenevieveBuckley commented 2 years ago

> @AbigailMcGovern is working on using the painted arrays to train pytorch networks, and that could work very well with dask.

Ooh yeah, this would be a great project for Dask engagement.