dask / dask-image

Distributed image processing
BSD 3-Clause "New" or "Revised" License

Example image data for dask-image #107

Open GenevieveBuckley opened 5 years ago

GenevieveBuckley commented 5 years ago

dask-image example datasets

We need some good example data for tutorials with dask-image.

This issue is a place for discussion and suggestions. If you have links, add them here!

Ideally this data should:

It would be nice to have

What we want to avoid:

EDIT: https://github.com/napari/napari/issues/316

Just saw this tweet announcing a human brain MRI at 100µm isotropic resolution. This could be a very cool dataset to use as a napari demo. I suggest we use this issue to keep track of datasets that we could put in napari once we have proper data downloading. Please just edit the checklist below to add your preferred demo data.

* [ ]  100µm resolution human brain: https://twitter.com/ComaRecoveryLab/status/1134436231775961088

* [ ]  10m resolution vegetation cover in Victoria: http://francois-petitjean.com/Research/MonashVegMap/info.php and https://labo.obs-mip.fr/multitemp/mapping-a-part-of-australia-at-10-m-resolution/

* [ ]  correlative superres https://www.biorxiv.org/content/10.1101/773986v1.abstract

* [x]  SARS-CoV2 in gut epithelium https://twitter.com/notjustmoore/status/1256232842755014656

* [ ]  developing sea squirt https://www.nytimes.com/2020/07/09/science/sea-squirts-embryos.html

* [ ]  mechanobiology of intestinal organoids https://twitter.com/XavierTrepat/status/1308026944349450241

* [ ]  tracking of particles on astral microtubules ([paper](https://www.biorxiv.org/content/10.1101/2020.06.17.154260v1), [tweet (😍)](https://twitter.com/the_Node/status/1341050276011237379)), could make a really neat demo for the tracks layer.

* [ ]  [Sentinel-2 1y Cloud optimised geotiff dataset](https://medium.com/sentinel-hub/digital-twin-sandbox-sentinel-2-collection-available-to-everyone-20f3b5de846e)

* [ ]  Calcium imaging in the Drosophila ellipsoid body ([2013](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3830704/) and [2015](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4704792/))

* [ ]  Janelia FlyLight data ([AWS](https://registry.opendata.aws/janelia-flylight/))

* [ ]  Fruit Fly Brain Observatory (FFBO) ([tweet](https://twitter.com/FlyBrainObs/status/1369496338266750977))

* [ ]  Janelia [Open Organelle datasets](https://openorganelle.janelia.org/)

* [ ]  CZBiohub [Open Cell](https://opencell.czbiohub.org/about)

* [ ]  [CoCo](https://cocodataset.org/#download) + [Voxel51 datasets](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html)

* [ ]  @tlambert03's lattice light sheet dataset used in the dask application post https://www.ebi.ac.uk/biostudies/studies/S-BSST435?query=Talley%20lambert

two cryo-ET datasets to add to the pile

There are also some more datasets of developing Tribolium embryos, mouse brain slices and mouse colon volumes:



... and this 3D cell tracking dataset is gorgeous! http://celltrackingchallenge.net/

C. elegans developing embryo (Waterston Lab, University of Washington, Seattle, WA, USA)

* Training dataset: http://data.celltrackingchallenge.net/training-datasets/Fluo-N3DH-CE.zip (3.1 GB)
* Challenge dataset: http://data.celltrackingchallenge.net/challenge-datasets/Fluo-N3DH-CE.zip (1.7 GB)
* Microscope: Zeiss LSM 510 Meta
* Objective lens: Plan-Apochromat 63x/1.4 (oil)
* Voxel size (microns): 0.09 x 0.09 x 1.0
* Time step (min): 1 (1.5)
* Additional information: Nature Methods, 2008
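The voxel size quoted above is strongly anisotropic, so a tutorial would probably need to pass per-axis scale factors to whatever viewer it uses (napari layers, for example, accept a `scale` argument). A minimal sketch of that arithmetic follows. It assumes the metadata lists the sizes in x, y, z order and that the array is stored as (z, y, x); the array shape is a made-up placeholder, not the real dataset shape.

```python
def relative_scale(voxel_size):
    """Per-axis scale factors, normalised so the finest axis is 1.0."""
    smallest = min(voxel_size)
    return tuple(size / smallest for size in voxel_size)


def physical_extent(shape, voxel_size):
    """Physical size of the volume along each axis, in the voxel-size units."""
    return tuple(n * size for n, size in zip(shape, voxel_size))


# Assumption: array axes are (z, y, x), so the 1.0 micron step goes first.
voxel_size_um = (1.0, 0.09, 0.09)

# Hypothetical shape, for illustration only.
shape = (10, 100, 100)

scale = relative_scale(voxel_size_um)
extent_um = physical_extent(shape, voxel_size_um)

print("relative scale (z, y, x):", scale)
print("extent in microns (z, y, x):", extent_um)
```

With these numbers the z axis is roughly 11 times coarser than x and y, which is worth knowing before applying any isotropic filter to the volume.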

GenevieveBuckley commented 5 years ago

The Cancer Genome Atlas Database might work: https://portal.gdc.cancer.gov/

There are some histology images there that could fit the requirements. They do have file formats that would probably need a third-party library to read into Python, but you can download individual images separately pretty easily.

GenevieveBuckley commented 5 years ago

The xarray examples (like this one, or this one) sometimes use NetCDF climate data from the Climate Data Store. Website: https://cds.climate.copernicus.eu


GenevieveBuckley commented 4 years ago

Nick took a histology CC0 image and converted it to zarr - https://camelyon16.grand-challenge.org/Data/

Edit: updated link - https://camelyon17.grand-challenge.org/Data/

GenevieveBuckley commented 4 years ago

The landsat data might also be good https://landsat.gsfc.nasa.gov/data/

Some people have said they think landsat is CC0 licensed, but I haven't found that page on the website yet, so we'd better double-check.

Here's a wrapper around the API to make it easier: https://github.com/loicdtx/lsru

And from the napari discussions https://github.com/napari/napari/issues/408#issuecomment-511214119

More info on Landsat 8 can be found at: https://landsat.gsfc.nasa.gov/landsat-8/mission-details/. I used https://github.com/loicdtx/lsru to order the imagery; folks can also download Landsat 8 with https://earthexplorer.usgs.gov/

mrocklin commented 4 years ago

cc @scottyhq who knows a bunch about landsat

(although, Scott is also a big xarray user, and we might want to avoid Xarray for this example in order to keep things focused on Dask Image)

scottyhq commented 4 years ago

Thanks @mrocklin. Yes, we've used landsat8 for some examples since it is a public dataset on AWS and Google Cloud. Here is a blog post with some background: https://medium.com/pangeo/cloud-native-geoprocessing-of-earth-observation-satellite-data-with-pangeo-997692d91ca2, or if you just want to take a look at a notebook: https://github.com/scottyhq/esip-tech-dive/blob/master/notebooks/0-demo-aws.ipynb. As mentioned, these examples demonstrate using xarray integrated with dask.

GenevieveBuckley commented 4 years ago

Thank you @scottyhq, I'll take a look at those links and see if we can't get something up and running.

GenevieveBuckley commented 4 years ago

@jakirkham someone just asked me if we could use some of the images you link to from your blog post on loading image data. They might or might not be good candidates, but I ALSO notice that there is no licence with that data. Is that an oversight?

GenevieveBuckley commented 4 years ago

More code from @timothywallaby, looping through a large image and appending to zarr: https://github.com/timothywallaby/dask/blob/master/OpenSlidetoZarr.ipynb
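For anyone who wants the general shape of that approach without opening the notebook, here is a minimal sketch of looping over a large image tile by tile. It is not the notebook's code: `read_region` and `write_region` are hypothetical callbacks standing in for whatever slide reader and zarr writer you use, and the toy demo uses nested lists instead of real image data.

```python
from typing import Callable, Iterator, Tuple


def iter_tiles(height: int, width: int, tile: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row, col, tile_height, tile_width), clipping tiles at the edges."""
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield row, col, min(tile, height - row), min(tile, width - col)


def copy_tiled(read_region: Callable, write_region: Callable,
               height: int, width: int, tile: int = 1024) -> int:
    """Copy an image one tile at a time, so only one tile is held in memory."""
    n_tiles = 0
    for row, col, h, w in iter_tiles(height, width, tile):
        write_region(row, col, read_region(row, col, h, w))
        n_tiles += 1
    return n_tiles


# Toy demo: nested lists stand in for the source slide and the output array.
height, width = 5, 4
src = [[r * width + c for c in range(width)] for r in range(height)]
dst = [[None] * width for _ in range(height)]


def read_region(row, col, h, w):
    # Hypothetical reader: return the requested block as a list of rows.
    return [src[r][col:col + w] for r in range(row, row + h)]


def write_region(row, col, block):
    # Hypothetical writer: store the block at its position in the output.
    for i, line in enumerate(block):
        dst[row + i][col:col + len(line)] = line


n_tiles = copy_tiled(read_region, write_region, height, width, tile=2)
print(n_tiles, "tiles copied; output matches source:", dst == src)
```

In a real conversion it probably makes sense to pick the tile size as a multiple of the output chunk size, so each write touches whole chunks.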

jni commented 4 years ago

@sofroniewn do you have the code you used to convert the Camelyon data to zarr?

GenevieveBuckley commented 4 years ago

@sofroniewn do you have the code you used to convert the Camelyon data to zarr?

The code from @sofroniewn is here: https://github.com/sofroniewn/image-demos/blob/master/helpers/make_2D_zarr_pathology.py

The instructions were not to use it as is until we can work out why the saved file is bigger than the original tiff. Personally I also feel that for this purpose we don't really need the multilevel hierarchy, so that might make things a bit simpler.
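As a rough back-of-the-envelope check on how much the multilevel hierarchy alone could add, here is a small sketch that counts pixels per pyramid level. The image size, the downsampling factor of 2 and the stopping size are all assumptions for illustration; they are not taken from the script or from the Camelyon files.

```python
def pyramid_pixel_counts(height, width, min_size=256, factor=2):
    """Pixel count per level, downsampling until the short side is <= min_size."""
    counts = []
    while True:
        counts.append(height * width)
        if min(height, width) <= min_size:
            break
        # Ceiling division, so odd sizes round up rather than losing a row/column.
        height = -(-height // factor)
        width = -(-width // factor)
    return counts


# Hypothetical whole-slide image size, for illustration only.
counts = pyramid_pixel_counts(100_000, 200_000)
overhead = sum(counts) / counts[0] - 1

print(f"{len(counts)} levels, extra pixels relative to full resolution: {overhead:.1%}")
```

Under these assumptions the extra levels add about a third more pixels, since each level has roughly a quarter of the pixels of the one before. That only bounds the pyramid's share of the size difference; compression settings are another possible factor that this sketch does not model.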

jakirkham commented 4 years ago

@jakirkham someone just asked me if we could use some of the images you link to from your blog post on loading image data.

@rxist525, what do you think? Would it be ok to use that data for code examples here?

rxist525 commented 4 years ago

@jakirkham someone just asked me if we could use some of the images you link to from your blog post on loading image data. They might or might not be good candidates, but I ALSO notice that there is no licence with that data. Is that an oversight?

you are welcome to use the data, which is part of a recent publication.

GenevieveBuckley commented 4 years ago

Is there a license for that dataset @rxist525?

rxist525 commented 4 years ago

Is there a license for that dataset @rxist525?

good question, let me check and get back.

GenevieveBuckley commented 4 years ago

Also a potentially useful discussion: https://github.com/thewtex/fiber-bed-zarr/issues/1#issuecomment-595984988

jakirkham commented 3 years ago

Is there a license for that dataset @rxist525?

good question, let me check and get back.

Just got off a call with Gokul earlier; he mentioned they've now added the CC BY-SA 4.0 license with the data (https://drive.google.com/drive/folders/1z1nB_DRgXYWwuUBEHYvj5hVotnAlR3W4), though they are potentially open to changing it if it causes issues. Feel free to correct me, Gokul, if needed.

rxist525 commented 3 years ago

Apologies for dropping the ball on this - thanks for adding the note!

GenevieveBuckley commented 3 years ago

Juan says on the napari zulip:

Talley's lattice dataset at bioimage archive has accession S-BSST435 See this page for how to access data, neither Talley nor I have actually tried to get it out yet :joy: https://www.ebi.ac.uk/biostudies/help

Link: https://www.ebi.ac.uk/biostudies/studies/S-BSST435

(Note: Volker tried to download the sample file, but couldn't unzip it properly. He thinks it was uploaded as a zip, which has also been zipped again by bioimage archive. He says if other people can access it to let him know. He has permission to use another lattice volume belonging to users he works with, but that's only a single volume.)
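If the double-zip guess is right, something like the following sketch might get the data out. It uses only the standard library `zipfile` module; the function name and the `_unzipped` folder suffix are made up for this example, and it has not been tried on the S-BSST435 download itself.

```python
import zipfile
from pathlib import Path


def unzip_nested(archive, dest, max_depth=2):
    """Extract `archive` into `dest`, then unpack any zip files found inside it.

    max_depth=2 handles a zip that contains zips; raise it for deeper nesting.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)

    if max_depth > 1:
        # Collect candidates first, since extraction adds new files under dest.
        for inner in sorted(dest.rglob("*")):
            if inner.is_file() and zipfile.is_zipfile(inner):
                target = inner.parent / (inner.name + "_unzipped")
                unzip_nested(inner, target, max_depth - 1)
    return dest
```

Usage would be along the lines of `unzip_nested("sample.zip", "sample_out")`, where both names are placeholders. Note that `zipfile.is_zipfile` checks the file contents rather than the extension, so other zip-based formats would be unpacked too.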

jakirkham commented 3 years ago

@rxist525, do you have a workflow that you typically use on your data? If so, would you be able to share that as well? A bird's-eye view would be fine. Though a notebook would also be good if it exists 🙂

rxist525 commented 3 years ago

here is a pdf with a good overview of our workflow. Almost all of our data goes through pre-processing. Post-processing and analysis routines are biology/dataset dependent.

rxist525 commented 3 years ago

Here's the pdf link: https://drive.google.com/file/d/1q79pFcA_oSexcLPxUZMM2rpm_TeOFNKJ/view?usp=sharing

jakirkham commented 3 years ago

Thanks Gokul!

cc @grlee77 (in case this is of interest 😉)

grlee77 commented 3 years ago

Thanks Gokul! cc @grlee77 (in case this is of interest 😉)

Yes, thank you @gokul. For context, I have been working on CUDA-based implementations of classical (i.e. not deep learning) image processing operations and algorithms as found in scipy.ndimage and scikit-image and it is helpful to have feedback on which things to prioritize. My background is in volumetric medical imaging (MRI) rather than microscopy, so it is useful to know what types of operations are being used in the microscopy field. I have a good idea of what deskew, rotation and deconvolution involve, but if you have specific references or methods regarding which kind of segmentation algorithms, etc. are typically used, that could also be of use.

Also, is image denoising often used during pre-processing steps or is the data you typically work with already of adequate SNR?

rxist525 commented 3 years ago

Great to connect with you Gregory. Typically, for quantitative work, we strive to generate data with sufficient SNR such that existing algorithms/workflows are compatible. While detection algorithms are more sensitive than our eye at the edge cases with low SNR, to convey our findings in movies, we typically denoise the data. We also use denoising as a pre-processing step to aid in segmentation. Hope this helps, if not, happy to discuss further.
