GenevieveBuckley opened this issue.
The Cancer Genome Atlas Database might work: https://portal.gdc.cancer.gov/
There are some histology images there that could fit the requirements. Their file formats would probably need a third-party library to read into Python, but you can easily download individual images separately.
The xarray examples (like this one, or this one) sometimes use NetCDF climate data from the Climate Data Store. Website: https://cds.climate.copernicus.eu
Nick took a CC0 histology image and converted it to zarr: https://camelyon16.grand-challenge.org/Data/
Edit: updated link - https://camelyon17.grand-challenge.org/Data/
The Landsat data might also be good: https://landsat.gsfc.nasa.gov/data/
Some people have said they think Landsat is CC0 licensed, but I haven't found that page on the website yet, so we had better double-check.
Here's a wrapper around the API to make it easier: https://github.com/loicdtx/lsru
And from the napari discussions https://github.com/napari/napari/issues/408#issuecomment-511214119
More info on Landsat 8 can be found at https://landsat.gsfc.nasa.gov/landsat-8/mission-details/. I used lsru (https://github.com/loicdtx/lsru) to order the imagery; folks can also download Landsat 8 data from https://earthexplorer.usgs.gov/
cc @scottyhq, who knows a bunch about Landsat
(although Scott is also a big xarray user, and we might want to avoid xarray for this example to keep things focused on dask-image)
Thanks @mrocklin. Yes, we've used Landsat 8 for some examples since it is a public dataset on AWS and Google Cloud. Here is a blog post with some background: https://medium.com/pangeo/cloud-native-geoprocessing-of-earth-observation-satellite-data-with-pangeo-997692d91ca2, or if you just want to take a look at a notebook: https://github.com/scottyhq/esip-tech-dive/blob/master/notebooks/0-demo-aws.ipynb. As mentioned, these examples demonstrate using xarray integrated with dask.
Thank you @scottyhq I'll take a look at those links and see if we can't get something up and running
@jakirkham, someone just asked me if we could use some of the images you link to from your blog post on loading image data. They might or might not be good candidates, but I ALSO notice that there is no licence with that data. Is that an oversight?
More code from @timothywallaby, looping through a large image and appending to zarr: https://github.com/timothywallaby/dask/blob/master/OpenSlidetoZarr.ipynb
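The approach described above (looping through a large image and writing each piece into zarr) can be sketched as follows. This is a minimal sketch, not the notebook's code: `read_tile(y, x, h, w)` is a hypothetical reader, and a plain NumPy array stands in for the zarr array, since zarr arrays accept the same `out[y0:y1, x0:x1] = tile` slice assignment. OpenSlide's `read_region` takes an `(x, y)` location, a level and a `(width, height)` size and returns an RGBA PIL image, so a thin adapter would be needed to use it as `read_tile`.

```python
import numpy as np


def copy_in_tiles(read_tile, shape, out, tile=(512, 512)):
    """Copy a large 2D image into `out` one tile at a time.

    read_tile(y, x, h, w) is a HYPOTHETICAL reader returning an array of
    shape (h, w, ...). `out` only needs numpy-style slice assignment,
    which zarr arrays also support.
    """
    height, width = shape
    th, tw = tile
    for y in range(0, height, th):
        for x in range(0, width, tw):
            # Clip the last row/column of tiles to the image edge.
            h = min(th, height - y)
            w = min(tw, width - x)
            out[y:y + h, x:x + w] = read_tile(y, x, h, w)
    return out
```

Choosing `tile` equal to the zarr chunk shape means each write lands in exactly one chunk, which avoids rewriting partially filled chunks.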
@sofroniewn do you have the code you used to convert the Camelyon data to zarr?
The code from @sofroniewn is here: https://github.com/sofroniewn/image-demos/blob/master/helpers/make_2D_zarr_pathology.py
The instructions were not to use it as-is until we can work out why the saved file is bigger than the original TIFF. Personally, I also feel that for this purpose we don't really need the multilevel hierarchy, which might make things a bit simpler.
@rxist525, what do you think? Would it be ok to use that data for code examples here?
Absolutely!!
You are welcome to use the data, which is part of a recent publication.
Is there a license for that dataset @rxist525?
Good question; let me check and get back to you.
Also a potentially useful discussion: https://github.com/thewtex/fiber-bed-zarr/issues/1#issuecomment-595984988
Just got off a call with Gokul; he mentioned they've now added the CC BY-SA 4.0 license with the data (https://drive.google.com/drive/folders/1z1nB_DRgXYWwuUBEHYvj5hVotnAlR3W4), though they are potentially open to changing it if it causes issues. Feel free to correct me if needed, Gokul.
Apologies for dropping the ball on this; thanks for adding the note!
Juan says on the napari Zulip:
Talley's lattice dataset at the BioImage Archive has accession S-BSST435. See this page for how to access the data (neither Talley nor I have actually tried to get it out yet :joy:): https://www.ebi.ac.uk/biostudies/help
Link: https://www.ebi.ac.uk/biostudies/studies/S-BSST435
(Note: Volker tried to download the sample file but couldn't unzip it properly. He thinks it was uploaded as a zip, which was then zipped again by the BioImage Archive. If other people can access it, he'd like to know. He has permission to use another lattice volume belonging to users he works with, but that's only a single volume.)
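If the download really is a zip wrapped in another zip, as Volker suspects, one way to get at the contents is to unpack it recursively. This is a minimal sketch using only Python's standard-library `zipfile`, untested against the actual S-BSST435 download; it reads each member fully into memory and refuses members that would extract outside the destination folder.

```python
import io
import zipfile
from pathlib import Path


def extract_nested_zip(archive, dest):
    """Extract `archive` (a path or file-like object) into `dest`,
    recursing into any member that is itself a zip file.
    Returns the paths of the extracted (non-zip) files."""
    dest = Path(dest)
    root = dest.resolve()
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to write
            data = zf.read(name)
            if zipfile.is_zipfile(io.BytesIO(data)):
                # Inner archive: unpack it into a folder named after it.
                subdir = dest / Path(name).with_suffix("")
                extracted += extract_nested_zip(io.BytesIO(data), subdir)
                continue
            target = (dest / name).resolve()
            if root not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            extracted.append(target)
    return extracted
```

For very large volumes, streaming members to disk with `ZipFile.open` would avoid holding each one in memory.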
@rxist525, do you have a workflow that you typically use on your data? If so, would you be able to share that as well? A bird's-eye view would be fine, though a notebook would also be good if one exists.
Here is a PDF with a good overview of our workflow. Almost all of our data goes through pre-processing; post-processing and analysis routines are biology/dataset dependent.
Here's the PDF link: https://drive.google.com/file/d/1q79pFcA_oSexcLPxUZMM2rpm_TeOFNKJ/view?usp=sharing
Thanks Gokul!
cc @grlee77 (in case this is of interest 😉)
Yes, thank you, Gokul. For context, I have been working on CUDA-based implementations of classical (i.e. not deep learning) image processing operations and algorithms as found in scipy.ndimage and scikit-image, and it is helpful to have feedback on which things to prioritize. My background is in volumetric medical imaging (MRI) rather than microscopy, so it is useful to know what types of operations are being used in the microscopy field. I have a good idea of what deskew, rotation and deconvolution involve, but if you have specific references or methods regarding which kinds of segmentation algorithms, etc. are typically used, that could also be of use.
Also, is image denoising often used during pre-processing steps, or is the data you typically work with already of adequate SNR?
Great to connect with you, Gregory. Typically, for quantitative work, we strive to generate data with sufficient SNR such that existing algorithms/workflows are compatible. While detection algorithms are more sensitive than our eye at the edge cases with low SNR, to convey our findings in movies, we typically denoise the data. We also use denoising as a pre-processing step to aid in segmentation. Hope this helps; if not, happy to discuss further.
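The idea of denoising before segmentation can be illustrated with a toy 2D example. This is a minimal sketch assuming a simple box (mean) filter and a fixed threshold, both chosen purely for illustration; they are not the lab's actual methods. On chunked data, a filter from dask-image's `dask_image.ndfilters` module (e.g. `uniform_filter` or `gaussian_filter`) would be the natural replacement for the hand-written filter.

```python
import numpy as np


def box_denoise(image, size=3):
    """Mean-filter a 2D image with a size x size window (edges replicated)."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the window is centred")
    pad = size // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    rows, cols = image.shape
    # Sum every shifted copy of the image that falls inside the window.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (size * size)


def segment(image, threshold, size=3):
    """Denoise first, then threshold to a boolean foreground mask."""
    return box_denoise(image, size) > threshold
```

For example, a single hot pixel of value 10 on a zero background averages to about 1.1 under a 3x3 window, so it falls below a threshold of 5, while the interior of a 5x5 object of value 10 keeps its value and stays in the mask.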
dask-image example datasets
We need some good example data for tutorials with dask-image.
This issue is a place for discussion and suggestions. If you have links, add them here!
Ideally this data should:
It would be nice to have:
What we want to avoid:
EDIT: https://github.com/napari/napari/issues/316
https://github.com/napari/napari/issues/316#issuecomment-952642188