datacarpentry / image-processing

Image Processing with Python
https://datacarpentry.org/image-processing
Other
95 stars 120 forks source link

use imageio for input/output tasks? #239

Closed tobyhodges closed 1 year ago

tobyhodges commented 1 year ago

Opening this issue so that there is an obvious place to find this discussion in future.

In #238, @mkcor wrote:

Getting to the gist of this PR, I'm following up on https://github.com/scikit-image/scikit-image/issues/5036#issuecomment-1216654958: The vision for scikit-image is to focus on actual image processing, delegating other functionalities (e.g., input/output, drawing, ...) to dedicated libraries. For reading and writing image data, this dedicated library is now available and stable, it is imageio.

Therefore, I'm thinking that, before releasing this lesson as 'stable,' we may want to consider making this change. If you agree with it, I'll propagate it to all subsequent episodes (episode 03 representing the biggest chunk). From an educational perspective, I can see how we may prefer sticking with only one library instead of two. I would argue that the proposed change doesn't add a dependency (you can readily import imageio once you have installed scikit-image); so the setup wouldn't be affected.

I could add a sentence to introduce the imageio library, if you deem it appropriate; the existing sentence seems almost sufficient and works well with the proposed change: "Let's load our image data from disk using the imread function from the imageio.v3 library and display it using ..."

Marianne tagged the other scikit-image maintainers, @stefanv @jni @alexdesiqueira @grlee77 @lagru, in that pull request so I am doing the same here.

Here is a list of pros and cons I can identify for making the change:

➕ reflects preferences of the scikit-image project ➕ reflects best practice for the scikit-image user community ➕ seems to require only very minor changes to function & argument names, no syntax changes ➕ those new names are no less intuitive than the skimage.io equivalents ➕ new dependency requires no additional setup steps

➖ introduces a new library that needs to be explained in the lesson ➖ the need to specifically import v3 implies the existence of (at least) a v2 and v1, which may need to be explained - why is it necessary to import imageio.v3 as iio and not "just" imageio as iio as I suspect many new users would try to do? ➖ learners have to develop a mental model of when it is appropriate to use imageio vs skimage

Overall, I am in favour of making this change. However, in addition to the adjustments @mkcor has already proposed in the #238, I think that we need more context for the new library. A few sentences explaining:

I'll leave this (and #238) open for a few days, to give @K-Meech @bobturneruk @quist00 and @uschille a chance to join the discussion.

mkcor commented 1 year ago

why you need to import v3 specifically

Admittedly I wasn't aware of Image IO v3 until I submitted #238. I had always done import imageio as io until very recently (for example, in my tutorial at Data Umbrella on October 25th [1]). And, since I would call function volread to read 'volumetric' data, I hadn't come across this deprecation warning for imread:

DeprecationWarning: Starting with ImageIO v3 the behavior of this function will switch to that of iio.v3.imread. To keep the current behavior (and make this warning disappear) use `import imageio.v2 as imageio` or call `imageio.v2.imread` directly.

Also, it turns out I was on maternity leave when the imageIO warnings (due to v2 -> v3 migration) were reported [2]!

My understanding from the migration guide [3] is that the current better practice is to import imageio.v3 as iio but I find it unfortunate from a teaching viewpoint, indeed. Maybe @almarklein or @FirefoxMetzger could recommend a workaround?

[1] https://www.youtube.com/watch?v=NqdhuU1fX5A [2] https://github.com/scikit-image/scikit-image/issues/6341 [3] https://imageio.readthedocs.io/en/stable/reference/userapi.html#migration-from-v2

FirefoxMetzger commented 1 year ago

My understanding from the migration guide [3] is that the current better practice is to import imageio.v3 as iio but I find it unfortunate from a teaching viewpoint, indeed. Maybe @almarklein or @FirefoxMetzger could recommend a workaround?

@mkcor I must admit that I am lacking a bit of context here. Could you elaborate on how/what is unfortunate about this import?

mkcor commented 1 year ago

Hi @FirefoxMetzger!

Thanks for your response.

I mean 'unfortunate' in the sense that, in addition to introducing imageio (which, let's agree on this for now, is fine), the teacher will have to explain that imageio has a v2 API and a v3 API, and we happen to use the v3 API, and why so... Because it's the latest, I guess it would be a sufficient reason: Can we afford to explain that it supports nD natively?

It's all super interesting and, personally, as a teacher, I tend to make digressions... But, generally speaking, we want to avoid any unnecessary cognitive load in a lesson (which is going to be officially part of The Carpentries curriculum).

mkcor commented 1 year ago

PS: @FirefoxMetzger I just thought of the following: Can we expect the deprecation warning to be gone... soon?

@tobyhodges if, by miracle, that timing could coincide with the 'release' of the lesson ('stable' status), we could simply use import imageio as iio and iio.imread(...), hence getting rid of one of the downsides for the proposed change.

stefanv commented 1 year ago

I don't see the point in deprecating skimage.io. It makes for a very simple interface that users are already familiar with, and has such a low maintenance burden. We can get rid of the plugin system, if that feels necessary.

FirefoxMetzger commented 1 year ago

@mkcor Right. I would like to explain this here from two angles: first from the dev perspective (interesting for you/us so we have the necessary context) and then from the user perspective (interesting for students learning about image processing).

From the dev perspective, the vX formalism kills two birds with one stone.

Firstly, it gives us more wiggle room when making backward-incompatible improvements. We could easily add a imageio.v4 with some experimental changes and release it without breaking any production code that uses imageio.v3. That is very useful because we can get user feedback early since there are people using the library in ways that we/I would have never imagined.

Secondly, it gives a clear migration path for when you do remove things after a deprecation cycle. People can immediately use ìmageio>=3.X and postpone migration until they have time (via import imageio.v2 as imageio). At the same time, they can continue to update ImageIO to get security updates, bug fixes, or new backend features. This beats the alternative approach of starting a new repo/library for a version-major change where you then have to import imageio3 (which looks suspiciously like import imageio.v3 as iio except that the latter is more explicit). The new repo approach always risks losing/splitting your community, because they stay with the old repo and don't migrate with you.

Now, from the user's side, the main benefit is that you are forced to explicitly version your dependency. If you are a scientist that writes a script to analyze some data, chances are you are not going to track all the versions of all the dependencies you are using. You should, but that doesn't mean it happens in practice - unfortunately. If another person (or future you) wants to use your script 5 years down the line and sees import imageio.v3 then that will correspond to installing imageio<4.0.

The other benefits for the user mirror the benefits on the dev side. You can try out the future version of the API early (import imageio.v4 as iio) and enjoy some of the new improvements even if some of your codebase still contains legacy code.

Can we afford to explain that it supports nD natively?

The direct answer is yes, one of the exciting things in v3 is that you can more easily deal with ndimages, which allows you to imread/imwrite all the things. You also get a few other improvements like improps which gives you standardized metadata.

I'm still not sure I get the underlying question though. You've also asked for a workaround in the previous post, so I'm guessing that you are looking for a short way to explain why we use import imageio.v3 as iio instead of import imageio as iio?

FirefoxMetzger commented 1 year ago

Right, I have just read through the entire issue and I think I get it now. Does the following help?

PS: @FirefoxMetzger I just thought of the following: Can we expect the deprecation warning to be gone... soon?

My goal was to release it this month, but then I got delayed due to an ongoing job hunt (I finally graduated from my PhD 🚀). I might be able to release it towards the end of the month, but I am reluctant to do so, given the proximity to the Christmas holidays. People rush out final features before the end of the year and I don't want to make that period more complicated by applying deprecations. I guess a good new estimate is mid-January.

introduces a new library that needs to be explained in the lesson

The way we have traditionally explained this in class is that ImageIO allows you to read images using iio.imread and to write images using iio.imwrite. That is of course high-level and you could add much more details, but it is a very good start that solves 80% of what people typically do with the library.

the need to specifically import v3 implies the existence of (at least) a v2 and v1, which may need to be explained - why is it necessary to import imageio.v3 as iio and not "just" imageio as iio as I suspect many new users would try to do?

The same reason why you do import matplotlib.pyplot as plt. You select the high-level API you want to use to interact with ImageIO. (And also document which version-major you use, so that other people can replicate your work in the future. Even if you didn't keep a requirements.txt 👼)

learners have to develop a mental model of when it is appropriate to use imageio vs skimage

Yes, I would have classified it as @tobyhodges suggested. ImageIO is for reading/writing (specifically for encoding and decoding image and video data from various sources). Skimage is for processing (specifically transforming image data into something useful).

mkcor commented 1 year ago

I don't see the point in deprecating skimage.io. It makes for a very simple interface that users are already familiar with, and has such a low maintenance burden. We can get rid of the plugin system, if that feels necessary.

@stefanv thanks for chiming in. I think that, whether we deprecate skimage.io or not, introducing imageio in this lesson has educational value. As @FirefoxMetzger puts it:

ImageIO is for reading/writing (specifically for encoding and decoding image and video data from various sources). Skimage is for processing (specifically transforming image data into something useful).

stefanv commented 1 year ago

I suppose the question I ask when teaching is: what is the added benefit to the learner. For beginners, anything more than io.imread is noise. For more sophisticated users, having additional capabilities become useful.

Along those lines, it would make sense to make it clear in the skimage docs that imageio is used under the hood, and is recommended for more sophisticated data handling (I think we have some of that language in place, but could expand it further).

mkcor commented 1 year ago

@FirefoxMetzger thanks for your thorough replies.

From the dev perspective, the vX formalism kills two birds with one stone.

I'm absolutely sold! In particular, I'm a big fan of the following:

Secondly, it gives a clear migration path for when you do remove things after a deprecation cycle.

[snip]

Now, from the user's side, the main benefit is that you are forced to explicitly version your dependency.

I'm sold as well. The only issues being raised here are from... the teacher's side (to oversimply), in a specific setting: I've never meant to question the development choices or pathways made at imageio!

Thanks for the write-up anyway: This way I could catch up on this migration episode I had missed; plus, your project is an inspiration... (thinking of https://discuss.scientific-python.org/t/a-pragmatic-pathway-towards-skimage2/530) :eyes:

The same reason why you do import matplotlib.pyplot as plt.

Good point!

mkcor commented 1 year ago

I suppose the question I ask when teaching is: what is the added benefit to the learner. For beginners, anything more than io.imread is noise.

Yes, that's what I meant with "we want to avoid any unnecessary cognitive load in a lesson." And here learners are beginners, I'm afraid (they are not absolute beginners in Python though: https://datacarpentry.org/image-processing/prereqs/index.html).

@FirefoxMetzger that's why I asked "Can we afford to explain that it supports nD natively?" because it adds yet another conceptual box for the learner to process.

For more sophisticated users, having additional capabilities become useful.

@tobyhodges do we consider 'more sophisticated users' among the learners?

mkcor commented 1 year ago

My goal was to release it this month, but then I got delayed due to an ongoing job hunt (I finally graduated from my PhD :rocket:).

@FirefoxMetzger congratulations!! :clap:

I might be able to release it towards the end of the month, but I am reluctant to do so, given the proximity to the Christmas holidays. People rush out final features before the end of the year and I don't want to make that period more complicated by applying deprecations. I guess a good new estimate is mid-January.

Good to know. Thank you so much!

FirefoxMetzger commented 1 year ago

For beginners, anything more than io.imread is noise.

So we are essentially debating

from skimage import io

img = io.imread(...)

versus

import imageio.v3 as iio

img = iio.imread(...)

This is my personal opinion only, but I think the difference is rather small; perhaps either option can work just fine?

On a tangent, we have been teaching ImageIO for the past 2 years now, and so far I didn't get any questions about why we import imageio.v3 and not import imageio. My guess is that most of our students just accepted that this is how you import the library without questioning it.

stefanv commented 1 year ago

@FirefoxMetzger Congratulations on your PhD, that's huge!! 🎉

W.r.t. the code above, I'd teach them something that will remain valid. I.e., don't develop muscle memory around snippets that will change. If imageio.v3 will hang around for years, that makes sense.

FirefoxMetzger commented 1 year ago

If imageio.v3 will hang around for years, that makes sense.

That's the plan. It will be around until ImageIO v5 comes out, and considering that we have historically done one version major every 5 years...

tobyhodges commented 1 year ago

Alright. Many thanks everyone for your thoughtful contributions - it is really helpful context for me as a maintainer of the lesson, and will also be useful for Instructors teaching the curriculum, giving them something to refer to if a learner does ask about why we import v3 specifically from imageio.

@mkcor If you are happy to update #238 please add a sentence explaining that imageio.v3 is specifying that we want to use version 3 of the imageio API, and another sentence describing the recommended division of tasks between imageio and skimage.

uschille commented 1 year ago

I'm with @tobyhodges - thank you all for this discussion. Imho the pros outweigh the cons for this change. While it introduces some cognitive load, I think it is also important to get learners accustomed to best practices. Since the change reflects the preference of the scikit-image project and community, it makes sense to adopt this in the lesson. It will also avoid a potential gap when learners "step out into the wild" and see something different - after all we would want to prepare learners to become comfortable applying the content in practice.

mkcor commented 1 year ago

@mkcor If you are happy to update #238 please add a sentence explaining that imageio.v3 is specifying that we want to use version 3 of the imageio API, and another sentence describing the recommended division of tasks between imageio and skimage.

For sure, I'll take care of it shortly. Thanks for your feedback, @tobyhodges and @uschille.

Should we update the code snippet featured in the setup instructions accordingly, as in 6d100ac804720c81fd49a35fc96131cc986378ee? If not, I'm happy to revert this commit.

Also, should I go ahead and update all subsequent episodes accordingly... within the same PR?

tobyhodges commented 1 year ago

The update to the setup instructions in 6d100ac is perfect, thanks @mkcor. You even caught an Americanization that I'd missed 🙌

Given that you have already started making these updates in #238 I think you may as well take care of the other episodes there too. I'll respond there with more detailed comments.

mkcor commented 1 year ago

Hi @tobyhodges,

please add a sentence explaining that imageio.v3 is specifying that we want to use version 3 of the imageio API,

Done 726676ff0a.

and another sentence describing the recommended division of tasks between imageio and skimage

Done 1f330e255e5a in the form of "Why not use skimage.io.imread()" -- I have drawn inspiration from episode 03 ("Why not use skimage.io.imshow()"). I have also added this recommendation as an objective e1ff0432fe676d: Please double-check that this is fine.

I have updated the section about metadata aa584e87b14, since imageio.v3 does handle metadata (and nicely so! :tada:), which is yet another reason for using imageio when it comes to reading/writing images.

I have kept the warning because I noticed that imageio.v3 does not preserve all metadata. For example:

image = iio.imread("data/colonies-01.tif")
iio.immeta("data/colonies-01.tif")

{'is_fluoview': False,
 'is_nih': False,
 'is_micromanager': False,
 'is_ome': False,
 'is_lsm': False,
 'is_reduced': False,
 'is_shaped': False,
 'is_stk': False,
 'is_tiled': False,
 'is_mdgel': False,
 'compression': 1,
 'predictor': 1,
 'is_mediacy': False,
 'description': 'ImageJ=1.48v',
 'description1': '',
 'is_imagej': True,
 'software': ''}

iio.imwrite(image=image, uri="data/test_metadata_colonies-01.tif")
iio.immeta(uri="data/test_metadata_colonies-01.tif")

{'is_fluoview': False,
 'is_nih': False,
 'is_micromanager': False,
 'is_ome': False,
 'is_lsm': False,
 'is_reduced': False,
 'is_shaped': True,
 'is_stk': False,
 'is_tiled': False,
 'is_mdgel': False,
 'compression': <COMPRESSION.NONE: 1>,
 'predictor': 1,
 'is_mediacy': False,
 'description': '{"shape": [303, 306, 3]}',
 'description1': '',
 'is_imagej': False,
 'software': 'tifffile.py',
 'resolution_unit': 1,
 'resolution': (1.0, 1.0, 'NONE')}

@FirefoxMetzger we can see that the (format-specific) 'description' field got overwritten with a standardized description...

FirefoxMetzger commented 1 year ago

@mkcor It actually isn't a standardized description, but rather a description used by a different TIFF flavor (called shaped). Your input image is an ImageJ TIFF and your output is a shaped TIFF (tifffile's default flavor) and thus the metadata differs. If you check the metadata then is_imagej == True for the input image and is_shaped == True for the output.

The problem with writing ImageJ TIFF is that - afaik - ImageJ never published a spec on how its particular flavor of TIFF works and how it should be written. Therefore it is actually really hard for other programs to create readers and writers because most of this is based on reverse-engineering source code and inspecting example images to see how it's done.

Fortunately, cgohlke has done a heroic job of achieving the above for python and tifffile, so we can read most ImageJ TIFF and write some ImageJ TIFF (in particular hyperstacks). Unfortunately for ImageIO, the ability to write ImageJ TIFF is newer than ImageIO's tifffile plugin, so while tifffile can, ImageIO (currently) can not. A PR is welcome though ⭐! (If you have a motivated student who would like to write a modern version of the tifffile plugin, let me know :) I'm happy to provide guidance and it would do a lot of people a lot of good.)

mkcor commented 1 year ago

@FirefoxMetzger thanks for correcting me and letting me know!

(If you have a motivated student who would like to write a modern version of the tifffile plugin, let me know :) I'm happy to provide guidance and it would do a lot of people a lot of good.)

I will spread the word! :pray:

Oops! I've kept all the as_gray=True the same way they used to be passed to skimage.io.imread()... but this gives a TypeError: unexpected keyword argument: as_gray.

I had read https://imageio.readthedocs.io/en/v2.6.1/scipy.html too quickly (that was imageio.v2!!), now I realize that this keyword argument is legacy: https://github.com/imageio/imageio/blob/02114a09d9c87b5d03a91b0a250da567ec48fbff/tests/test_pillow_legacy.py#L471

I haven't found the info in the migration guide; is it documented somewhere? Thank you!

FirefoxMetzger commented 1 year ago

@mkcor Yes, ImageIO v2.6 is old indeed (about 3 years now). Your intuition is correct and you can pass kwargs to iio.imread in the same way as it works with skimage.io.imread, and they will indeed be forwarded to the backend as you'd expect.

What's new is that we are doing a better job now at using the same naming convention as the underlying backend. as_gray was an alias for pillow's mode="L" and setting it directly does what you'd expect:

import imageio.v3 as iio

img = iio.imread("imageio:chelsea.png", mode="L")

test

In the same way, you can get other colorspaces supported by pillow such as "HSV" or "CMYK". A very useful one to know here is "RGB", which will normalize the pixel format in case you are reading different sources such as RGBA, gray, RGB, ..., but wish to have them all in channel-last 8-bit RGB format.

A caveat here is that manipulating color-channels is still backend/plugin specific. For example, if you were working with video (via pyav) you would have to use format="gray8" to convert to grayscale as in iio.imread("imageio:cockatoo.mp4", format="gray8", plugin="pyav").

This is sub-optimal, so I am hoping to introduce something like colorspace="gray" as a universal kwarg in the future. This is, however, a bigger project because I need to figure out (1) what set of colorspaces to support and how to name them, (2) how to support them across all backends and how to deal with those that don't provide native conversion capabilities.

I haven't found the info in the migration guide; is it documented somewhere? Thank you!

That's a good question, and I don't think it is documented. I will create an issue over at ImageIO and get back to it once I come back from my vacation 🏖️

mkcor commented 1 year ago

@FirefoxMetzger Thank you so much!

PS about the metadata thing: Do you think that what I wrote in aa584e87b14df33dcc193854493176a3d4f0bd24 is fine though?

uschille commented 1 year ago

Closing this issue which has been addressed in #238.