datacarpentry / image-processing

Image Processing with Python
https://datacarpentry.org/image-processing
Other
104 stars 125 forks source link

Using image as a variable name #258

Closed drcandacemakedamoore closed 2 years ago

drcandacemakedamoore commented 2 years ago

'image' is used as a variable name in several places in the code of the lesson.

The first times this happen is right after here: https://datacarpentry.org/image-processing/02-image-basics/index.html#why-not-use-skimageioimread

but it happens during several of the lessons, not just this one. This is a bad idea for several reasons including 1. in many image processing libraries, there is a module or submodule named image so it is never a good idea to use this as a variable name 2. it is too generic, and therefore confusing in and of itself

I will try to open a PR with some examples of this changed soon.

mkcor commented 2 years ago

Dear @drcandacemakedamoore,

Thank you for sharing your feedback. I'm interested in your perspective, because we also use 'trivial' variable names (image, images, image_sequence) extensively in the scikit-image gallery of examples.

I'm curious what your suggestions will be!

but it happens during several of the lessons, not just this one.

Can you please specify which other lessons?

Thanks again, Marianne

drcandacemakedamoore commented 2 years ago

@mkcor I made a PR where I changed this in episodes 2 and 3. It happens in several other episodes as well, however I or someone else will need to create a PR to resolve this. I hope I will send such a PR soon if no one else does.

uschille commented 2 years ago

Thank you @drcandacemakedamoore for bringing this issue up! It is a very valid point and the maintainer team would like to weigh the pros and cons for 'trivial' variable names. The following list is what came up so far and is an incomplete list:

Cons:

Pros:

We would like to invite other community members to contribute to the discussion.

Thanks again for starting this discussion!

mkcor commented 2 years ago

Thank you, @drcandacemakedamoore!

It happens in several other episodes as well

Ok, so you meant other episodes. :+1:

I can see the value of using more descriptive variable names, but generic variable names are convenient: Thanks for summing up the pros and cons, @uschille!

generic variable names are in line with examples on popular websites, e.g., scikit-image gallery

Oops, I've started some circular reasoning here: https://github.com/scikit-image/scikit-image/pull/6853/files#r1149112707

I tend to view this choice as follows:

bobturneruk commented 2 years ago

Thanks for all the input on this. It's clearly not an obvious call. If we were to go down this route of having variable names specific to the subject of each image, we should make sure that the file type from which the image was read is not part of the name i.e. eight or image_eight would be preferable to eight_tiff. This is because the representation in-memory is (as far as I know) no longer anything to do with the the file on disk.

The other thing is we're not entirely consistent in our current version with how we do this e.g. these two lines:

image = iio.imread(uri="data/eight.tif")
zero = iio.imread(uri="data/eight.tif")

There may be others. We should take this opportunity to try and be more consistent.

I think in practise, if one is writing a notebook that reads several specific images, then specific names are good practise, often the variable name is the only way to carry critical metadata through the analysis, but if one is writing code for a Python package containing image analysis functions / code that iterates over multiple images then image is often going to be an appropriately specific name. I don't feel this distinction is within the scope of this course, though. Overall, my opinion is that it's very slightly better, here to go with the scikit-image convention, and perhaps explain our reasoning in the instructor notes?

uschille commented 2 years ago

A generic name makes sense in an educational setting, where concepts and methods are taught in order to be transferred and applied elsewhere (the application is a 'pretext' for teaching/learning);

I tend to agree with @mkcor. My feeling is that learners would understand intuitively that the variable image can contain different images. Working through the exercises will be easier if they can re-use code snippets between different examples without having to think about variable (re-)naming. That may be less cognitive load than having to translate the variable names for each example.

I also agree with @bobturneruk that consistency matters, so if we change the naming convention in the first few episodes, we should check the remaining episodes for consistency.

tobyhodges commented 2 years ago

Working through the exercises will be easier if they can re-use code snippets between different examples without having to think about variable (re-)naming. That may be less cognitive load than having to translate the variable names for each example.

I mostly agree. However, given that we have designed the curriculum with the intention that it is taught in Jupyter, I think one risk is that (due to the not-necessarily-linear execution order of code cells in a Notebook) learners might re-run an earlier code cell and unintentionally operate on another variable called image that has been created further down the document.

If we stick with image as the generic name for a variable containing image data loaded from a file, I think we should do two things:

  1. add an Instructor note, recommending that Instructors guide learners to create a new Jupyter Notebook for each episode.
  2. enforce an "only one variable named image per episode" policy: if any episodes contain multiple "different" image variables (i.e. where multiple files are loaded at different points in the episode), those variables should be named differently.