Closed drcandacemakedamoore closed 2 years ago
Dear @drcandacemakedamoore,
Thank you for sharing your feedback. I'm interested in your perspective, because we also use 'trivial' variable names (image
, images
, image_sequence
) extensively in the scikit-image gallery of examples.
I'm curious what your suggestions will be!
but it happens during several of the lessons, not just this one.
Can you please specify which other lessons?
Thanks again, Marianne
@mkcor I made a PR where I changed this in episodes 2 and 3. It happens in several other episodes as well, however I or someone else will need to create a PR to resolve this. I hope I will send such a PR soon if no one else does.
Thank you @drcandacemakedamoore for bringing this issue up! It is a very valid point and the maintainer team would like to weigh the pros and cons for 'trivial' variable names. The following list is what came up so far and is an incomplete list:
Cons:
image
in image processing librariesPros:
We would like to invite other community members to contribute to the discussion.
Thanks again for starting this discussion!
Thank you, @drcandacemakedamoore!
It happens in several other episodes as well
Ok, so you meant other episodes. :+1:
I can see the value of using more descriptive variable names, but generic variable names are convenient: Thanks for summing up the pros and cons, @uschille!
generic variable names are in line with examples on popular websites, e.g., scikit-image gallery
Oops, I've started some circular reasoning here: https://github.com/scikit-image/scikit-image/pull/6853/files#r1149112707
I tend to view this choice as follows:
image
, img
, or image_sequence
are typically used when loading the 'raw' image, and subsequent processing steps are reflected in the variables names, which helps follow the workflow (e.g., https://scikit-image.org/docs/stable/auto_examples/applications/plot_solidification_tracking.html);Thanks for all the input on this. It's clearly not an obvious call. If we were to go down this route of having variable names specific to the subject of each image, we should make sure that the file type from which the image was read is not part of the name i.e. eight
or image_eight
would be preferable to eight_tiff
. This is because the representation in-memory is (as far as I know) no longer anything to do with the the file on disk.
The other thing is we're not entirely consistent in our current version with how we do this e.g. these two lines:
image = iio.imread(uri="data/eight.tif")
zero = iio.imread(uri="data/eight.tif")
There may be others. We should take this opportunity to try and be more consistent.
I think in practise, if one is writing a notebook that reads several specific images, then specific names are good practise, often the variable name is the only way to carry critical metadata through the analysis, but if one is writing code for a Python package containing image analysis functions / code that iterates over multiple images then image
is often going to be an appropriately specific name. I don't feel this distinction is within the scope of this course, though. Overall, my opinion is that it's very slightly better, here to go with the scikit-image
convention, and perhaps explain our reasoning in the instructor notes?
A generic name makes sense in an educational setting, where concepts and methods are taught in order to be transferred and applied elsewhere (the application is a 'pretext' for teaching/learning);
I tend to agree with @mkcor. My feeling is that learners would understand intuitively that the variable image
can contain different images. Working through the exercises will be easier if they can re-use code snippets between different examples without having to think about variable (re-)naming. That may be less cognitive load than having to translate the variable names for each example.
I also agree with @bobturneruk that consistency matters, so if we change the naming convention in the first few episodes, we should check the remaining episodes for consistency.
Working through the exercises will be easier if they can re-use code snippets between different examples without having to think about variable (re-)naming. That may be less cognitive load than having to translate the variable names for each example.
I mostly agree. However, given that we have designed the curriculum with the intention that it is taught in Jupyter, I think one risk is that (due to the not-necessarily-linear execution order of code cells in a Notebook) learners might re-run an earlier code cell and unintentionally operate on another variable called image
that has been created further down the document.
If we stick with image
as the generic name for a variable containing image data loaded from a file, I think we should do two things:
image
per episode" policy: if any episodes contain multiple "different" image
variables (i.e. where multiple files are loaded at different points in the episode), those variables should be named differently.
'image' is used as a variable name in several places in the code of the lesson.
The first times this happen is right after here: https://datacarpentry.org/image-processing/02-image-basics/index.html#why-not-use-skimageioimread
but it happens during several of the lessons, not just this one. This is a bad idea for several reasons including 1. in many image processing libraries, there is a module or submodule named image so it is never a good idea to use this as a variable name 2. it is too generic, and therefore confusing in and of itself
I will try to open a PR with some examples of this changed soon.