Closed Louis-Dupont closed 1 year ago
I think a good place to start is to ask: "Why do we care about channel layout in images in the first place?"
I may be missing a few use cases, but this is my list of places where we need to know which channel is red, green or blue:
And here are some examples of what an input image can be:
Let's assume we have an abstract concept, ImageChannelMapping (the name is arbitrary), that encapsulates this knowledge of what each channel represents. Here are the technical requirements for this concept from my perspective:
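from abc import ABC
from typing import List, Tuple
import numpy as np

class ImageChannelMapping(ABC):
    def get_channel_names_and_colors(self) -> List[Tuple[str, Tuple[int, int, int]]]: pass  # (name, plot color) per channel
    def get_mean_intensity_image(self, image: np.ndarray) -> np.ndarray: pass  # single-channel intensity view
    def get_rgb_representation(self, image: np.ndarray) -> np.ndarray: pass  # displayable RGB view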
This is the internal concept that DG can use to operate on images.
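To make the interface above a bit more concrete, here is a minimal sketch of what one implementation could look like. The `BGRChannelMapping` subclass is hypothetical, and so are the assumptions that images are channel-last `(H, W, C)` arrays and that plot colors are RGB tuples; none of this is settled by the proposal.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

import numpy as np


class ImageChannelMapping(ABC):
    """Interface from the proposal; abstractmethod is added here for illustration."""

    @abstractmethod
    def get_channel_names_and_colors(self) -> List[Tuple[str, Tuple[int, int, int]]]: ...

    @abstractmethod
    def get_mean_intensity_image(self, image: np.ndarray) -> np.ndarray: ...

    @abstractmethod
    def get_rgb_representation(self, image: np.ndarray) -> np.ndarray: ...


class BGRChannelMapping(ImageChannelMapping):
    """Hypothetical mapping for channel-last (H, W, 3) images stored as BGR."""

    def get_channel_names_and_colors(self):
        # Colors are RGB tuples used only for plotting (an assumption).
        return [("Blue", (0, 0, 255)), ("Green", (0, 255, 0)), ("Red", (255, 0, 0))]

    def get_mean_intensity_image(self, image):
        # Average over the channel axis, giving an (H, W) array.
        return image.mean(axis=-1)

    def get_rgb_representation(self, image):
        # Reversing the channel axis turns BGR into RGB.
        return image[..., ::-1]


image = np.zeros((2, 2, 3), dtype=np.uint8)
image[..., 0] = 255  # pure blue in BGR order
mapping = BGRChannelMapping()
rgb = mapping.get_rgb_representation(image)
mean_intensity = mapping.get_mean_intensity_image(image)
```

With something like this, feature extractors would only ever call the three methods and never need to know the underlying channel order.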
Option 1 (most explicit): the user passes this explicitly via code.
We can of course simplify their life by providing pre-made templates: ImageChannelMapping.RGB, ImageChannelMapping.Grayscale, etc.
But a custom channel scheme can also be defined if required:
mapping = ImageChannelMapping(
    channel_names=["VV", "VH", "HH"],  # This is for plots
    channel_colors=[(255, 0, 0), (10, 255, 30), (0, 40, 255)],  # This is for plots
    # Channel-last indexing, consistent with axis=-1 below; cv2.normalize needs an explicit dst and range
    get_rgb_representation=lambda image: cv2.normalize(image[..., 0] + 0.5 * image[..., 1], None, 0, 255, cv2.NORM_MINMAX),
    get_mean_intensity_image=lambda image: np.mean(image, axis=-1),
)
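For reference, one way Option 1 could be made runnable is a small dataclass that holds the names, colors and two callables. This is only a sketch: `ChannelMapping`, `RGB_TEMPLATE` and `sar_to_rgb` are hypothetical names, and the min-max scaling of the first three channels is an arbitrary choice standing in for whatever RGB view the user actually wants.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

Color = Tuple[int, int, int]


@dataclass(frozen=True)
class ChannelMapping:
    """Hypothetical concrete form of ImageChannelMapping built from plain values."""

    channel_names: List[str]
    channel_colors: List[Color]
    get_rgb_representation: Callable[[np.ndarray], np.ndarray]
    get_mean_intensity_image: Callable[[np.ndarray], np.ndarray]

    def __post_init__(self):
        # Every channel needs exactly one plot color.
        if len(self.channel_names) != len(self.channel_colors):
            raise ValueError("channel_names and channel_colors must have the same length")


def sar_to_rgb(image: np.ndarray) -> np.ndarray:
    """Min-max scale the first three channels to uint8 (an arbitrary choice)."""
    x = image[..., :3].astype(np.float64)
    span = x.max() - x.min()
    scaled = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return (scaled * 255).astype(np.uint8)


# A pre-made template, in the spirit of ImageChannelMapping.RGB.
RGB_TEMPLATE = ChannelMapping(
    channel_names=["Red", "Green", "Blue"],
    channel_colors=[(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    get_rgb_representation=lambda image: image,
    get_mean_intensity_image=lambda image: image.mean(axis=-1),
)

# A custom scheme, mirroring the VV/VH/HH example.
sar_mapping = ChannelMapping(
    channel_names=["VV", "VH", "HH"],
    channel_colors=[(255, 0, 0), (10, 255, 30), (0, 40, 255)],
    get_rgb_representation=sar_to_rgb,
    get_mean_intensity_image=lambda image: image.mean(axis=-1),
)

sar_image = np.arange(12, dtype=np.float64).reshape(2, 2, 3)
sar_rgb = sar_mapping.get_rgb_representation(sar_image)
```

The trade-off is the one already visible in Option 1: it is fully explicit, but the user has to import an object and write conversion code themselves.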
Option 2 (user input)
If the image has 3 channels, prompt the user whether it is RGB, BGR, LAB, YCrCb or something else.
If the image has 1 channel, prompt whether it is grayscale or something else.
If the image has any other number of channels, go straight to something else.
Something else: ask the user to give meaningful names to the channels (or, if they don't want to, name them Channel_0, Channel_1, etc.).
In the something else case, we warn that get_rgb_representation and get_mean_intensity_image will fall back to default implementations (e.g. taking the first 3 channels), and that's it.
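The decision flow of Option 2 could be sketched roughly as below. All function names are hypothetical, and the fallback for images with fewer than 3 channels (repeating the first channel) is my own assumption; the text above only mentions taking the first 3 channels.

```python
from typing import List

import numpy as np

# Layouts offered per channel count, as listed in Option 2.
KNOWN_LAYOUTS = {3: ["RGB", "BGR", "LAB", "YCrCb"], 1: ["Grayscale"]}
SOMETHING_ELSE = "something else"


def layout_choices(num_channels: int) -> List[str]:
    """Choices to show in the prompt; 'something else' is always offered last."""
    return KNOWN_LAYOUTS.get(num_channels, []) + [SOMETHING_ELSE]


def default_channel_names(num_channels: int) -> List[str]:
    """Names used when the user does not want to name the channels."""
    return [f"Channel_{i}" for i in range(num_channels)]


def default_rgb_representation(image: np.ndarray) -> np.ndarray:
    """Fallback RGB view for the 'something else' case (channel-last input)."""
    if image.shape[-1] >= 3:
        return image[..., :3]  # take the first 3 channels
    # Assumption: with fewer than 3 channels, repeat the first one.
    return np.repeat(image[..., :1], 3, axis=-1)


def default_mean_intensity_image(image: np.ndarray) -> np.ndarray:
    """Fallback intensity view: plain mean over the channel axis."""
    return image.mean(axis=-1)
```

For a 5-channel image, `layout_choices(5)` only offers "something else", so the user would go directly to naming channels and get the default implementations.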
Now we can see the custom channels (not RGB, BGR, G). Next step will be to add support for custom names.
Introducing a new way of handling image channels
With this change, we can now describe more complex channel representations like (R, G, B, Depth, ...)
The challenge is that there is an infinite number of combinations, so we cannot have one class/enum for each. We also don't want the user to import weird objects and build on top of them.
This leads to some questions
Solution Proposed
RGB, BGR, G, LAB
O should be used as a placeholder for other channels (Depth, ...). E.g. RGBO for (Red, Green, Blue, Depth).
These strings are later used to instantiate a class that handles all conversion logic.
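To illustrate how such strings might be validated, here is a small sketch. The rule it enforces (exactly one base format appearing as a contiguous block, with every other character being O) is my reading of the proposal rather than something it states, and `parse_channel_format` is a hypothetical helper.

```python
from typing import Tuple

# Base formats named in the proposal; "O" marks any other channel.
BASE_FORMATS = ("RGB", "BGR", "LAB", "G")


def parse_channel_format(fmt: str) -> Tuple[str, int]:
    """Return (base_format, start_index) for strings such as "RGBO" or "OBGR".

    Assumed rule: one base format as a contiguous block, everything else "O".
    """
    for base in BASE_FORMATS:
        start = fmt.find(base)
        if start == -1:
            continue
        # Characters outside the base block must all be the "O" placeholder.
        rest = fmt[:start] + fmt[start + len(base):]
        if set(rest) <= {"O"}:
            return base, start
    raise ValueError(f"Unsupported channel format: {fmt!r}")
```

Under this rule `parse_channel_format("RGBO")` gives `("RGB", 0)`, while a free-form string with interleaved channels is rejected.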
ImageChannels
These objects take the strings from above and handle all of the logic of converting an image of a given channel representation to RGB/LAB. This abstracts the logic away from the feature extractors.
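A rough sketch of the idea, not the actual class from this PR: the constructor argument, the `to_rgb` method name and the channel-last assumption are all mine, and LAB conversion is deliberately left out because it needs real color-space math.

```python
import numpy as np


class ImageChannels:
    """Hypothetical sketch of a channel-string-driven converter."""

    def __init__(self, channels_str: str):
        self.channels_str = channels_str

    def to_rgb(self, image: np.ndarray) -> np.ndarray:
        """Return an (H, W, 3) RGB view of a channel-last image, dropping "O" channels."""
        core = self.channels_str.replace("O", "")
        if core in ("RGB", "BGR"):
            # Look up where each of R, G, B sits and reorder accordingly.
            order = [self.channels_str.index(c) for c in "RGB"]
            return image[..., order]
        if core == "G":
            # Grayscale: replicate the single channel three times.
            gray = image[..., self.channels_str.index("G")]
            return np.stack([gray] * 3, axis=-1)
        raise NotImplementedError(f"No RGB conversion sketched for {self.channels_str!r}")


# (Blue, Green, Red, Depth) image with constant channel values.
bgrd = np.zeros((2, 2, 4), dtype=np.uint8)
bgrd[..., 0], bgrd[..., 1], bgrd[..., 2], bgrd[..., 3] = 10, 20, 30, 99
rgb_view = ImageChannels("BGRO").to_rgb(bgrd)
```

A feature extractor would then call `to_rgb` without caring whether the dataset was stored as RGBO, BGRO or grayscale.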
See how it is used with samples
Example of questions
Something I explored but gave up on
At first, I started with a more custom solution: you can enter any channel type in any order, OOROBOG for instance, and then DG would understand which channels are Red, Blue, Green and reorganise. The issue is that it adds lots of complexity in
Final Notes
I know the design is not perfect; I tried to make it as simple as possible while being general enough. I also think I should refine the way questions are asked. Let me know any thoughts that come to mind.