Deci-AI / data-gradients

Computer Vision dataset analysis
Apache License 2.0

Hotfix/sg 000 fix question display n channel #200

Closed Louis-Dupont closed 1 year ago

Louis-Dupont commented 1 year ago

Introducing a new way of handling image channels

With this change, we can now describe more complex channel representations like (R, G, B, Depth, ...)

The challenge is that there are infinitely many combinations, so we cannot have one class/enum for each. We also don't want the user to have to import unfamiliar objects and build on top of them.

This leads to some questions

Solution Proposed

  1. String representation

E.g. RGBO for (Red, Green, Blue, Depth), where O marks a channel outside the standard ones.

These strings are later used to instantiate a class that handles all conversion logic.

  2. ImageChannels

These objects take the strings above and handle all the logic of converting an image with a given channel representation to RGB or LAB. This abstracts that logic away from the feature extractors.
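To make the string-to-object step concrete, here is a minimal sketch of how a channel string such as "ORGBO" could be parsed and used for an RGB view. It assumes channels-last (H, W, C) images and that RGB, BGR and LAB always appear as contiguous 3-letter groups, with G and O as single channels. `parse_channel_string` and `ChannelStringSketch` are hypothetical names, not the actual ImageChannels API, and LAB conversion is deliberately left out.

```python
from typing import List, Tuple

import numpy as np

# Hypothetical token set, mirroring the formats listed in the question prompt
GROUP_TOKENS = ("RGB", "BGR", "LAB")  # 3-channel groups
SINGLE_TOKENS = ("G", "O")            # 1-channel tokens (O = any other channel)


def parse_channel_string(channels: str) -> List[Tuple[str, int]]:
    """Split e.g. "ORGBO" into [("O", 0), ("RGB", 1), ("O", 4)] as (token, first channel index)."""
    tokens: List[Tuple[str, int]] = []
    i = 0
    while i < len(channels):
        group = channels[i:i + 3]
        if group in GROUP_TOKENS:  # check 3-letter groups before single letters
            tokens.append((group, i))
            i += 3
        elif channels[i] in SINGLE_TOKENS:
            tokens.append((channels[i], i))
            i += 1
        else:
            raise ValueError(f"Cannot parse {channels!r} at position {i}")
    return tokens


class ChannelStringSketch:
    """Hypothetical stand-in for ImageChannels; images are channels-last (H, W, C)."""

    def __init__(self, channels: str) -> None:
        self.channels = channels
        self.tokens = parse_channel_string(channels)

    def convert_image_to_rgb(self, image: np.ndarray) -> np.ndarray:
        if image.shape[-1] != len(self.channels):
            raise ValueError(f"{self.channels!r} does not describe {image.shape[-1]} channels")
        # Use the first group that has an obvious RGB view
        for token, start in self.tokens:
            if token == "RGB":
                return image[..., start:start + 3]
            if token == "BGR":
                return image[..., start:start + 3][..., ::-1]
            if token == "G":
                return np.repeat(image[..., start:start + 1], 3, axis=-1)
        # LAB -> RGB is left out of this sketch, and "O" channels have no RGB meaning
        raise NotImplementedError(f"No RGB view implemented for {self.channels!r}")


image = np.zeros((4, 4, 5), dtype=np.uint8)
image[..., 1] = 255  # the R channel in "ORGBO"
rgb = ChannelStringSketch("ORGBO").convert_image_to_rgb(image)
print(rgb.shape)  # (4, 4, 3)
```

Picking the first convertible group is one possible policy; an image described as "RGBG" would get its RGB view from the leading RGB group.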

Here is how it is used in samples:

import dataclasses

import numpy as np

# ImageChannels is the channel-handling class introduced in this PR (its import is not shown here)


@dataclasses.dataclass
class ImageSample:
    ...
    image_channels: ImageChannels

    @property
    def image_as_rgb(self) -> np.ndarray:
        return self.image_channels.convert_image_to_rgb(image=self.image)

    @property
    def image_channels_to_visualize(self) -> np.ndarray:
        return self.image_channels.get_channels_to_visualize(image=self.image)

    @property
    def image_as_lab(self) -> np.ndarray:
        return self.image_channels.convert_image_to_lab(image=self.image)

Example of the question prompt

--------------------------------------------------------------------------------
Please describe your image channels?
--------------------------------------------------------------------------------
Image Shape: (640, 427, 3)

Enter the channel format representing your image:

  > RGB  : Red, Green, Blue
  > BGR  : Blue, Green, Red
  > G    : Grayscale
  > LAB  : Luminance, A and B color channels

ADDITIONAL CHANNELS?
If your image contains channels other than the standard ones listed above (e.g., Depth, Heat), prefix them with 'O'. 
For instance:
  > ORGBO: Can represent (Heat, Red, Green, Blue, Depth).
  > OBGR:  Can represent (Alpha, Blue, Green, Red).
  > GO:    Can represent (Gray, Depth).

IMPORTANT: Make sure that your answer represents all the image channels.
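The IMPORTANT note can also be enforced when the answer is read. A minimal sketch, assuming channels-last images; `validate_channel_answer` is a hypothetical helper that only checks the letters used and the channel count, not whether the letters form valid groups.

```python
import numpy as np

# Letters appearing in the listed formats (RGB, BGR, G, LAB) plus O for other channels
ALLOWED_LETTERS = set("RGBLAO")


def validate_channel_answer(answer: str, image: np.ndarray) -> str:
    """Hypothetical check that an answer accounts for every channel of a channels-last image."""
    answer = answer.strip().upper()
    if not answer:
        raise ValueError("Empty answer")
    unknown = set(answer) - ALLOWED_LETTERS
    if unknown:
        raise ValueError(f"Unexpected letters in {answer!r}: {sorted(unknown)}")
    num_channels = image.shape[-1] if image.ndim == 3 else 1  # (H, W) counts as 1 channel
    if len(answer) != num_channels:
        raise ValueError(
            f"{answer!r} describes {len(answer)} channels, but the image has {num_channels}"
        )
    return answer


print(validate_channel_answer(" rgb ", np.zeros((640, 427, 3))))  # RGB
```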

Something I explored but gave up on

At first, I started with a more flexible solution: you could enter any channel types in any order, OOROBOG for instance, and DG would work out which channels are Red, Green and Blue and reorder them. The issue is that it adds a lot of complexity in
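For comparison, the core reordering step of that free-order idea can be sketched in a few lines. This is a hypothetical helper assuming channels-last images and exactly one R, G and B in the string; it only picks out the RGB channels and does nothing else.

```python
import numpy as np


def extract_rgb_free_order(image: np.ndarray, channels: str) -> np.ndarray:
    """Hypothetical: pick R, G and B out of a free-order string such as "OOROBOG"."""
    if image.shape[-1] != len(channels):
        raise ValueError(f"{channels!r} does not describe {image.shape[-1]} channels")
    if any(channels.count(c) != 1 for c in "RGB"):
        raise ValueError(f"{channels!r} must contain exactly one R, one G and one B")
    indices = [channels.index(c) for c in "RGB"]  # e.g. [2, 6, 4] for "OOROBOG"
    return image[..., indices]  # reorder channels into R, G, B


image = np.arange(7).reshape(1, 1, 7)  # channel k holds the value k
print(extract_rgb_free_order(image, "OOROBOG")[0, 0])  # [2 6 4]
```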

Final Notes

I know the design is not perfect; I tried to make it as simple as possible while still being general enough. I also think I should refine the way the questions are asked. Let me know any thoughts that come to mind.

BloodAxe commented 1 year ago

I think a good place to start is by asking: "Why do we care about the channel layout of images in the first place?"

I may be missing a few use cases, but here is my list of places where we need to know which channel is red, green or blue:

And here are some examples of what an input image can be:

Let's assume we have an abstract concept, ImageChannelMapping (the name is arbitrary), that encapsulates the knowledge of what each channel represents. Here are the technical requirements for this concept from my perspective:

from abc import ABC
from typing import List, Tuple
import numpy as np

class ImageChannelMapping(ABC):
    def get_channel_names_and_colors(self) -> List[Tuple[str, Tuple[int, int, int]]]: pass
    def get_mean_intensity_image(self, image: np.ndarray) -> np.ndarray: pass
    def get_rgb_representation(self, image: np.ndarray) -> np.ndarray: pass
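As an illustration of what concrete implementations of this interface might look like (for instance RGB and grayscale templates), here are two duck-typed sketches. The class names are hypothetical, they do not subclass ImageChannelMapping so the block stays self-contained, images are assumed channels-last, and the unweighted channel mean is just one possible choice of intensity.

```python
from typing import List, Tuple

import numpy as np


class RGBMappingSketch:
    """Hypothetical RGB template; expects channels-last (H, W, 3) images."""

    def get_channel_names_and_colors(self) -> List[Tuple[str, Tuple[int, int, int]]]:
        return [("Red", (255, 0, 0)), ("Green", (0, 255, 0)), ("Blue", (0, 0, 255))]

    def get_mean_intensity_image(self, image: np.ndarray) -> np.ndarray:
        return image.mean(axis=-1)  # unweighted channel mean, shape (H, W)

    def get_rgb_representation(self, image: np.ndarray) -> np.ndarray:
        return image  # already RGB


class GrayscaleMappingSketch:
    """Hypothetical grayscale template; expects (H, W) or (H, W, 1) images."""

    def get_channel_names_and_colors(self) -> List[Tuple[str, Tuple[int, int, int]]]:
        return [("Gray", (128, 128, 128))]  # plot color chosen arbitrarily

    def get_mean_intensity_image(self, image: np.ndarray) -> np.ndarray:
        return image[..., 0] if image.ndim == 3 else image

    def get_rgb_representation(self, image: np.ndarray) -> np.ndarray:
        # Replicate the single channel so RGB-only code paths can consume it
        return np.stack([self.get_mean_intensity_image(image)] * 3, axis=-1)
```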

This is the internal concept that DG can use to operate on images.

Now, how would one obtain this information?

Option 1 (most explicit): the user passes this explicitly via code. We can simplify their life by providing pre-made templates: ImageChannelMapping.RGB, ImageChannelMapping.Grayscale, etc.

But a custom channel scheme can also be described when required:

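import cv2
import numpy as np

# Sketch: ImageChannelMapping as proposed above, assuming a keyword-argument constructor
mapping = ImageChannelMapping(
    channel_names=["VV", "VH", "HH"],                           # This is for plots
    channel_colors=[(255, 0, 0), (10, 255, 30), (0, 40, 255)],  # This is for plots
    # Channels-last (H, W, C) images, consistent with the mean over axis=-1
    get_rgb_representation=lambda image: cv2.normalize(image[..., 0] + 0.5 * image[..., 1], None, 0, 255, cv2.NORM_MINMAX),
    get_mean_intensity_image=lambda image: np.mean(image, axis=-1),
)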
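One way such a keyword-argument constructor could be backed, assuming the per-image logic is passed as plain callables, is a dataclass whose callable fields keep the interface's method names. `CallableChannelMappingSketch` is hypothetical, and the usage below swaps the cv2-based lambda for a NumPy-only placeholder so the example does not need OpenCV.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np


@dataclass
class CallableChannelMappingSketch:
    """Hypothetical: a mapping built from names, plot colors and two conversion callables."""

    channel_names: List[str]
    channel_colors: List[Tuple[int, int, int]]
    # Stored as instance attributes, so mapping.get_rgb_representation(image) calls the callable directly
    get_rgb_representation: Callable[[np.ndarray], np.ndarray]
    get_mean_intensity_image: Callable[[np.ndarray], np.ndarray]

    def __post_init__(self) -> None:
        if len(self.channel_names) != len(self.channel_colors):
            raise ValueError("Expected one plot color per channel name")

    def get_channel_names_and_colors(self) -> List[Tuple[str, Tuple[int, int, int]]]:
        return list(zip(self.channel_names, self.channel_colors))


mapping = CallableChannelMappingSketch(
    channel_names=["VV", "VH", "HH"],
    channel_colors=[(255, 0, 0), (10, 255, 30), (0, 40, 255)],
    get_rgb_representation=lambda image: np.repeat(image[..., :1], 3, axis=-1),  # placeholder: first channel as gray
    get_mean_intensity_image=lambda image: np.mean(image, axis=-1),
)
```

Keeping the callables as fields means the object still exposes the same three entry points as the abstract interface, so downstream code would not need to know whether a mapping came from a template or from user code.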

Option 2 (user input):
  1. If the image has 3 channels, ask the user whether it's RGB, BGR, LAB, YCrCb or something else.
  2. If the image has 1 channel, ask whether it's grayscale or something else.
  3. If the image has any other number of channels, go to "something else".
  4. Something else: ask the user to give meaningful names to the channels (or, if they don't want to, name them Channel_0, Channel_1, etc.). In this case, we warn that get_rgb_representation and get_mean_intensity_image will fall back to default implementations (e.g. taking the first 3 channels), and that's it.
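A minimal sketch of this decision flow and its "something else" defaults, assuming channels-last images. All function names are hypothetical; the interactive prompting and the warning are left out, the mean-intensity default is an assumption (the proposal only gives "taking the first 3 channels" as an example), and so is the handling of images with fewer than 3 channels.

```python
from typing import List, Optional

import numpy as np


def candidate_formats(num_channels: int) -> List[str]:
    """Choices offered to the user, following the Option 2 flow."""
    if num_channels == 3:
        return ["RGB", "BGR", "LAB", "YCrCb", "something else"]
    if num_channels == 1:
        return ["Grayscale", "something else"]
    return ["something else"]  # any other count goes straight to the fallback


def fallback_channel_names(num_channels: int, names: Optional[List[str]] = None) -> List[str]:
    """Use the user's names if given, else Channel_0, Channel_1, ..."""
    if names is not None:
        if len(names) != num_channels:
            raise ValueError(f"Expected {num_channels} names, got {len(names)}")
        return names
    return [f"Channel_{i}" for i in range(num_channels)]


def fallback_rgb_representation(image: np.ndarray) -> np.ndarray:
    """Default RGB view of a channels-last image: its first 3 channels."""
    if image.shape[-1] >= 3:
        return image[..., :3]
    # Assumption (not covered in the proposal): with fewer channels, replicate the first one
    return np.repeat(image[..., :1], 3, axis=-1)


def fallback_mean_intensity_image(image: np.ndarray) -> np.ndarray:
    """Assumed default: unweighted mean over all channels."""
    return image.mean(axis=-1)
```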

Louis-Dupont commented 1 year ago

Now we can see the custom channels (not RGB, BGR or G). The next step will be to add support for custom names.
