Closed tugot17 closed 2 years ago
Excellent request!
I agree that we should work to support image formats other than RGB. This feature is not yet implemented, but the code is flexible enough that not many changes should be needed to add it.
would it be possible to alternatively return an image itself
It's definitely possible to do this, but it might not be what we want: the records returned by the parser are kept in memory, which would mean all images are stored in RAM (and that might only work for small datasets).
Instead, what we need to do is open the image only when creating the batch, so that after the batch is used by the model the memory is freed and reused by the next batch.
since sometimes infrared images are in separate files so some form of manual merging of the images should be conducted
So maybe what we should do is implement a method that returns all of those paths, and do the merging at batch creation time.
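
As a rough illustration of that idea (not IceVision's actual API), the sketch below keeps only file paths in each record and merges the bands when an item is requested. `MultiBandRecord`, `LazyMultiBandDataset`, `load_merged`, and the `read_band` callable are hypothetical names introduced here; it assumes NumPy is available and that all band files for one image share the same height and width.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

import numpy as np


@dataclass(frozen=True)
class MultiBandRecord:
    """Hypothetical record: holds only paths, never pixel data."""
    band_paths: Tuple[str, ...]  # e.g. ("scene_rgb.png", "scene_ir.png")


def load_merged(record: MultiBandRecord, read_band: Callable) -> np.ndarray:
    """Read every band file and stack the results along the channel axis."""
    bands = []
    for path in record.band_paths:
        arr = np.asarray(read_band(path))
        if arr.ndim == 2:
            # Single-band file (H, W): add a trailing channel axis.
            arr = arr[:, :, None]
        bands.append(arr)
    sizes = {b.shape[:2] for b in bands}
    if len(sizes) != 1:
        raise ValueError(f"band sizes differ: {sorted(sizes)}")
    return np.concatenate(bands, axis=2)  # (H, W, total_channels)


class LazyMultiBandDataset:
    """Stores records (paths) up front; pixels are read only in __getitem__."""

    def __init__(self, records: Sequence[MultiBandRecord], read_band: Callable):
        self.records = list(records)
        self.read_band = read_band  # assumed: path -> array-like

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> np.ndarray:
        return load_merged(self.records[idx], self.read_band)
```

Here `read_band` could wrap whatever image reader is in use, as long as it returns an array-like object per file. Because only the paths live in the records, memory use is bounded by the items currently being batched rather than by the size of the dataset.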
Do you have experience training with infrared images in other frameworks?
Do you know if torchvision's faster_rcnn supports that out of the box? I'll have to take a deeper look at that.
I guess that it is not much different than training on RGB images.
One solution would be to add an additional channel in the input layer, similar to how it is done in this efficientnet implementation: replace the first layer with a Conv of a different size, and treat infrared as just an additional channel:
    if in_channels != 3:
        Conv2d = get_same_padding_conv2d(image_size=self._global_params.image_size)
        out_channels = round_filters(32, self._global_params)
        self._conv_stem = Conv2d(in_channels, out_channels, kernel_size=3, stride=2, bias=False)
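
The snippet above swaps in a freshly initialised first layer. A different, commonly used heuristic is to keep the pretrained RGB filters and initialise only the extra input channels, for instance with the mean of the RGB filters. The sketch below shows that weight expansion on a plain NumPy array with the usual `(out_channels, in_channels, kH, kW)` layout. `expand_stem_weights` is a hypothetical helper, not part of efficientnet, torchvision, or IceVision, and whether mean initialisation suits infrared data is an assumption that would need to be checked experimentally.

```python
import numpy as np


def expand_stem_weights(rgb_weights: np.ndarray, in_channels: int) -> np.ndarray:
    """Expand 3-channel conv weights to `in_channels` input channels.

    rgb_weights: array of shape (out_channels, 3, kH, kW).
    The first three input channels keep the original filters; every extra
    channel is initialised with the per-filter mean over the RGB channels.
    """
    out_channels, rgb_channels, kh, kw = rgb_weights.shape
    if rgb_channels != 3:
        raise ValueError("expected weights with 3 input channels")
    if in_channels < 3:
        raise ValueError("in_channels must be at least 3")

    expanded = np.empty((out_channels, in_channels, kh, kw), dtype=rgb_weights.dtype)
    expanded[:, :3] = rgb_weights
    # keepdims=True gives shape (out_channels, 1, kH, kW), which broadcasts
    # across however many extra channels there are.
    expanded[:, 3:] = rgb_weights.mean(axis=1, keepdims=True)
    return expanded


# Example: a 32-filter 3x3 stem expanded from RGB to RGB + infrared.
stem = np.random.default_rng(0).normal(size=(32, 3, 3, 3)).astype(np.float32)
stem_4ch = expand_stem_weights(stem, in_channels=4)
```

The resulting array would then be copied into the weight of the new first layer, so the RGB part of the network starts from the pretrained filters instead of from random values.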
Not enough use cases to justify implementation.
Just want to chime in that there are lots of use cases! Part of the reason that I think folks aren't chiming in is that it's a bit of a chicken-and-egg problem. Since there is no framework that adequately supports more than 3 bands for input, no pretrained models are available, so no easily workable examples are available. Therefore very few people from the remote sensing, earth science, hydrology, or ecology communities get interested in computer vision.
Public satellite data from Landsat and Sentinel-2 provide over 8 bands of spectral data, with shortwave infrared, temperature (longwave infrared), and now even color bands like yellow: https://landsat.gsfc.nasa.gov/satellites/landsat-9/
These and other satellite data can be used for agriculture monitoring, building detection, mapping of habitat, etc. https://www.esa.int/Enabling_Support/Operations/Sentinel-2_operations#:~:text=As%20well%20as%20monitoring%20plant,in%20lakes%20and%20coastal%20waters.
I'm personally working on using multiple bands including radar to map oil pollution in oceans, and trying to use icevision to do so. So far I'm starting with three bands but it would be amazing to have a framework that isn't restricted to 3 bands and I'd love to learn more about what it would take to support this if others are interested!
And even beyond earth science, there are plenty of other use cases in medical imaging and geology where more than 3 bands are useful. Geology in particular often makes use of 100s of bands (hyperspectral data) https://www.usgs.gov/centers/geology%2C-geophysics%2C-and-geochemistry-science-center/science/science-topics/hyperspectral
🚀 Feature
Is your feature request related to a problem? Please describe.
Is it possible to use images with more than 3 RGB channels for training? I mean, e.g., using infrared light as an additional channel?
My first guess is no, since in Parser we return file_path, not the image, but maybe I'm wrong?
Describe the solution you'd like
Maybe instead of returning the image path, would it be possible to alternatively return an image itself (since sometimes infrared images are in separate files so some form of manual merging of the images should be conducted)?
Additional context
This would enable using IceVision for a wider variety of tasks, especially real-world, industrial applications.