Closed Angel-Jia closed 4 years ago
I plan on doing something similar with my 16-bit single-channel dataset. I noticed format 'L' is 8-bit. What are the datatype and shape of pil_image: single-channel grayscale, 0-255?
It seems PIL does not support 16bit images, so you'll need to use https://detectron2.readthedocs.io/tutorials/data_loading.html#write-a-custom-dataloader to read the image with other libraries.
We do need to improve the transformation support for gray scale images.
Writing a custom dataloader seems pretty straightforward to me. I was just uncertain how to build/train a Faster RCNN model with 16-bit single-channel data. I would not be loading a preconfigured model, but training from scratch, since my data is not RGB images.
It seems you can convert images to 32 bit to adapt to the default data loader :)
Thanks.. that's what I did as a work around.
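The thread does not say exactly what "convert to 32 bit" meant here. One possible reading is a lossless cast from 16-bit integers to 32-bit floats, sketched below with NumPy only. The function name `uint16_to_float32` is made up for illustration and is not part of detectron2.

```python
import numpy as np

def uint16_to_float32(image_u16: np.ndarray) -> np.ndarray:
    """Cast a 16-bit image to float32 and return it with shape (H, W, 1)."""
    if image_u16.dtype != np.uint16:
        raise TypeError(f"expected uint16, got {image_u16.dtype}")
    # float32 has a 24-bit significand, so every uint16 value is represented exactly.
    image_f32 = image_u16.astype(np.float32)
    if image_f32.ndim == 2:
        image_f32 = image_f32[:, :, np.newaxis]  # add an explicit channel axis
    return image_f32

raw = np.array([[0, 1024], [40000, 65535]], dtype=np.uint16)
converted = uint16_to_float32(raw)
```

Unlike scaling down to 8 bits, this cast keeps the full 16-bit range.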
Any updates on the original 8-bit single-channel issue using PIL "L" mode? Is Detectron2 able to handle single-channel or should we fill 3 (?) channels with the same data? Or will this case also require a custom data loader?
Thanks a lot!
Can anyone share their custom dataloader for images other than 3-channels? I am trying to train on 4 channel [R,G,B,nDSM] images but failing. nDSM (normalized digital surface model) is the object height above ground as additional information in my orthophotos.
Also, can anyone explain why I need cfg.MODEL.PIXEL_MEAN and cfg.MODEL.PIXEL_STD?
Thanks!!
I am only just picking up Detectron2, and in fact have limited experience using PyTorch (my main area of experience is Keras / TensorFlow).
I really would be very grateful for some pointers on a problem I am trying to solve using this object detection framework.
My dataset has been converted into COCO type format already to try and simplify loading etc. However, the images I wish to detect objects from are single channel, and I see the default models have 3 channels.
@rtarquini, on the 13th March you made the following comment:
Writing a custom dataloader seems pretty straightforward to me. I was just uncertain how to build/train a Faster RCNN model with 16-bit single-channel data. I would not be loading a preconfigured model, but training from scratch, since my data is not RGB images.
I understand that a custom dataloader can handle any type of image format, as long as the resulting output of the loader conforms to the input requirements of the model. For example, my images are 12-bit integers, which will need to be converted and normalized to the expected input format (8-bit integer?), as well as resized to something more suitable, as the resolution is currently quite large (2592, 1944).
The images are actually saved as 3 channel PNGs, but with the single channel replicated across the other channels. So I could, perhaps just use the input images as 3 channel replications. Is there any reason this approach would not be valid?
If I were to convert to a single channel, using a custom dataloader, then my expectation is that I would have to modify the networks to expect single channel data. Can this be achieved by specifying the correct format in the CONFIG field INPUT.FORMAT, for example specifying the image format as "L" (8 bit black and white)?
@rtarquini, you state that you would not be loading a preconfigured model, so is this sufficient, or are the changes required to the networks more involved? If so, would someone mind giving me some pointers on where to start?
@ecm200 I decided to just create a dataloader which creates a 'gray' RGB image by replicating the single channel into the other channels. This inflates the image size which is a problem for me as my images are large, but gets around the problem of having the COCO model consuming 1 channel input. I also am using the pretrained weights for COCO, rather than training from scratch.
You will lose some fidelity in your 12-bit data when converting to 24-bit RGB (8 bits per channel). My data is 16-bit. So normalize the data to 8 bits, then copy it into the other channels.
@rtarquini Thank you so much Richard, I am very grateful for the pointers. I will give this approach a try first. Happy to share ideas with you if I find other solutions.
@rtarquini do you mind sharing your custom data loader function for converting 1-channel greyscale images to 3-channel images?
I would not be able to post code, nor would the storage format of my data help you anyway. But you can simply expand your data to three 8-bit channels after reading it in, and the result can then be used in the dataset.
Using 8-bit data is sub-optimal, as there is a significant loss in fidelity.
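The "normalize to 8 bits, then replicate the channel" workaround described above could look roughly like the following. This is a minimal NumPy sketch, not the poster's actual loader. The function name `gray_to_rgb8` and its `bit_depth` parameter are hypothetical, and scaling by the nominal bit depth (rather than by the image's own min/max) is an assumption.

```python
import numpy as np

def gray_to_rgb8(gray: np.ndarray, bit_depth: int) -> np.ndarray:
    """Scale an (H, W) image of the given bit depth to uint8 and copy it into 3 channels."""
    max_value = (1 << bit_depth) - 1  # 4095 for 12-bit, 65535 for 16-bit
    scaled = gray.astype(np.float32) / max_value * 255.0
    gray8 = np.clip(np.rint(scaled), 0, 255).astype(np.uint8)
    # Replicate the single channel so the result has shape (H, W, 3).
    return np.repeat(gray8[:, :, np.newaxis], 3, axis=2)

raw12 = np.array([[0, 2048], [4095, 1000]], dtype=np.uint16)
rgb = gray_to_rgb8(raw12, bit_depth=12)
```

The rounding step is where the loss of fidelity mentioned above happens: a 12-bit image has 4096 levels, and they are collapsed into 256.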
@arcticant, are you able to train on 4 channels?
@ppwwyyxx, @vkhalidov, @stepancheg, I am able to write a custom data loader that fetches 4 channels of an image and returns the data in the required format. However, I am getting the following error:
File "../python3.7/site-packages/detectron2/modeling/meta_arch/retinanet.py", line 484, in
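import copy

import numpy as np
import torch
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T

def mapper(dataset_dict):
    dataset_dict = copy.deepcopy(dataset_dict)  # do not modify the caller's dict
    # Logic to extract channels and generate 1 extra channel.
    # get_rgb_image_path and extract_extra_channels are helpers defined elsewhere (not shown).
    processed_img_path = get_rgb_image_path(dataset_dict["file_name"])
    image = utils.read_image(processed_img_path, format="BGR")
    ch4 = extract_extra_channels(dataset_dict["file_name"])
    old_h, old_w = image.shape[:2]
    new_h, new_w = 600, 960
    tfm = T.ResizeTransform(old_h, old_w, new_h, new_w)
    resize_image = tfm.apply_image(image)
    # Reorder BGR to R,G,B and append the extra channel.
    # np.dstack requires ch4 to have the same height and width as resize_image.
    image = np.dstack((resize_image[:, :, 2], resize_image[:, :, 1], resize_image[:, :, 0], ch4))
    image = torch.from_numpy(np.ascontiguousarray(image.transpose(2, 0, 1)))
    annos = [
        utils.transform_instance_annotations(annotation, [tfm], (new_h, new_w))
        for annotation in dataset_dict.pop("annotations")
    ]
    return {
        # create the format that the model expects
        "image": image,
        "instances": utils.annotations_to_instances(annos, (new_h, new_w)),
        "width": new_w,
        "height": new_h,
    }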
Please help me on this.
I am performing object detection using retinanet.
@raviy0807 as the original issue suggests you need to change PIXEL_MEAN/STD
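For context on why PIXEL_MEAN/STD matter: before the backbone runs, the model subtracts cfg.MODEL.PIXEL_MEAN from each channel and divides by cfg.MODEL.PIXEL_STD, so both lists need one entry per input channel. As far as I can tell, the default backbone builder also takes its input channel count from the length of PIXEL_MEAN, but treat that as an assumption to verify against your detectron2 version. The NumPy sketch below only illustrates the per-channel arithmetic; the `normalize` helper and all mean/std values are illustrative, not recommended settings.

```python
import numpy as np

def normalize(image_chw, pixel_mean, pixel_std):
    """Apply (image - mean) / std per channel to a (C, H, W) array."""
    mean = np.asarray(pixel_mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(pixel_std, dtype=np.float32).reshape(-1, 1, 1)
    channels = image_chw.shape[0]
    if mean.shape[0] != channels or std.shape[0] != channels:
        raise ValueError(
            f"image has {channels} channels but mean/std have "
            f"{mean.shape[0]}/{std.shape[0]} entries"
        )
    return (image_chw.astype(np.float32) - mean) / std

# A tiny 4-channel stand-in for an [R, G, B, nDSM] image.
four_channel = np.full((4, 6, 8), 100.0, dtype=np.float32)
mean4 = [120.0, 115.0, 105.0, 50.0]  # illustrative values only
std4 = [60.0, 57.0, 57.0, 20.0]      # illustrative values only
normalized = normalize(four_channel, mean4, std4)

# Leaving the 3-channel statistics in place with 4-channel input fails.
try:
    normalize(four_channel, mean4[:3], std4[:3])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

In the config, the equivalent change is to give cfg.MODEL.PIXEL_MEAN and cfg.MODEL.PIXEL_STD four values each for a 4-channel input, or one value each for a single-channel input.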
Is there a way to train a model on a custom dataset using grayscale images (single channel)? I am training the model (configs/COCO-Keypoints/keypoint_rcnn_R_50_FPN_1x.yaml) for keypoint detection, but the loss is not converging.
My dataset has only 3000 images.
I installed torch, torchvision, detectron2 via pip. I have set MODEL.PIXEL_MEAN/STD to single-item list like:
But I got this error:
It seems to have something to do with the image shape. When reading an image with format "L", the image array dimension was expanded (line 68 in detection_utils.py). And when the image array was converted back to PIL.Image in transform.py (line 81), a data type error was raised.
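The shape mismatch described above can be reproduced without detectron2. A grayscale array read as (H, W) becomes (H, W, 1) once a channel axis is added, and PIL's Image.fromarray does not, to my knowledge, accept an (H, W, 1) uint8 array. A possible workaround, sketched here with NumPy only, is to drop the singleton axis before any PIL conversion. Both helper names are hypothetical and this is not the library's own fix.

```python
import numpy as np

def add_channel_axis(gray_hw: np.ndarray) -> np.ndarray:
    """Mimic the expansion step: (H, W) -> (H, W, 1)."""
    return np.expand_dims(gray_hw, -1)

def drop_singleton_channel(image: np.ndarray) -> np.ndarray:
    """Return an (H, W) view if the image is (H, W, 1); otherwise return it unchanged."""
    if image.ndim == 3 and image.shape[2] == 1:
        return image[:, :, 0]
    return image

gray = np.zeros((4, 5), dtype=np.uint8)
expanded = add_channel_axis(gray)            # shape (4, 5, 1)
restored = drop_singleton_channel(expanded)  # shape (4, 5), safe to hand to PIL
rgb_untouched = drop_singleton_channel(np.zeros((4, 5, 3), dtype=np.uint8))
```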