facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Cannot train on gray image #1020

Closed: Angel-Jia closed this issue 4 years ago

Angel-Jia commented 4 years ago

I installed torch, torchvision, and detectron2 via pip. I have set MODEL.PIXEL_MEAN/STD to single-item lists:

cfg.MODEL.PIXEL_MEAN = [103.530]
cfg.MODEL.PIXEL_STD = [1.0]
cfg.INPUT.FORMAT = "L"

But I got this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/usr/local/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 208, in run_step
    data = next(self._data_loader_iter)
  File "/usr/local/lib/python3.7/site-packages/detectron2/data/common.py", line 109, in __iter__
    for d in self.dataset:
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_dat
a
    data.reraise()
  File "/usr/local/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/PIL/Image.py", line 2645, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 1), '|u1')

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_l
oop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/site-packages/detectron2/data/common.py", line 39, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/usr/local/lib/python3.7/site-packages/detectron2/utils/serialize.py", line 23, in __call__
    return self._obj(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/detectron2/data/dataset_mapper.py", line 92, in __call__
    image, transforms = T.apply_transform_gens(self.tfm_gens, image)
  File "/usr/local/lib/python3.7/site-packages/detectron2/data/transforms/transform_gen.py", line 445, in
 apply_transform_gens
    img = tfm.apply_image(img)
  File "/usr/local/lib/python3.7/site-packages/detectron2/data/transforms/transform.py", line 81, in appl
y_image
    pil_image = Image.fromarray(img)
  File "/usr/local/lib/python3.7/site-packages/PIL/Image.py", line 2647, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type

It seems to have something to do with the image shape. When reading an image with format "L", the image array is expanded with an extra channel dimension (line 68 in detection_utils.py):

if format == "L":
    image = np.expand_dims(image, -1)

And when the image array is converted back to a PIL.Image in transform.py (line 81), the data type error is raised.
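
This is easy to reproduce outside detectron2. A minimal sketch (array size chosen arbitrarily) of the failure, and the obvious workaround of squeezing the channel dimension before the PIL round-trip:

import numpy as np
from PIL import Image

# PIL cannot infer an image mode for an (H, W, 1) uint8 array; the typemap
# lookup fails with KeyError: ((1, 1, 1), '|u1') and PIL re-raises it as
# "Cannot handle this data type".
img = np.zeros((480, 640, 1), dtype=np.uint8)
# Image.fromarray(img)  # raises TypeError

# Dropping the trailing channel dimension yields a valid 8-bit "L" image:
pil_image = Image.fromarray(img.squeeze(-1))
assert pil_image.mode == "L"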

rtarquini commented 4 years ago

I plan on doing something similar with my 16-bit single-channel dataset. I noticed format 'L' is 8-bit. What is the datatype and shape of pil_image for single-channel grayscale 0-255?

ppwwyyxx commented 4 years ago

It seems PIL does not support 16-bit images, so you'll need to use https://detectron2.readthedocs.io/tutorials/data_loading.html#write-a-custom-dataloader to read the image with other libraries.

We do need to improve the transformation support for grayscale images.
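
For anyone landing here, a minimal sketch of such a mapper, assuming OpenCV for the 16-bit read (the float32 conversion and the skipped augmentations are my simplifications, not an official recipe):

import copy
import cv2
import numpy as np
import torch
from detectron2.data import detection_utils as utils

def grayscale16_mapper(dataset_dict):
    dataset_dict = copy.deepcopy(dataset_dict)
    # IMREAD_UNCHANGED keeps the native bit depth (uint16 for 16-bit files)
    image = cv2.imread(dataset_dict["file_name"], cv2.IMREAD_UNCHANGED)
    image = image.astype(np.float32)
    dataset_dict["image"] = torch.from_numpy(image[None])  # (1, H, W)
    annos = dataset_dict.pop("annotations", [])
    dataset_dict["instances"] = utils.annotations_to_instances(annos, image.shape[:2])
    return dataset_dict

This would be plugged in via build_detection_train_loader(cfg, mapper=grayscale16_mapper), with cfg.MODEL.PIXEL_MEAN and cfg.MODEL.PIXEL_STD reduced to single-element lists as in the original post.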

rtarquini commented 4 years ago

Writing a custom dataloader seems pretty straightforward to me. I was just uncertain how to build/train a Faster R-CNN model with 16-bit single-channel data. I would not be loading a preconfigured model, but training from scratch, since my data is not RGB images.

10067 commented 4 years ago

It seems you can convert images to 32-bit to adapt to the default data loader :)
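
If it helps others, a minimal sketch of that conversion via NumPy (the filename is a placeholder); a 2-D int32 array round-trips through Image.fromarray as PIL's 32-bit integer mode "I":

import numpy as np
from PIL import Image

img16 = np.asarray(Image.open("scan_16bit.tif"))  # dtype uint16
img32 = img16.astype(np.int32)                    # widen to 32-bit
Image.fromarray(img32).save("scan_32bit.tif")     # saved as mode "I"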

rtarquini commented 4 years ago

Thanks, that's what I did as a workaround.

EvertEt commented 4 years ago

Any updates on the original 8-bit single-channel issue using PIL "L" mode? Is Detectron2 able to handle single-channel input, or should we fill all 3 channels with the same data? Or will this case also require a custom data loader?

Thanks a lot!

leidix commented 4 years ago

Can anyone share their custom dataloader for images with other than 3 channels? I am trying to train on 4-channel [R, G, B, nDSM] images but failing. nDSM (normalized digital surface model) is the object height above ground, included as additional information in my orthophotos.

Also, can anyone explain why I need cfg.MODEL.PIXEL_MEAN and cfg.MODEL.PIXEL_STD?

Thanks!!

ecm200 commented 4 years ago

I am only just picking up Detectron2, and in fact have limited experience using PyTorch (my main area of experience is Keras / TensorFlow).

I really would be very grateful for some pointers on a problem I am trying to solve using this object detection framework.

My dataset has already been converted into COCO-style format to simplify loading. However, the images I wish to detect objects in are single-channel, and I see the default models expect 3 channels.

@rtarquini, on the 13th March you made the following comment:

Writing a custom dataloader seems pretty straight forward for me. I was just uncertain how to build/train a Faster RCNN model with 16-bit single channel data. I would not be loading a preconfigured model, but training from scratch since my data is not RGB images.

I understand that a custom dataloader can handle any type of image format, as long as the resulting output of the loader conforms to the input requirements of the model. For example, my images are 12-bit integers, which will need to be converted and normalized to the expected input format (8-bit integer?), as well as resized to something more suitable, as the resolution is currently quite large (2592 x 1944).

The images are actually saved as 3-channel PNGs, but with the single channel replicated across the other channels. So I could perhaps just use the input images as 3-channel replications. Is there any reason this approach would not be valid?

If I were to convert to a single channel using a custom dataloader, then my expectation is that I would have to modify the networks to expect single-channel data. Can this be achieved by specifying the correct format in the config field INPUT.FORMAT, for example specifying the image format as "L" (8-bit grayscale)?

@rtarquini, you stated that you would not be loading a preconfigured model; is that sufficient, or are the required changes to the networks more involved? If so, would someone mind giving me some pointers on where to start?

rtarquini commented 4 years ago

@ecm200 I decided to just create a dataloader which creates a 'gray' RGB image by replicating the single channel into the other channels. This inflates the image size, which is a problem for me as my images are large, but it gets around the problem of having the COCO model consume 1-channel input. I am also using the pretrained weights for COCO, rather than training from scratch.

You will lose some fidelity in your 12-bit data when converting to 24-bit RGB (8 bits per channel). My data is 16-bit, so I normalize the data over 8 bits, then fill the other channels.
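
A minimal sketch of that approach (not rtarquini's actual code; the 16-bit full-scale divisor is an assumption, use your data's real range):

import numpy as np

def gray16_to_rgb8(image16):
    # Normalize the 16-bit data down to 8 bits (this is where fidelity is lost)
    image8 = (image16.astype(np.float32) / 65535.0 * 255.0).astype(np.uint8)
    # Replicate the single channel into all three RGB channels
    return np.repeat(image8[:, :, None], 3, axis=-1)  # (H, W, 3) uint8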

ecm200 commented 4 years ago

@rtarquini Thank you so much Richard, I am very grateful for the pointers. I will give this approach a try first. Happy to share ideas with you if I find other solutions.

TobiasKutscher commented 4 years ago

@rtarquini do you mind sharing your custom data loader function for converting 1-channel grayscale images to 3-channel images?

rtarquini commented 4 years ago

I would not be able to post code, nor would the storage format of my data help you anyway. But you can simply expand your data to three 8-bit channels after reading it in, and it can then be used in the dataset.

Using 8-bit data is suboptimal, as there is a significant loss in fidelity.

raviy0807 commented 3 years ago

@arcticant, are you able to train on 4 channels?

@ppwwyyxx, @vkhalidov, @stepancheg, I am able to write a custom data loader that fetches 4 channels of image data and provides it in the required format. However, I am getting the following error:

  File "../python3.7/site-packages/detectron2/modeling/meta_arch/retinanet.py", line 484
    images = [(x - self.pixel_mean) / self.pixel_std for x in images]
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0

import copy
import numpy as np
import torch
import detectron2.data.transforms as T
from detectron2.data import detection_utils as utils

def mapper(dataset_dict):
    dataset_dict = copy.deepcopy(dataset_dict)
    # Logic to extract channels and generate 1 extra channel
    # (get_rgb_image_path and extract_extra_channels are my own helpers).
    processed_img_path = get_rgb_image_path(dataset_dict["file_name"])
    image = utils.read_image(processed_img_path, format="BGR")
    ch4 = extract_extra_channels(dataset_dict["file_name"])
    old_h, old_w = image.shape[:2]
    new_h, new_w = 600, 960
    tfm = T.ResizeTransform(old_h, old_w, new_h, new_w)
    resize_image = tfm.apply_image(image)
    # Reorder BGR -> RGB and stack the 4th channel on top
    image = np.dstack((resize_image[:, :, 2], resize_image[:, :, 1], resize_image[:, :, 0], ch4))
    image = torch.from_numpy(image.transpose(2, 0, 1))  # (C, H, W)
    annos = [
        utils.transform_instance_annotations(annotation, [tfm], (new_h, new_w))
        for annotation in dataset_dict.pop("annotations")
    ]
    return {
        # create the format that the model expects
        "image": image,
        "instances": utils.annotations_to_instances(annos, (new_h, new_w)),
        "width": new_w,
        "height": new_h,
    }
Please help me with this. I am performing object detection using RetinaNet.

ppwwyyxx commented 3 years ago

@raviy0807, as the original issue suggests, you need to change PIXEL_MEAN/STD.
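
For a 4-channel input that would look something like the sketch below (the mean/std values are placeholders; compute them from your own data). Note that detectron2 also builds the backbone stem with len(cfg.MODEL.PIXEL_MEAN) input channels, so the pretrained 3-channel weights for the first conv layer will not load into a 4-channel stem.

cfg.MODEL.PIXEL_MEAN = [103.530, 116.280, 123.675, 0.0]  # B, G, R + 4th channel
cfg.MODEL.PIXEL_STD = [1.0, 1.0, 1.0, 1.0]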

Priyanshu-Ganwani09 commented 1 month ago

Is there a way to train a model on a custom dataset using grayscale images (single channel)? I am training the model (configs/COCO-Keypoints/keypoint_rcnn_R_50_FPN_1x.yaml) for keypoint detection, but the loss is not converging.

The dataset I have is only 3000 images.