You can loop over the data loader with for data in data_loader and visualize the images. tools/visualize_data.py is an example.
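For example, a minimal sketch of such a loop (cfg stands for your config object and "my_dataset_train" for a registered dataset name; both are placeholders, and tools/visualize_data.py does essentially the same thing):

import cv2
from detectron2.data import MetadataCatalog, build_detection_train_loader
from detectron2.utils.visualizer import Visualizer

# Images coming out of this loader have already passed through the mapper,
# i.e. through whatever augmentations the mapper applies.
data_loader = build_detection_train_loader(cfg)

for batch in data_loader:
    for per_image in batch:
        # "image" is a CHW tensor in cfg.INPUT.FORMAT (BGR by default), so flip it to RGB
        img = per_image["image"].permute(1, 2, 0).numpy()[:, :, ::-1]
        vis = Visualizer(img, metadata=MetadataCatalog.get("my_dataset_train"))
        out = vis.overlay_instances(boxes=per_image["instances"].gt_boxes)
        cv2.imwrite("augmented_sample.jpg", out.get_image()[:, :, ::-1])  # back to BGR for cv2
    break  # inspect only the first batch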
I guess your custom code is not executed. But without a description of everything you did (in the form of a git diff) I cannot tell.
@ppwwyyxx Thanks for the rapid reply! I figured out a hacky solution for adding data augmentation to the training process. I still haven't figured out how to visualize the modified images, but I'll post my code here once I do.
I'll describe the augmentation process as best as I can, so if someone else needs help they can check my method (using Google Colab):
1. Delete the original detection_utils.py in the cloned repo:
!rm /content/detectron2_repo/detectron2/data/detection_utils.py
2. Change the build_transform_gen function to whatever you want (I put in T.RandomBrightness and T.RandomContrast), then write the modified file back with %%writefile:
%%writefile /content/detectron2_repo/detectron2/data/detection_utils.py
# -*- coding: utf-8 -*-
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
""" Common data processing utilities that are used in a typical object detection data pipeline. """ import logging import numpy as np import pycocotools.mask as mask_util import torch from fvcore.common.file_io import PathManager from PIL import Image, ImageOps
from detectron2.structures import ( BitMasks, Boxes, BoxMode, Instances, Keypoints, PolygonMasks, RotatedBoxes, polygons_to_bitmask, )
from . import transforms as T from .catalog import MetadataCatalog
class SizeMismatchError(ValueError): """ When loaded image has difference width/height compared with annotation. """
def read_image(file_name, format=None):
    """
    Read an image into the given format.
    Will apply rotation and flipping if the image has such exif information.

    Args:
        file_name (str): image file path
        format (str): one of the supported image modes in PIL, or "BGR"

    Returns:
        image (np.ndarray): an HWC image
    """
    with PathManager.open(file_name, "rb") as f:
        image = Image.open(f)

        # capture and ignore this bug: https://github.com/python-pillow/Pillow/issues/3973
        try:
            image = ImageOps.exif_transpose(image)
        except Exception:
            pass

        if format is not None:
            # PIL only supports RGB, so convert to RGB and flip channels over below
            conversion_format = format
            if format == "BGR":
                conversion_format = "RGB"
            image = image.convert(conversion_format)
        image = np.asarray(image)
        if format == "BGR":
            # flip channels if needed
            image = image[:, :, ::-1]
        # PIL squeezes out the channel dimension for "L", so make it HWC
        if format == "L":
            image = np.expand_dims(image, -1)
        return image
def check_image_size(dataset_dict, image):
    """
    Raise an error if the image does not match the size specified in the dict.
    """
    if "width" in dataset_dict or "height" in dataset_dict:
        image_wh = (image.shape[1], image.shape[0])
        expected_wh = (dataset_dict["width"], dataset_dict["height"])
        if not image_wh == expected_wh:
            raise SizeMismatchError(
                "Mismatched (W,H){}, got {}, expect {}".format(
                    " for image " + dataset_dict["file_name"]
                    if "file_name" in dataset_dict
                    else "",
                    image_wh,
                    expected_wh,
                )
            )

    # To ensure bbox always remap to original image size
    if "width" not in dataset_dict:
        dataset_dict["width"] = image.shape[1]
    if "height" not in dataset_dict:
        dataset_dict["height"] = image.shape[0]
def transform_proposals(dataset_dict, image_shape, transforms, min_box_side_len, proposal_topk):
    """
    Apply transformations to the proposals in dataset_dict, if any.

    Args:
        dataset_dict (dict): a dict read from the dataset, possibly
            contains fields "proposal_boxes", "proposal_objectness_logits", "proposal_bbox_mode"
        image_shape (tuple): height, width
        transforms (TransformList):
        min_box_side_len (int): keep proposals with at least this size
        proposal_topk (int): only keep top-K scoring proposals

    The input dict is modified in-place, with abovementioned keys removed. A new
    key "proposals" will be added. Its value is an `Instances`
    object which contains the transformed proposals in its field
    "proposal_boxes" and "objectness_logits".
    """
    if "proposal_boxes" in dataset_dict:
        # Transform proposal boxes
        boxes = transforms.apply_box(
            BoxMode.convert(
                dataset_dict.pop("proposal_boxes"),
                dataset_dict.pop("proposal_bbox_mode"),
                BoxMode.XYXY_ABS,
            )
        )
        boxes = Boxes(boxes)
        objectness_logits = torch.as_tensor(
            dataset_dict.pop("proposal_objectness_logits").astype("float32")
        )

        boxes.clip(image_shape)
        keep = boxes.nonempty(threshold=min_box_side_len)
        boxes = boxes[keep]
        objectness_logits = objectness_logits[keep]

        proposals = Instances(image_shape)
        proposals.proposal_boxes = boxes[:proposal_topk]
        proposals.objectness_logits = objectness_logits[:proposal_topk]
        dataset_dict["proposals"] = proposals
def transform_instance_annotations(
    annotation, transforms, image_size, *, keypoint_hflip_indices=None
):
    """
    Apply transforms to box, segmentation and keypoints annotations of a single instance.

    It will use `transforms.apply_box` for the box, and
    `transforms.apply_coords` for segmentation polygons & keypoints.
    If you need anything more specially designed for each data structure,
    you'll need to implement your own version of this function or the transforms.

    Args:
        annotation (dict): dict of instance annotations for a single instance.
            It will be modified in-place.
        transforms (TransformList):
        image_size (tuple): the height, width of the transformed image
        keypoint_hflip_indices (ndarray[int]): see `create_keypoint_hflip_indices`.

    Returns:
        dict:
            the same input dict with fields "bbox", "segmentation", "keypoints"
            transformed according to `transforms`.
            The "bbox_mode" field will be set to XYXY_ABS.
    """
    bbox = BoxMode.convert(annotation["bbox"], annotation["bbox_mode"], BoxMode.XYXY_ABS)
    # Note that bbox is 1d (per-instance bounding box)
    annotation["bbox"] = transforms.apply_box([bbox])[0]
    annotation["bbox_mode"] = BoxMode.XYXY_ABS

    if "segmentation" in annotation:
        # each instance contains 1 or more polygons
        segm = annotation["segmentation"]
        if isinstance(segm, list):
            # polygons
            polygons = [np.asarray(p).reshape(-1, 2) for p in segm]
            annotation["segmentation"] = [
                p.reshape(-1) for p in transforms.apply_polygons(polygons)
            ]
        elif isinstance(segm, dict):
            # RLE
            mask = mask_util.decode(segm)
            mask = transforms.apply_segmentation(mask)
            assert tuple(mask.shape[:2]) == image_size
            annotation["segmentation"] = mask
        else:
            raise ValueError(
                "Cannot transform segmentation of type '{}'!"
                "Supported types are: polygons as list[list[float] or ndarray],"
                " COCO-style RLE as a dict.".format(type(segm))
            )

    if "keypoints" in annotation:
        keypoints = transform_keypoint_annotations(
            annotation["keypoints"], transforms, image_size, keypoint_hflip_indices
        )
        annotation["keypoints"] = keypoints

    return annotation
def transform_keypoint_annotations(keypoints, transforms, image_size, keypoint_hflip_indices=None):
    """
    Transform keypoint annotations of an image.

    Args:
        keypoints (list[float]): Nx3 float in Detectron2 Dataset format.
        transforms (TransformList):
        image_size (tuple): the height, width of the transformed image
        keypoint_hflip_indices (ndarray[int]): see `create_keypoint_hflip_indices`.
    """
    # (N*3,) -> (N, 3)
    keypoints = np.asarray(keypoints, dtype="float64").reshape(-1, 3)
    keypoints[:, :2] = transforms.apply_coords(keypoints[:, :2])

    # This assumes that HorizFlipTransform is the only one that does flip
    do_hflip = sum(isinstance(t, T.HFlipTransform) for t in transforms.transforms) % 2 == 1

    # Alternative way: check if probe points was horizontally flipped.
    # probe = np.asarray([[0.0, 0.0], [image_width, 0.0]])
    # probe_aug = transforms.apply_coords(probe.copy())
    # do_hflip = np.sign(probe[1][0] - probe[0][0]) != np.sign(probe_aug[1][0] - probe_aug[0][0])  # noqa

    # If flipped, swap each keypoint with its opposite-handed equivalent
    if do_hflip:
        assert keypoint_hflip_indices is not None
        keypoints = keypoints[keypoint_hflip_indices, :]

    # Maintain COCO convention that if visibility == 0, then x, y = 0
    # TODO may need to reset visibility for cropped keypoints,
    # but it does not matter for our existing algorithms
    keypoints[keypoints[:, 2] == 0] = 0
    return keypoints
def annotations_to_instances(annos, image_size, mask_format="polygon"):
    """
    Create an :class:`Instances` object used by the models,
    from instance annotations in the dataset dict.

    Args:
        annos (list[dict]): a list of instance annotations in one image, each
            element for one instance.
        image_size (tuple): height, width

    Returns:
        Instances:
            It will contain fields "gt_boxes", "gt_classes",
            "gt_masks", "gt_keypoints", if they can be obtained from `annos`.
            This is the format that builtin models expect.
    """
    boxes = [BoxMode.convert(obj["bbox"], obj["bbox_mode"], BoxMode.XYXY_ABS) for obj in annos]
    target = Instances(image_size)
    boxes = target.gt_boxes = Boxes(boxes)
    boxes.clip(image_size)

    classes = [obj["category_id"] for obj in annos]
    classes = torch.tensor(classes, dtype=torch.int64)
    target.gt_classes = classes

    if len(annos) and "segmentation" in annos[0]:
        segms = [obj["segmentation"] for obj in annos]
        if mask_format == "polygon":
            masks = PolygonMasks(segms)
        else:
            assert mask_format == "bitmask", mask_format
            masks = []
            for segm in segms:
                if isinstance(segm, list):
                    # polygon
                    masks.append(polygons_to_bitmask(segm, *image_size))
                elif isinstance(segm, dict):
                    # COCO RLE
                    masks.append(mask_util.decode(segm))
                elif isinstance(segm, np.ndarray):
                    assert segm.ndim == 2, "Expect segmentation of 2 dimensions, got {}.".format(
                        segm.ndim
                    )
                    # mask array
                    masks.append(segm)
                else:
                    raise ValueError(
                        "Cannot convert segmentation of type '{}' to BitMasks!"
                        "Supported types are: polygons as list[list[float] or ndarray],"
                        " COCO-style RLE as a dict, or a full-image segmentation mask "
                        "as a 2D ndarray.".format(type(segm))
                    )
            # torch.from_numpy does not support array with negative stride.
            masks = BitMasks(
                torch.stack([torch.from_numpy(np.ascontiguousarray(x)) for x in masks])
            )
        target.gt_masks = masks

    if len(annos) and "keypoints" in annos[0]:
        kpts = [obj.get("keypoints", []) for obj in annos]
        target.gt_keypoints = Keypoints(kpts)

    return target
def annotations_to_instances_rotated(annos, image_size):
    """
    Create an :class:`Instances` object used by the models,
    from instance annotations in the dataset dict.
    Compared to `annotations_to_instances`, this function is for rotated boxes only

    Args:
        annos (list[dict]): a list of instance annotations in one image, each
            element for one instance.
        image_size (tuple): height, width

    Returns:
        Instances:
            Containing fields "gt_boxes", "gt_classes",
            if they can be obtained from `annos`.
            This is the format that builtin models expect.
    """
    boxes = [obj["bbox"] for obj in annos]
    target = Instances(image_size)
    boxes = target.gt_boxes = RotatedBoxes(boxes)
    boxes.clip(image_size)
    classes = [obj["category_id"] for obj in annos]
    classes = torch.tensor(classes, dtype=torch.int64)
    target.gt_classes = classes
    return target
def filter_empty_instances(instances, by_box=True, by_mask=True):
    """
    Filter out empty instances in an `Instances` object.

    Args:
        instances (Instances):
        by_box (bool): whether to filter out instances with empty boxes
        by_mask (bool): whether to filter out instances with empty masks

    Returns:
        Instances: the filtered instances.
    """
    assert by_box or by_mask
    r = []
    if by_box:
        r.append(instances.gt_boxes.nonempty())
    if instances.has("gt_masks") and by_mask:
        r.append(instances.gt_masks.nonempty())

    # TODO: can also filter visible keypoints

    if not r:
        return instances
    m = r[0]
    for x in r[1:]:
        m = m & x
    return instances[m]
def create_keypoint_hflip_indices(dataset_names):
    """
    Args:
        dataset_names (list[str]): list of dataset names

    Returns:
        ndarray[int]: a vector of size=#keypoints, storing the
        horizontally-flipped keypoint indices.
    """
    check_metadata_consistency("keypoint_names", dataset_names)
    check_metadata_consistency("keypoint_flip_map", dataset_names)

    meta = MetadataCatalog.get(dataset_names[0])
    names = meta.keypoint_names
    # TODO flip -> hflip
    flip_map = dict(meta.keypoint_flip_map)
    flip_map.update({v: k for k, v in flip_map.items()})
    flipped_names = [i if i not in flip_map else flip_map[i] for i in names]
    flip_indices = [names.index(i) for i in flipped_names]
    return np.asarray(flip_indices)
def gen_crop_transform_with_instance(crop_size, image_size, instance):
    """
    Generate a CropTransform so that the cropping region contains
    the center of the given instance.

    Args:
        crop_size (tuple): h, w in pixels
        image_size (tuple): h, w
        instance (dict): an annotation dict of one instance, in Detectron2's
            dataset format.
    """
    crop_size = np.asarray(crop_size, dtype=np.int32)
    bbox = BoxMode.convert(instance["bbox"], instance["bbox_mode"], BoxMode.XYXY_ABS)
    center_yx = (bbox[1] + bbox[3]) * 0.5, (bbox[0] + bbox[2]) * 0.5
    assert (
        image_size[0] >= center_yx[0] and image_size[1] >= center_yx[1]
    ), "The annotation bounding box is outside of the image!"
    assert (
        image_size[0] >= crop_size[0] and image_size[1] >= crop_size[1]
    ), "Crop size is larger than image size!"

    min_yx = np.maximum(np.floor(center_yx).astype(np.int32) - crop_size, 0)
    max_yx = np.maximum(np.asarray(image_size, dtype=np.int32) - crop_size, 0)
    max_yx = np.minimum(max_yx, np.ceil(center_yx).astype(np.int32))

    y0 = np.random.randint(min_yx[0], max_yx[0] + 1)
    x0 = np.random.randint(min_yx[1], max_yx[1] + 1)
    return T.CropTransform(x0, y0, crop_size[1], crop_size[0])
def check_metadata_consistency(key, dataset_names):
    """
    Check that the datasets have consistent metadata.

    Args:
        key (str): a metadata key
        dataset_names (list[str]): a list of dataset names

    Raises:
        AttributeError: if the key does not exist in the metadata
        ValueError: if the given datasets do not have the same metadata values defined by key
    """
    if len(dataset_names) == 0:
        return
    logger = logging.getLogger(__name__)
    entries_per_dataset = [getattr(MetadataCatalog.get(d), key) for d in dataset_names]
    for idx, entry in enumerate(entries_per_dataset):
        if entry != entries_per_dataset[0]:
            logger.error(
                "Metadata '{}' for dataset '{}' is '{}'".format(key, dataset_names[idx], str(entry))
            )
            logger.error(
                "Metadata '{}' for dataset '{}' is '{}'".format(
                    key, dataset_names[0], str(entries_per_dataset[0])
                )
            )
            raise ValueError("Datasets have different metadata '{}'!".format(key))
def build_transform_gen(cfg, is_train):
    """
    Create a list of :class:`TransformGen` from config.
    Now it includes resizing and flipping.

    Returns:
        list[TransformGen]
    """
    if is_train:
        min_size = cfg.INPUT.MIN_SIZE_TRAIN
        max_size = cfg.INPUT.MAX_SIZE_TRAIN
        sample_style = cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING
    else:
        min_size = cfg.INPUT.MIN_SIZE_TEST
        max_size = cfg.INPUT.MAX_SIZE_TEST
        sample_style = "choice"
    if sample_style == "range":
        assert len(min_size) == 2, "more than 2 ({}) min_size(s) are provided for ranges".format(
            len(min_size)
        )

    logger = logging.getLogger(__name__)
    tfm_gens = []
    tfm_gens.append(T.ResizeShortestEdge(min_size, max_size, sample_style))
    if is_train:
        tfm_gens.append(T.RandomFlip())
        tfm_gens.append(T.RandomBrightness(0.9, 1.1))
        tfm_gens.append(T.RandomContrast(0.9, 1.1))
        logger.info("TransformGens used in training: " + str(tfm_gens))
    return tfm_gens
3. Now restart runtime. Once you start training, you should get the following message:
[02/03 04:41:58 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip(), RandomBrightness(intensity_min=0.9, intensity_max=1.1), RandomContrast(intensity_min=0.9, intensity_max=1.1)]
[02/03 04:41:58 d2.data.build]: Using training sampler TrainingSampler
Also, I apologize for my poor programming skills. I'm not sure how to display a git diff comparing the original and modified source code. That's something I will look into; thanks for your help and feedback!
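For reference, one way to get such a diff in Colab, assuming the repo was cloned to /content/detectron2_repo as in the commands above:

# Show uncommitted changes to the cloned repository
!cd /content/detectron2_repo && git diff -- detectron2/data/detection_utils.py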
You have installed the code to a different location and are not modifying the code that will be run.
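A quick way to check which copy of the code will actually run is to print the imported module paths; a minimal sketch:

import detectron2
import detectron2.data.detection_utils as detection_utils

# If these paths do not point into the directory you are editing
# (e.g. /content/detectron2_repo), your edits will not affect training.
print(detectron2.__file__)
print(detection_utils.__file__)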
Just wanted to mention that the reason @djpecot doesn't see the transformations listed is that the printout they're reading comes not from the mapper but from the d2.data.detection_utils.build_transform_gen function, which only includes the default Resize and Flip transforms.
If you want it to print all transforms, you have to modify the list of transformations in build_transform_gen and then use your custom mapper to call that function instead. I hope that clears up the misunderstanding.
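A rough sketch of that pattern, i.e. a custom mapper that reuses build_transform_gen so the logged list reflects whatever you added to it (the mapper name is a placeholder, and cfg is assumed to already be defined in the notebook):

import copy
import torch
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T
from detectron2.data.detection_utils import build_transform_gen

# Calling the (possibly modified) build_transform_gen emits the
# "TransformGens used in training" log line, now reflecting your edits.
tfm_gens = build_transform_gen(cfg, is_train=True)

def mapper_with_build_transform_gen(dataset_dict):  # hypothetical name
    dataset_dict = copy.deepcopy(dataset_dict)
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    image, transforms = T.apply_transform_gens(tfm_gens, image)
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
    ]
    dataset_dict["instances"] = utils.annotations_to_instances(annos, image.shape[:2])
    return dataset_dict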
@djpecot did claim to have modified build_transform_gen, as shown by his code. So it's a different issue.
The recommended way is to not modify anything in detectron2, but to apply the new transforms in your own mapper (https://detectron2.readthedocs.io/tutorials/data_loading.html#write-a-custom-dataloader).
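A minimal sketch of that approach, plugging a custom mapper into training without modifying the library (AugTrainer and my_mapper are placeholder names; my_mapper stands for a mapper function like the ones discussed further down in this thread):

from detectron2.data import build_detection_train_loader
from detectron2.engine import DefaultTrainer

class AugTrainer(DefaultTrainer):
    # Only the train loader is overridden, so the custom mapper is used for training
    # while everything else stays at the defaults.
    @classmethod
    def build_train_loader(cls, cfg):
        return build_detection_train_loader(cfg, mapper=my_mapper)

trainer = AugTrainer(cfg)  # cfg assumed to be defined, as in the Colab example above
trainer.resume_or_load(resume=False)
trainer.train()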
@gessha Following up on your comment: the build_transform_gen function adds the default Resize and Flip transforms to the logger. So does this mean that whatever augmentations we apply with a custom mapper never get added to the logger? If that is the case, why don't the transformations get added to the logger in the apply_transform_gens function?
I wrote a custom mapper and included my own augmentations. When I begin training now, there is no output that says which transformations are being applied. Should I assume that this means build_transform_gen isn't being used any longer and that my custom augmentations are being applied? Thanks
The logs are from build_transform_gen, so you'll see them if you call build_transform_gen in your custom mapper; you will not see them if you don't call it. You can log your transformations manually if you want them to be logged.
Thanks for the info. I'm having a little trouble using the logger. I tried to do it the same way as in the build_transform_gen function in detectron2, but it isn't behaving the way I expected.
I wrote a custom data mapper using this code:
import copy
import torch
from detectron2.data import detection_utils as utils
from detectron2.data import transforms as T

def custom_mapper(input_dict):
    dataset_dict = copy.deepcopy(input_dict)  # it will be modified by code below
    image = utils.read_image(dataset_dict["file_name"], format="BGR")
    transform_list = [
        T.Resize((1200, 1200)),
        T.RandomFlip(prob=0.6, horizontal=True, vertical=False),
        T.RandomFlip(prob=0.6, horizontal=False, vertical=True),
        T.RandomContrast(0.7, 3.2),
        T.RandomBrightness(0.6, 1.8),
    ]
    image, transforms = T.apply_transform_gens(transform_list, image)
    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict
However, when I run the training session I still get:
[04/07 19:07:44 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(800,), max_size=1333, sample_style='choice'), RandomFlip()]
Which makes it seem like the custom mapper isn't being called. But as a test I put a print statement inside the custom mapper, and it prints during training, which makes it seem like it is being called. So then I thought I might need to update the log; maybe that was the issue. I added import logging to the top of the cell and then added the lines:
logger = logging.getLogger(__name__)
logger.info("TransformGens used in training: " + str(transform_list))
But it still tells me that the TransformGens being applied are just ResizeShortestEdge and RandomFlip. Am I just updating the logger incorrectly? Where am I going wrong? Thanks
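One likely culprit (an assumption, not something confirmed in this thread) is that a plain logging.getLogger(__name__) in a notebook has no handler configured, so its INFO messages are silently dropped, while the ResizeShortestEdge/RandomFlip line still comes from detectron2's own build_transform_gen wherever a default DatasetMapper is built. A minimal sketch of configuring a logger that will actually print:

import logging

# Give the logger a handler and an INFO threshold; without these, logger.info()
# output is discarded in many notebook setups.
logger = logging.getLogger("custom_mapper")  # hypothetical logger name
if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("[%(name)s]: %(message)s"))
    logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Then, inside the mapper, after transform_list is defined:
#     logger.info("TransformGens used in training: " + str(transform_list))

Alternatively, detectron2's setup_logger utility (detectron2.utils.logger.setup_logger) configures the "detectron2" logger hierarchy in a similar way.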
❓ What is the best way to verify/visualize that images are being augmented by a custom dataloader (in Google Colab)?
Following this tutorial: https://detectron2.readthedocs.io/tutorials/data_loading.html, I wrote a custom dataloader. I am working with X-ray images, so my goal is to jitter the lighting parameters (contrast and brightness) as would be expected in X-rays. I wrote the dataloader like so:
After defining the mapper, I opened the defaults.py file, and replaced line 430 with this:
After saving this file, building my cfg file and running the default training process:
I expected my model to list the brightness and contrast augmentations under TransformGens, but it only shows ResizeShortestEdge and RandomFlip:
My first question is what am I doing wrong?
Second, how can I verify that my images are in fact being augmented? I took the fast.ai course and found it had a built-in method that pulls training-set images through the augmentation pipeline and then visualizes them. Is there something similar here?