Open ivanquirino opened 5 years ago

It's nice to have all those image augmentations in both the GluonCV and Gluon APIs, but I am missing rotation utilities to randomly rotate images together with their bounding boxes. That would be a nice feature to have.

Another missing piece is that Gluon has the nice Compose API for chaining various transforms together, and it would be cool if the GluonCV detection transforms could be mixed into the Compose API.
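Something like this is what I have in mind, as a minimal sketch (DetectionCompose is my own name, not an existing GluonCV API): each transform takes and returns an (img, label) pair, so bbox-aware transforms can be chained the same way Compose chains image-only ones.

class DetectionCompose(object):
    # chain detection transforms that each map (img, label) -> (img, label)
    def __init__(self, transforms):
        self._transforms = transforms

    def __call__(self, img, label):
        for t in self._transforms:
            img, label = t(img, label)
        return img, label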
The existing networks and losses don't respect rotated bounding boxes, so the best bet is to adjust the axis-aligned bounding boxes after rotating the objects. That approach is doable, but expensive. Still, as you said, it's a nice feature to have.
For image classification we do use Compose in GluonCV, but with object detection it gets more complicated, since detection transforms take both an image and a label. I will rethink it.
@zhreshold thanks for answering.
Adjusting the bounding boxes is exactly what I was thinking, roughly as in the sketch below.
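For concreteness, a rough sketch of the adjustment, assuming OpenCV and numpy (rotate_with_bboxes is my own name): rotate the image, push the four corners of each box through the same affine transform, and take the axis-aligned box that encloses them.

import cv2
import numpy as np

def rotate_with_bboxes(img, bboxes, angle_deg):
    # rotate the image around its center
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))
    out = []
    for xmin, ymin, xmax, ymax in bboxes[:, :4]:
        # all four corners of the original box, in homogeneous coordinates
        corners = np.array([[xmin, ymin], [xmax, ymin],
                            [xmax, ymax], [xmin, ymax]], dtype=np.float32)
        rot = np.hstack([corners, np.ones((4, 1), np.float32)]) @ M.T
        # enclose the rotated corners in an axis-aligned box; this is looser
        # than a true rotated box, but stays compatible with existing losses
        out.append([rot[:, 0].min(), rot[:, 1].min(),
                    rot[:, 0].max(), rot[:, 1].max()])
    boxes = np.clip(np.array(out), 0, [w - 1, h - 1, w - 1, h - 1])
    return rotated, boxes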
I've built a custom transform class for my own training script, which is based on the VOC training script example. It uses Compose internally, because I needed a YOLOv3 transform without random flipping or cropping:
import copy

import numpy as np
from mxnet import autograd, nd
from mxnet.gluon.data.vision import transforms
from gluoncv.data.transforms import bbox
from gluoncv.model_zoo.yolo.yolo_target import YOLOV3PrefetchTargetGenerator

SIZE = 416  # stand-in for the input size constant used elsewhere in my script

class CustomTransform(object):
    def __init__(self, net=None, size=(SIZE, SIZE), mean=(0.485, 0.456, 0.406),
                 std=(0.229, 0.224, 0.225), **kwargs):
        self._size = size
        self._mean = mean
        self._std = std
        # image-only pipeline: resize, mild color jitter, tensor, normalize
        self._img_transform = transforms.Compose([
            transforms.Resize(size, keep_ratio=True),
            transforms.RandomColorJitter(0.1, 0.1, 0.1, 0.1),
            transforms.RandomLighting(0.1),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std)
        ])
        self._target_generator = None
        if net is None:
            return
        # run a fake forward pass on CPU to fetch anchors, offsets and feature
        # maps; deep-copy the net in case it has reset_ctx to gpu
        self._fake_x = nd.zeros((1, 3, size[1], size[0]))  # NCHW; size is (width, height)
        net = copy.deepcopy(net)
        net.collect_params().reset_ctx(None)
        with autograd.train_mode():
            _, self._anchors, self._offsets, self._feat_maps, _, _, _, _ = net(self._fake_x)
        self._target_generator = YOLOV3PrefetchTargetGenerator(
            num_class=len(net.classes), **kwargs)

    def __call__(self, src, label):
        h, w, _ = src.shape
        img = self._img_transform(src)
        # rescale the boxes from the original image size to the network input
        label = bbox.resize(label, (w, h), (self._size[0], self._size[1]))
        if self._target_generator is None:
            # validation mode: just return the image and the resized label
            return img, label.astype(img.dtype)
        gt_bboxes = nd.array(label[np.newaxis, :, :4])
        gt_ids = nd.array(label[np.newaxis, :, 4:5])
        gt_mixratio = None  # no mixup in this script
        objectness, center_targets, scale_targets, weights, class_targets = self._target_generator(
            self._fake_x, self._feat_maps, self._anchors, self._offsets,
            gt_bboxes, gt_ids, gt_mixratio)
        return (img, objectness[0], center_targets[0], scale_targets[0], weights[0],
                class_targets[0], gt_bboxes[0])
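For reference, this is roughly how I plug the transform into a DataLoader (the lst path, batch size, and the net variable are just examples from my setup); the batchify stacks the six dense targets and pads the variable-length ground-truth boxes:

from mxnet.gluon.data import DataLoader
from gluoncv.data import LstDetection
from gluoncv.data.batchify import Tuple, Stack, Pad

train_dataset = LstDetection('pads.train.lst', root='.')  # example path
# the transform returns 7 items: stack the first 6, pad the gt boxes
batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1)]))
train_loader = DataLoader(
    train_dataset.transform(CustomTransform(net, size=(SIZE, SIZE))),
    batch_size=8, shuffle=True, batchify_fn=batchify_fn, last_batch='rollover')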
I have a problem with my training script at the validation stage; I get something like the following output from both VOCMApMetric and VOC07MApMetric:
CLASS_1=NaN CLASS_2=NaN CLASS_3=NaN CLASS_4=99% CLASS_5=98% CLASS_6=96% mAP=97%
I'm not sure what's causing this, so I would appreciate some input. I'm kinda new to ML in general, but I'm looking into this for an object detection application.
The NaNs look suspicious to me; maybe you don't have those classes in your validation set?
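You can reproduce those NaN entries with a toy check (the class names and boxes here are made up): a class that never appears in the validation ground truth has an undefined AP, and the mAP is averaged over the valid classes only.

import numpy as np
from gluoncv.utils.metrics.voc_detection import VOC07MApMetric

metric = VOC07MApMetric(iou_thresh=0.5, class_names=['a', 'b'])
# one image whose ground truth only contains class 1 ('b'), so
# class 0 ('a') has no positives at all
pred_bboxes = np.array([[[10., 10., 50., 50.]]])
pred_labels = np.array([[[1.]]])
pred_scores = np.array([[[0.9]]])
gt_bboxes = np.array([[[10., 10., 50., 50.]]])
gt_labels = np.array([[[1.]]])
metric.update(pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels)
print(metric.get())  # AP for 'a' comes back as NaN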
Here's my validation LST file; it contains all the classes: pads.val.txt
What's strange is that, after training, the network correctly detects every image in the validation set.
Correction: there is one image in the validation set that it classifies wrongly. I think the main problem is in my labeling: every label in classes 3 and 6 refers to an object that has no left/right variant, yet those objects are labeled as left and right. The other four classes have left and right labeled correctly. The right fix is to merge classes 3 and 6 into a single class and end up with 5 classes instead of 6, roughly like the remap sketched below.
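The relabeling itself is simple; a sketch assuming 0-based class ids, so classes 3 and 6 correspond to ids 2 and 5 (merge_classes is my own helper):

import numpy as np

def merge_classes(label, src_id=5, dst_id=2):
    # label rows are (xmin, ymin, xmax, ymax, class_id), the same layout
    # the transform above consumes; fold src_id into dst_id
    label = label.copy()
    cls = label[:, 4].astype(int)
    cls[cls == src_id] = dst_id
    label[:, 4] = cls
    return label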
Thanks for your input, @zhreshold, you made me see the error more clearly.