apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Improve example ssd #4225

Closed zhreshold closed 7 years ago

zhreshold commented 7 years ago

Up to now there have been several issues with the SSD example; I'm posting here to track progress on improving this example in the nnvm branch.

Any suggestion is very welcome. I will keep this updated.

piiswrong commented 7 years ago

Just to confirm: do you mean it converges on the nnvm branch?

Glad to know that you find mx.image useful. I'm planning to write some tutorials on that. Are you interested in jumping in?

zhreshold commented 7 years ago

Yes, I mean the nnvm branch. And I'm definitely interested in the tutorial, @piiswrong.

howard0su commented 7 years ago

more suggestions:

  1. Update SSD based on the updated paper. The paper reports a ~5% mAP improvement: a) update the SSD model, b) add color distortion. I have some code changes here: https://github.com/howard0su/mxnet/tree/ssdv3_nnvm, but didn't finish some other changes, such as the negative mining change.

  2. Add a mAP calculation metric; this would be very useful.

  3. Support other datasets, like KITTI.

  4. Normalize the implementation of DataIter. We need a standard implementation for detection data input so that we can leverage the existing IO iterators, and the data augmenter code can be reused as well.
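Suggestion 2 (a mAP metric) is built on matching detections to ground truth by intersection-over-union. A minimal sketch of that building block in plain Python (illustrative only, not the metric code from the example):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

VOC-style evaluation then sorts detections by confidence and counts one as a true positive when its IoU with a not-yet-matched ground-truth box is at least 0.5.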

zhreshold commented 7 years ago

@howard0su Looks good, I will consider them carefully; 4 especially is the one I've been thinking about. Detection problems should reuse the same basics, which could benefit all existing and future projects.

howard0su commented 7 years ago

@zhreshold Can you propose a design? I can spare some time to help as well.

zhreshold commented 7 years ago

I think mx.image.ImageIter could be a very good starting point for unifying the DataIter interface for object detection. The differences/difficulties are:

  1. Label width varies from image to image because the number of objects varies; this has to be solved by padding or special processing before loading the labels. Rec files must therefore be prepared accordingly, and I think it's better to unify this behavior across tasks.

  2. Data augmentations such as lighting/color jitter/color normalization can be reused from the current functions. However, anything involving a spatial transform must be handled differently: the augmenter must take in the label as well, since cropping or flipping the image changes the labels.

  3. As a result of 2, the label format for object detection tasks should be fixed, so that we can always reuse the augmenter functions. Essentially, we need labels in a format like this for each image: (im_width, im_height) - required for models with non-fixed-size inputs (Fast(er) R-CNN, etc.), followed by (object_id, xmin, ymin, xmax, ymax) x N - proportional or absolute bounding boxes.

Just wondering if you guys ever had plans or ideas like this? @piiswrong @sxjscience @precedenceguo
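The label layout proposed above could be flattened into one fixed-width vector per image, so rec-based iterators keep a uniform label width. A rough sketch under assumed conventions (the max_objects cap and the -1 padding value are illustrative choices, not a settled design):

```python
def pack_label(im_width, im_height, objects, max_objects=16, pad_val=-1.0):
    """Flatten (im_width, im_height) plus N (object_id, xmin, ymin, xmax, ymax)
    tuples into one fixed-width list, padding unused slots with pad_val."""
    label = [float(im_width), float(im_height)]
    for obj in objects[:max_objects]:
        label.extend(float(v) for v in obj)  # object_id, xmin, ymin, xmax, ymax
    # pad so every image yields the same label width
    label.extend([pad_val] * (5 * (max_objects - min(len(objects), max_objects))))
    return label

def unpack_label(label, pad_val=-1.0):
    """Recover the image size and the list of real objects, dropping padding."""
    im_width, im_height = label[0], label[1]
    objects = []
    for i in range(2, len(label), 5):
        if label[i] == pad_val:  # padded object_id slot -> no more objects
            break
        objects.append(tuple(label[i:i + 5]))
    return (im_width, im_height), objects
```

A pad value of -1 keeps object_id 0 usable as a real class; consumers simply skip slots whose id equals the pad value.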

piiswrong commented 7 years ago
  1. You can pack an array as the label into the rec file; each record can have a different label length.
  2. The crop functions etc. return the transformed image along with its coordinates, so you can write a wrapper that transforms the label too.
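For point 2, a wrapper that reuses an existing crop and then fixes up the labels only needs a little coordinate bookkeeping. The function below is a hypothetical stand-in, assuming pixel-coordinate (xmin, ymin, xmax, ymax) boxes; it does not show mx.image's actual crop API:

```python
def crop_boxes(boxes, x0, y0, crop_w, crop_h):
    """Shift bounding boxes into the frame of a crop starting at (x0, y0)
    with size (crop_w, crop_h); clip to the crop and drop boxes that fall
    entirely outside. Boxes are (xmin, ymin, xmax, ymax) in pixels."""
    out = []
    for xmin, ymin, xmax, ymax in boxes:
        nx1 = min(max(xmin - x0, 0), crop_w)
        ny1 = min(max(ymin - y0, 0), crop_h)
        nx2 = min(max(xmax - x0, 0), crop_w)
        ny2 = min(max(ymax - y0, 0), crop_h)
        if nx2 > nx1 and ny2 > ny1:  # keep only boxes with non-empty overlap
            out.append((nx1, ny1, nx2, ny2))
    return out
```

The same clip-and-drop logic extends to flipping, where xmin/xmax are mirrored around the crop width instead of shifted.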
howard0su commented 7 years ago

Regarding data augmentations: are you proposing that the current spatial-transform iterators accept the "label" data as a vector of bounding boxes? Another possible solution is exposing the transform information through another output variable, and building an iterator that consumes those variables to transform the bounding boxes.

ijkguo commented 7 years ago

The results we have now:

The problem:

From weiliu-ssd we can learn:

santoshmo commented 7 years ago

Adding a deconvolutional module to the current SSD would help as well: https://arxiv.org/pdf/1701.06659v1

It achieves 80.1 mAP on VOC 2007 test and 33.2 mAP on COCO without sacrificing too much speed. A simple modification of the existing feature extractor, plus deconvolutional operations at the end of the architecture, should improve the SSD.

zhreshold commented 7 years ago

I'm testing a new iterator that allows extensive data augmentation with better speed and a cleaner API. After that I'll try to write multiple symbols to match the many variants, including this one.

piiswrong commented 7 years ago

Can we try to unify the data pipeline for Faster R-CNN and SSD?


zhreshold commented 7 years ago

@piiswrong Can you have a look at this: https://github.com/zhreshold/mxnet/blob/ssd2/python/mxnet/image.py? It's not fully finished yet. For R-CNN, overriding ImageDetIter.reshape and ImageDetIter.next should be good enough for handling mini-batches.
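The override pattern mentioned for R-CNN could be as small as choosing a new shape before each batch. A toy sketch of that idea (the base class here is a stand-in with an invented API, not the real ImageDetIter from the linked file):

```python
class ImageDetIter:
    """Stand-in for the detection iterator (hypothetical API)."""
    def __init__(self, data_shape):
        self.data_shape = data_shape

    def reshape(self, data_shape):
        self.data_shape = data_shape

    def next(self):
        raise NotImplementedError


class RCNNDetIter(ImageDetIter):
    """Pick a new input shape per batch before delegating, so non-fixed-size
    models like Fast(er) R-CNN can reuse the same detection pipeline."""
    def __init__(self, data_shape, shape_pool):
        super().__init__(data_shape)
        self.shape_pool = list(shape_pool)
        self._i = 0

    def next(self):
        shape = self.shape_pool[self._i % len(self.shape_pool)]
        self._i += 1
        self.reshape(shape)  # resize buffers for this batch
        return {"data_shape": self.data_shape}
```

Successive calls to next() then hand back batches whose shapes cycle through the pool, while the rest of the pipeline stays unchanged.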

piiswrong commented 7 years ago

@precedenceguo @howard0su @andreaolgiati See if you like this https://github.com/zhreshold/mxnet/blob/ssd2/python/mxnet/image.py

andreaolgiati commented 7 years ago

If I read this correctly, line 518 adds support for variable-length label list. That's awesome!

zhreshold commented 7 years ago

@piiswrong I still have concerns about the performance of the augmenters on the Python side. Here are some tests with the SSD example on a single GPU (Titan X) and a relatively weak CPU (E5, 4 cores/4 threads, 2.8 GHz):

| cast | mean | bright | contrast | saturation | pca_noise | mirror | rand_crop | rand_pad | sample/s |
|------|------|--------|----------|------------|-----------|--------|-----------|----------|----------|
| ✓ | x | x | x | x | x | x | x | x | 31.6 |
| ✓ | ✓ | x | x | x | x | x | x | x | 31.2 |
| ✓ | ✓ | ✓ | x | x | x | x | x | x | 30.34 |
| ✓ | ✓ | x | ✓ | x | x | x | x | x | 28.3 |
| ✓ | ✓ | x | x | ✓ | x | x | x | x | 29.02 |
| ✓ | ✓ | x | x | x | ✓ | x | x | x | 29.93 |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | x | x | 20.65 |
| ✓ | ✓ | x | x | x | x | ✓ | x | x | 30.6 |
| ✓ | ✓ | x | x | x | x | x | ✓ | x | 30.56 |
| ✓ | ✓ | x | x | x | x | x | x | ✓ | 30.54 |
| ✓ | ✓ | x | x | x | x | ✓ | ✓ | ✓ | 26.41 |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 16.9 |

So the brightness + contrast + saturation + pca_noise augmenters can impact performance a lot, even more than random cropping and padding (for detection), which was a surprise to me. I also tried threading, but it provides no gain, possibly due to the GIL. Any suggestions?
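For reference, numbers like those in the table can be collected with a small timing harness. The harness below is a self-contained sketch; the two augmenters are dummy stand-ins, not the real ones from mx.image:

```python
import time

def measure_throughput(augmenters, n_samples=1000, sample=None):
    """Push n_samples through the augmenter chain and return samples/s."""
    sample = sample if sample is not None else [0.0] * (3 * 32 * 32)
    start = time.perf_counter()
    for _ in range(n_samples):
        x = sample
        for aug in augmenters:
            x = aug(x)
    return n_samples / (time.perf_counter() - start)

# illustrative stand-ins for real augmenters
brighten = lambda img: [v + 0.1 for v in img]
contrast = lambda img: [v * 1.2 for v in img]
```

Timing one augmenter at a time, as in the table, isolates which stage dominates the per-sample cost.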

piiswrong commented 7 years ago

Did you try increasing the number of CPU workers?


zhreshold commented 7 years ago

Well, I forgot to do so :sweat:. An instant climb from 16 samples/s to 29.8 samples/s. Thanks!

andreaolgiati commented 7 years ago

Might take some work, but I'd also look into using multiprocessing. I have found the GIL to be a big pain in the past.

zhreshold commented 7 years ago

@andreaolgiati I was thinking about multiprocessing as well. However, as long as pushing the time-consuming work into the mxnet engine works well, there's no reason to do so.
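For the record, a process pool does sidestep the GIL for CPU-bound augmenters. A rough sketch with a stand-in worker (on platforms that spawn rather than fork, the worker must live in an importable module):

```python
from multiprocessing import Pool

def augment(sample):
    """CPU-bound stand-in for a real augmenter chain."""
    return [v * 1.2 + 0.1 for v in sample]

def augment_batch(batch, workers=4):
    """Fan samples out across worker processes, each with its own GIL."""
    with Pool(processes=workers) as pool:
        return pool.map(augment, batch)
```

The trade-off is per-sample pickling overhead between processes, so this only pays off when the augmenters themselves are expensive relative to the serialization cost.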