apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Improve example ssd #4225

Closed zhreshold closed 7 years ago

zhreshold commented 7 years ago

Up to now there have been several issues with the SSD example; I'm posting here to track progress on improving this example in the nnvm branch.

Any suggestion is very welcome. I will keep this updated.

piiswrong commented 7 years ago

Just to confirm: do you mean it converges on the nnvm branch?

Glad to know that you find mx.image useful. I'm planning to write some tutorials on that. Are you interested in jumping in?

zhreshold commented 7 years ago

Yes, I mean the nnvm branch. And I'm definitely interested in the tutorial, @piiswrong.

howard0su commented 7 years ago

more suggestions:

  1. Update SSD based on the updated paper. The paper reports a ~5% mAP improvement: a) update the SSD model, b) add color distortion. I have some code changes here: https://github.com/howard0su/mxnet/tree/ssdv3_nnvm, but didn't finish some other changes, such as the negative mining change.

  2. Add a mAP calculation metric; this would be very useful.

  3. Support other datasets, like KITTI.

  4. Normalize the implementation of DataIter. We need a standard implementation for detection data input so that we can leverage the existing IO iterators, and the data augmenter code can be reused as well.
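Suggestion 2 (a mAP metric) is built on matching detections to ground truth by intersection-over-union. A minimal sketch of that building block in plain Python (illustrative only, not the metric code from the example):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

VOC-style evaluation then sorts detections by confidence and counts one as a true positive when its IoU with a not-yet-matched ground-truth box is at least 0.5.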

zhreshold commented 7 years ago

@howard0su Looks good, I will consider them carefully; 4 especially is the one I've been thinking about. Detection problems should reuse the same basics, which could benefit all existing and future projects.

howard0su commented 7 years ago

@zhreshold Can you propose a design? I can spare some time to help as well.

zhreshold commented 7 years ago

I think mx.image.ImageIter could be a very good starting point for unifying the DataIter interface for object detection. The differences/difficulties are:

  1. Label width varies from image to image because the number of objects varies; this has to be solved by padding or special processing before loading the labels. Rec files must therefore be prepared accordingly, and I think it's better to unify this behavior across tasks.

  2. Data augmentations such as lighting/color jitter/color normalization can be reused from the current functions. However, anything involving a spatial transform must be handled differently: the augmenter must take in the label as well, since cropping or flipping the image changes the labels.

  3. As a result of 2, the label format for object detection tasks should be fixed, so that we can always reuse the augmenter functions. Essentially, we need labels in a format like this for each image: (im_width, im_height) - required for models with non-fixed-size inputs (Fast(er) R-CNN, etc.), followed by (object_id, xmin, ymin, xmax, ymax) x N - proportional or absolute bounding boxes.

Just wondering if you guys ever had plans or ideas like this? @piiswrong @sxjscience @precedenceguo
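The label layout proposed above could be flattened into one fixed-width vector per image, so rec-based iterators keep a uniform label width. A rough sketch under assumed conventions (the max_objects cap and the -1 padding value are illustrative choices, not a settled design):

```python
def pack_label(im_width, im_height, objects, max_objects=16, pad_val=-1.0):
    """Flatten (im_width, im_height) plus N (object_id, xmin, ymin, xmax, ymax)
    tuples into one fixed-width list, padding unused slots with pad_val."""
    label = [float(im_width), float(im_height)]
    for obj in objects[:max_objects]:
        label.extend(float(v) for v in obj)  # object_id, xmin, ymin, xmax, ymax
    # pad so every image yields the same label width
    label.extend([pad_val] * (5 * (max_objects - min(len(objects), max_objects))))
    return label

def unpack_label(label, pad_val=-1.0):
    """Recover the image size and the list of real objects, dropping padding."""
    im_width, im_height = label[0], label[1]
    objects = []
    for i in range(2, len(label), 5):
        if label[i] == pad_val:  # padded object_id slot -> no more objects
            break
        objects.append(tuple(label[i:i + 5]))
    return (im_width, im_height), objects
```

A pad value of -1 keeps object_id 0 usable as a real class; consumers simply skip slots whose id equals the pad value.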

piiswrong commented 7 years ago
  1. You can pack an array as the label into the rec file; each record can have a different label length.
  2. The crop functions etc. return the transformed image along with its coordinates, so you can write a wrapper that transforms the label too.
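For point 2, a wrapper that reuses an existing crop and then fixes up the labels only needs a little coordinate bookkeeping. The function below is a hypothetical stand-in, assuming pixel-coordinate (xmin, ymin, xmax, ymax) boxes; it does not show mx.image's actual crop API:

```python
def crop_boxes(boxes, x0, y0, crop_w, crop_h):
    """Shift bounding boxes into the frame of a crop starting at (x0, y0)
    with size (crop_w, crop_h); clip to the crop and drop boxes that fall
    entirely outside. Boxes are (xmin, ymin, xmax, ymax) in pixels."""
    out = []
    for xmin, ymin, xmax, ymax in boxes:
        nx1 = min(max(xmin - x0, 0), crop_w)
        ny1 = min(max(ymin - y0, 0), crop_h)
        nx2 = min(max(xmax - x0, 0), crop_w)
        ny2 = min(max(ymax - y0, 0), crop_h)
        if nx2 > nx1 and ny2 > ny1:  # keep only boxes with non-empty overlap
            out.append((nx1, ny1, nx2, ny2))
    return out
```

The same clip-and-drop logic extends to flipping, where xmin/xmax are mirrored around the crop width instead of shifted.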
howard0su commented 7 years ago

Regarding data augmentations: are you proposing that the current spatial-transform iterators accept the "label" data as a vector of bounding boxes? Another possible solution is exposing the transform information through another output variable, and building an iterator that consumes those variables to transform the bounding boxes.

ijkguo commented 7 years ago

The results we have now:

The problem:

From weiliu-ssd we can learn:

santoshmo commented 7 years ago

Adding a deconvolutional module to the current SSD would help as well: https://arxiv.org/pdf/1701.06659v1

It achieves 80.1 mAP on VOC 2007 test and 33.2 mAP on COCO without sacrificing too much speed. A simple modification of the existing feature extractor, plus deconvolutional operations at the end of the architecture, should improve the SSD.

zhreshold commented 7 years ago

I'm testing a new iterator that allows extensive data augmentation with better speed and a cleaner API. After that I'll try to write multiple symbols to match the many variants, including this one.

piiswrong commented 7 years ago

Can we try to unify the data pipeline for Faster R-CNN and SSD?


zhreshold commented 7 years ago

@piiswrong Can you have a look at this: https://github.com/zhreshold/mxnet/blob/ssd2/python/mxnet/image.py? It's not fully finished yet. For R-CNN, overriding ImageDetIter.reshape and ImageDetIter.next should be good enough for handling mini-batches.
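The override pattern mentioned for R-CNN could be as small as choosing a new shape before each batch. A toy sketch of that idea (the base class here is a stand-in with an invented API, not the real ImageDetIter from the linked file):

```python
class ImageDetIter:
    """Stand-in for the detection iterator (hypothetical API)."""
    def __init__(self, data_shape):
        self.data_shape = data_shape

    def reshape(self, data_shape):
        self.data_shape = data_shape

    def next(self):
        raise NotImplementedError


class RCNNDetIter(ImageDetIter):
    """Pick a new input shape per batch before delegating, so non-fixed-size
    models like Fast(er) R-CNN can reuse the same detection pipeline."""
    def __init__(self, data_shape, shape_pool):
        super().__init__(data_shape)
        self.shape_pool = list(shape_pool)
        self._i = 0

    def next(self):
        shape = self.shape_pool[self._i % len(self.shape_pool)]
        self._i += 1
        self.reshape(shape)  # resize buffers for this batch
        return {"data_shape": self.data_shape}
```

Successive calls to next() then hand back batches whose shapes cycle through the pool, while the rest of the pipeline stays unchanged.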

piiswrong commented 7 years ago

@precedenceguo @howard0su @andreaolgiati See if you like this https://github.com/zhreshold/mxnet/blob/ssd2/python/mxnet/image.py

andreaolgiati commented 7 years ago

If I read this correctly, line 518 adds support for variable-length label list. That's awesome!

zhreshold commented 7 years ago

@piiswrong I still have concerns about the performance of the augmenters on the Python side. Here are some tests with the SSD example on a single GPU (Titan X) and a relatively weak CPU (E5, 4 cores/4 threads, 2.8 GHz):

| cast | mean | bright | contrast | saturation | pca_noise | mirror | rand_crop | rand_pad | sample/s |
|------|------|--------|----------|------------|-----------|--------|-----------|----------|----------|
| ✓ | x | x | x | x | x | x | x | x | 31.6 |
| ✓ | ✓ | x | x | x | x | x | x | x | 31.2 |
| ✓ | ✓ | ✓ | x | x | x | x | x | x | 30.34 |
| ✓ | ✓ | x | ✓ | x | x | x | x | x | 28.3 |
| ✓ | ✓ | x | x | ✓ | x | x | x | x | 29.02 |
| ✓ | ✓ | x | x | x | ✓ | x | x | x | 29.93 |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | x | x | 20.65 |
| ✓ | ✓ | x | x | x | x | ✓ | x | x | 30.6 |
| ✓ | ✓ | x | x | x | x | x | ✓ | x | 30.56 |
| ✓ | ✓ | x | x | x | x | x | x | ✓ | 30.54 |
| ✓ | ✓ | x | x | x | x | ✓ | ✓ | ✓ | 26.41 |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 16.9 |

So the brightness + contrast + saturation + pca_noise augmenters can impact performance a lot, even more than random cropping and padding (for detection), which was a surprise to me. I also tried threading, but it provides no gain, possibly due to the GIL. Any suggestions?
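For reference, numbers like those in the table can be collected with a small timing harness. The harness below is a self-contained sketch; the two augmenters are dummy stand-ins, not the real ones from mx.image:

```python
import time

def measure_throughput(augmenters, n_samples=1000, sample=None):
    """Push n_samples through the augmenter chain and return samples/s."""
    sample = sample if sample is not None else [0.0] * (3 * 32 * 32)
    start = time.perf_counter()
    for _ in range(n_samples):
        x = sample
        for aug in augmenters:
            x = aug(x)
    return n_samples / (time.perf_counter() - start)

# illustrative stand-ins for real augmenters
brighten = lambda img: [v + 0.1 for v in img]
contrast = lambda img: [v * 1.2 for v in img]
```

Timing one augmenter at a time, as in the table, isolates which stage dominates the per-sample cost.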

piiswrong commented 7 years ago

Did you try increasing the number of CPU workers?


zhreshold commented 7 years ago

Well, I forgot to do so :sweat:. An instant climb from 16 samples/s to 29.8 samples/s. Thanks!

andreaolgiati commented 7 years ago

Might take some work, but I'd also look into using multiprocessing. I have found the GIL to be a big pain in the past.

zhreshold commented 7 years ago

@andreaolgiati I was thinking about multiprocessing as well. However, as long as pushing the time-consuming work into the mxnet engine works well, there's no reason to do so.
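For the record, a process pool does sidestep the GIL for CPU-bound augmenters. A rough sketch with a stand-in worker (on platforms that spawn rather than fork, the worker must live in an importable module):

```python
from multiprocessing import Pool

def augment(sample):
    """CPU-bound stand-in for a real augmenter chain."""
    return [v * 1.2 + 0.1 for v in sample]

def augment_batch(batch, workers=4):
    """Fan samples out across worker processes, each with its own GIL."""
    with Pool(processes=workers) as pool:
        return pool.map(augment, batch)
```

The trade-off is per-sample pickling overhead between processes, so this only pays off when the augmenters themselves are expensive relative to the serialization cost.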