apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Mxnet very slow - running the ssd_512_mobilenet1.0_voc model on Windows 10 #19242

Closed moseswmwong closed 4 years ago

moseswmwong commented 4 years ago

Description

The installation follows the instructions from this link

MXNet runs very slowly on my Windows 10 machine, which is a 64-bit system.

It is installed natively on Windows. I followed all the instructions strictly and checked carefully at every milestone.

It is installed inside an Anaconda environment. A new conda environment was created for it; here are the modules

Checks all passed:

check1
Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> print (cv2.__version__)
4.4.0

check2
Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
>>> print (mxnet.__version__)
1.7.0

check3
Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
>>> import gluoncv
>>> print (gluoncv.__version__)
0.8.0

check4
Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> a = mx.nd.ones((2,3), mx.gpu())
>>> b = a * 2 + 1
>>> b.asnumpy()
array([[3., 3., 3.],
       [3., 3., 3.]], dtype=float32)

check5
Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
>>> from mxnet.runtime import feature_list
>>> feature_list()
[✔ CUDA, ✔ CUDNN, ✖ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✖ CPU_SSE, ✖ CPU_SSE2, ✖ CPU_SSE3, ✖ CPU_SSE4_1, ✖ CPU_SSE4_2, ✖ CPU_SSE4A, ✖ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✖ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]

Error Message

No error message

The problem is that the following code runs at about 0.5 frames per second on a 480p YouTube MP4 video.

import time  # needed for time.sleep() below

import gluoncv as gcv
from gluoncv.utils import try_import_cv2
cv2 = try_import_cv2()
import mxnet as mx

def gpu_device(gpu_number=0):
    try:
        _ = mx.nd.array([1, 2, 3], ctx=mx.gpu(gpu_number))
    except mx.MXNetError:
        return False
    _ = mx.gpu(gpu_number)
    return True
if not gpu_device():
    print('No GPU device found!')
    exit() 

net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
# Compile the model for faster speed
net.hybridize()

cap = cv2.VideoCapture('video.mp4')
time.sleep(1)

axes = None

...

    # Load frame from the camera
    ret, frame = cap.read()

    # Image pre-processing
    frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
    rgb_nd, frame = gcv.data.transforms.presets.ssd.transform_test(frame, short=512, max_size=700)

    # Run frame through network
    class_IDs, scores, bounding_boxes = net(rgb_nd)

    # Display the result
    img = gcv.utils.viz.cv_plot_bbox(frame, bounding_boxes[0], scores[0], class_IDs[0], class_names=net.classes)
    gcv.utils.viz.cv_plot_image(img)

 ...

SSD (512x512) should be able to do about 10 fps with a GPU.

Windows Task Manager reports:

Process name: Python
CPU: 16% (out of total CPU utilization 22%)
Memory: 278 MB (occupied 45% of total memory usage)
GPU: 0% (total usage 0%)
GPU engine: 

Obviously the GPU is not being used by MXNet, yet check 5 shows that the CUDA and cuDNN features are all available to MXNet, and check 4 ("a = mx.nd.ones((2,3), mx.gpu())") proves that MXNet can use the GPU.
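
Check 4 only shows that an NDArray can be created on the GPU; it does not say where the model's parameters or the input frame live. A minimal sketch for inspecting that, assuming `net` is the Gluon network and `rgb_nd` the preprocessed frame from the snippet above. The helper name `contexts_in_use` is hypothetical and not part of MXNet or GluonCV:

```python
def contexts_in_use(net, x):
    """Return (contexts holding the parameters, context holding the input).

    Hypothetical helper: `net` is assumed to be a Gluon block whose
    parameters are already initialized, and `x` an MXNet NDArray.
    """
    param_ctxs = set()
    for param in net.collect_params().values():
        # Parameter.list_ctx() lists every device this parameter lives on.
        param_ctxs.update(str(ctx) for ctx in param.list_ctx())
    # NDArray.context is the device that holds the array's data.
    return param_ctxs, str(x.context)


# Usage inside the video loop from the post (not executed here):
#   print(contexts_in_use(net, rgb_nd))
```

If both values name a CPU context, the forward pass runs on the CPU no matter which CUDA features the build reports.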

To Reproduce

please see above

Steps to reproduce

please see above

What have you tried to solve it?

please see check 1-5

Environment

please see above


A nice object detection video is shown for the whole length of the video, with clearly visible bounding boxes on each person, car, bicycle, etc., but it runs at about 0.5 fps.


szha commented 4 years ago

cc @zhreshold

zhreshold commented 4 years ago

@moseswmwong it's very likely that the matplotlib plot is the bottleneck in the video loop. Could you try disabling the visualization part of the code and see if it speeds up?
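
One way to test that guess is to time each stage of the loop separately. The following is a minimal sketch using only the standard library; `StageTimer` is a hypothetical helper, not part of MXNet or GluonCV. Because MXNet executes operations asynchronously, a synchronization call such as `mx.nd.waitall` should be passed as `sync` for the inference stage, otherwise the time may be attributed to whichever later stage first reads the results.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name, sync=None):
        start = time.perf_counter()
        try:
            yield
        finally:
            if sync is not None:
                sync()  # e.g. mx.nd.waitall, to wait for pending async work
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return the mean time in seconds spent in each stage."""
        return {name: self.totals[name] / self.counts[name]
                for name in self.totals}


# Usage inside the video loop from the post (not executed here):
#   timer = StageTimer()
#   with timer.stage('inference', sync=mx.nd.waitall):
#       class_IDs, scores, bounding_boxes = net(rgb_nd)
#   with timer.stage('display'):
#       gcv.utils.viz.cv_plot_image(img)
#   print(timer.report())
```

Comparing the per-stage means shows whether display or inference dominates each frame.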

dai-ichiro commented 4 years ago

Are the network parameters and rgb_nd on the GPU?

net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
net.collect_params().reset_ctx(mx.gpu())
class_IDs, scores, bounding_boxes = net(rgb_nd.as_in_context(mx.gpu()))

moseswmwong commented 4 years ago

Fixed, thanks!

After moving the model parameters and the input to the mx.gpu() context, the program runs really fast on the GPU.
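
For reference, the suggested fix combined with the original loop might look like the sketch below. It only uses calls that appear in this thread plus standard OpenCV capture calls. The function names `move_to_device`, `detect`, and `run_video` are hypothetical, the elided parts of the original script are not reconstructed, and the heavy imports are deferred into `run_video` so the helpers can be read and tested on their own.

```python
def move_to_device(net, ctx):
    """Move all network parameters to ctx (for example mx.gpu(0))."""
    net.collect_params().reset_ctx(ctx)
    return net


def detect(net, rgb_nd, ctx):
    """Copy the input to ctx, then run the forward pass there."""
    return net(rgb_nd.as_in_context(ctx))


def run_video(path='video.mp4', gpu_number=0):
    # Imports are deferred so the helpers above do not require MXNet.
    import mxnet as mx
    import gluoncv as gcv
    from gluoncv.utils import try_import_cv2
    cv2 = try_import_cv2()

    ctx = mx.gpu(gpu_number)
    net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
    move_to_device(net, ctx)  # parameters now live on the GPU
    net.hybridize()

    cap = cv2.VideoCapture(path)
    try:
        while True:
            ret, frame = cap.read()
            if not ret:  # end of video or read error
                break
            frame = mx.nd.array(
                cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
            rgb_nd, frame = gcv.data.transforms.presets.ssd.transform_test(
                frame, short=512, max_size=700)
            class_IDs, scores, bounding_boxes = detect(net, rgb_nd, ctx)
            img = gcv.utils.viz.cv_plot_bbox(
                frame, bounding_boxes[0], scores[0], class_IDs[0],
                class_names=net.classes)
            gcv.utils.viz.cv_plot_image(img)
            cv2.waitKey(1)
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Calling `run_video('video.mp4')` would then run detection with both the parameters and each input frame on the GPU, which is the change reported as fixing the issue.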