PINTO0309 / MobileNet-SSD-RealSense

[High Performance / MAX 30 FPS] RaspberryPi3(RaspberryPi/Raspbian Stretch) or Ubuntu + Multi Neural Compute Stick(NCS/NCS2) + RealSense D435(or USB Camera or PiCamera) + MobileNet-SSD(MobileNetSSD) + Background Multi-transparent(Simple multi-class segmentation) + FaceDetection + MultiGraph + MultiProcessing + MultiClustering
https://qiita.com/PINTO
MIT License

GPU does not have a better speed #29

Open duocang opened 5 years ago

duocang commented 5 years ago

[Required] Your device: Surface book 2 15 inch

[Required] Your device's CPU architecture: i7 8650U

[Required] Your OS: Windows 10

[Required] Details of the work you did before the problem occurred:

I was benchmarking the CPU, GPU, and NCS2. The speeds I measured rank as CPU > MYRIAD > GPU, which does not seem reasonable.

        if device == "CPU":
            model_xml = 'model_ir/fp32/frozen_alexnet_model.xml'
            model_bin = os.path.splitext(model_xml)[0] + ".bin"
            plugin = IEPlugin(device='CPU')  # Run the model on the CPU; change to MYRIAD to use the NCS
            net = IENetwork(model=model_xml, weights=model_bin)  # Also needs changing when using the NCS
        elif device == "GPU":
            model_xml = 'model_ir/fp32/frozen_alexnet_model.xml'
            model_bin = os.path.splitext(model_xml)[0] + ".bin"
            plugin = IEPlugin(device='GPU')
            net = IENetwork(model=model_xml, weights=model_bin)
        else:
            model_xml = 'model_ir/fp16/frozen_alexnet_model.xml'
            model_bin = os.path.splitext(model_xml)[0] + ".bin"
            plugin = IEPlugin(device='MYRIAD')
            net = IENetwork(model=model_xml, weights=model_bin)

        net.batch_size = batch
        input_blob = next(iter(net.inputs))
        exec_net = plugin.load(network=net)

        print('tick tok...')
        start_time = time.time()

        for images, images_path in get_batches_fn(batch):
            outputs = exec_net.infer(inputs={input_blob: images})

            for indx in range(batch):
                class_name = class_names[np.argmax(outputs['prob'][indx])]
                probs = outputs['prob'][indx, np.argmax(outputs['prob'][indx])]

[Required] Overview of problems and questions:
Why doesn't the GPU perform better? Is this normal?

PINTO0309 commented 5 years ago

Why doesn't the GPU perform better? Is this normal?

Yes, this is normal. I got the same results in my past benchmarks. The more CPU cores you have, the better the performance. Because the device runs inference at full power, CPU utilization will be close to 100%.
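
One thing that may be worth ruling out when comparing devices this way: in the posted loop the timer starts before iterating `get_batches_fn`, so image loading is counted inside the measured time for every device, and the first call on a device can carry one-off setup cost. Below is a minimal sketch of a timing harness that preloads the batches and runs a warm-up pass before measuring. It assumes all batches fit in memory; `measure_throughput` and `infer_fn` are hypothetical names used for illustration only, not part of this repository or of OpenVINO.

```python
import time


def measure_throughput(infer_fn, batches, warmup=1):
    """Time infer_fn over preloaded batches.

    Returns (images_per_second, seconds_per_batch).
    infer_fn is any callable taking one batch (hypothetical helper, an assumption).
    """
    # Materialize the batches first so data loading stays outside the timed region.
    batches = list(batches)
    if not batches:
        raise ValueError("need at least one batch")

    # Untimed warm-up calls absorb any one-off setup cost.
    for batch in batches[:warmup]:
        infer_fn(batch)

    n_images = 0
    start = time.perf_counter()
    for batch in batches:
        infer_fn(batch)
        n_images += len(batch)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)

    return n_images / elapsed, elapsed / len(batches)
```

With the names from the snippet in the question, `infer_fn` could be `lambda images: exec_net.infer(inputs={input_blob: images})` and `batches` could be `[images for images, _ in get_batches_fn(batch)]`. Running the same harness once per device keeps the comparison limited to inference time.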