PaddlePaddle / Paddle-Lite

PaddlePaddle High Performance Deep Learning Inference Engine for Mobile and Edge
https://www.paddlepaddle.org.cn/lite
Apache License 2.0

When will Paddle Lite support the RK3399Pro? #5863

Status: Closed (github4529 closed this issue 7 months ago)

github4529 commented 3 years ago

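In the official documentation, the chips listed as supported for NPU acceleration are RK1808/1806 and RV1126/1109, with the note: RK3399Pro is not supported for now.

The RK1808 is supported but the RK3399Pro is not, even though both chips use the same NPU; only the way the computation is done differs. The RK3399Pro also performs better.

When will the RK3399Pro be supported, or is there any plan to support it?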

paddle-bot-old[bot] commented 3 years ago

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and error messages. You may also check the official documentation, the FAQ, and past GitHub issues for an answer. Have a nice day!

hong19860320 commented 3 years ago

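We raised this request with Rockchip last year. They replied that supporting the RK3399Pro would require refactoring the low-level code, which is a huge amount of work. For that reason, the PaddleLite+rknpu_ddk calling path is not supported on this chip for now.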

github4529 commented 3 years ago

If I use the GPU instead, does that go through OpenCL? Is there an example for Ubuntu?

zhupengyang commented 3 years ago

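> If I use the GPU instead, does that go through OpenCL? Is there an example for Ubuntu?

Yes, it can run. If the RK3399 is running Android, usage is exactly the same as on a phone. If the OS is Linux, build with: ./lite/tools/build_linux.sh --with_opencl=ON. Related link: https://paddle-lite.readthedocs.io/zh/develop/source_compile/compile_linux.html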

github4529 commented 3 years ago

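Why is the OpenCL build I compiled about as fast as the CPU build? The example I am running is paddle-lite-demo.

I see that the models downloaded by download_models_and_libs.sh are named xxx_pascalvoc_for_cpu. Does the nb file also have to be generated for OpenCL?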

zhupengyang commented 3 years ago

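> Why is the OpenCL build I compiled about as fast as the CPU build? The example I am running is paddle-lite-demo.
>
> I see that the models downloaded by download_models_and_libs.sh are named xxx_pascalvoc_for_cpu. Does the nb file also have to be generated for OpenCL?

Yes. You need to convert the model yourself with opt.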

github4529 commented 3 years ago

  bool is_opencl_backend_valid =
      ::IsOpenCLBackendValid(/*check_fp16_valid = false*/);
  std::cout << "is_opencl_backend_valid:" << is_opencl_backend_valid
            << std::endl;
  //  Uncomment code below to enable OpenCL
  /*
  if (is_opencl_backend_valid) {
    // Set the OpenCL kernel binary.
    // Algorithm selection and building kernels from source code add a large
    // amount of prepare time.
    // Prepare time drops dramatically once the algorithm file and the OpenCL
    // kernel binary have been built on the first run.
    // The first run takes a bit longer because of compilation if you don't
    // call `set_opencl_binary_path_name` explicitly.
    // Calling `set_opencl_binary_path_name` explicitly is therefore strongly
    // recommended.
    // Make sure you have write permission for the binary path.
    // We strongly recommend giving each model a unique binary name.
    const std::string bin_path = "/data/local/tmp/";
    const std::string bin_name = "lite_opencl_kernel.bin";
    config.set_opencl_binary_path_name(bin_path, bin_name);
    // OpenCL tune option
    // CL_TUNE_NONE: 0
    // CL_TUNE_RAPID: 1
    // CL_TUNE_NORMAL: 2
    // CL_TUNE_EXHAUSTIVE: 3
    const std::string tuned_path = "/data/local/tmp/";
    const std::string tuned_name = "lite_opencl_tuned.bin";
    config.set_opencl_tune(CL_TUNE_NORMAL, tuned_path, tuned_name);
    // OpenCL precision option
    // CL_PRECISION_AUTO: 0, try fp16 first if valid (default)
    // CL_PRECISION_FP32: 1, force fp32
    // CL_PRECISION_FP16: 2, force fp16
    config.set_opencl_precision(CL_PRECISION_FP16);
  } else {
    std::cout << "Unsupport opencl nb model." << std::endl;
    exit(1);
    // You can fall back to a CPU nb model instead:
    // config.set_model_from_file(cpu_nb_model_dir);
  }
  */

github4529 commented 3 years ago

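When running prediction with OpenCL in Paddle-Lite, what are bin_path and tuned_path used for (lite_opencl_kernel.bin and lite_opencl_tuned.bin)?

Also, with paddle-lite-demo on the RK3399Pro, CPU prediction takes 140 ms while OpenCL prediction takes 400 ms.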

github4529 commented 3 years ago

firefly@firefly:~/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo$ sudo ./run.sh
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++
-- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- TARGET ARCH ABI: armv8
-- PADDLE LITE DIR: ../Paddle-Lite
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found OpenMP 4.5
-- OpenMP C flags: -fopenmp
-- OpenMP CXX flags: -fopenmp
-- OpenMP OpenMP_CXX_LIB_NAMES: gomp;pthread
-- OpenMP OpenMP_CXX_LIBRARIES: /usr/lib/gcc/aarch64-linux-gnu/7/libgomp.so;/usr/lib/aarch64-linux-gnu/libpthread.so
-- Found OpenCV: /usr/local (found version "3.4.11")
-- OpenCV library status:
-- version: 3.4.11
-- libraries: opencv_calib3d;opencv_core;opencv_dnn;opencv_features2d;opencv_flann;opencv_highgui;opencv_imgcodecs;opencv_imgproc;opencv_ml;opencv_objdetect;opencv_photo;opencv_shape;opencv_stitching;opencv_superres;opencv_video;opencv_videoio;opencv_videostab
-- include path: /usr/local/include;/usr/local/include/opencv
-- Configuring done
-- Generating done
-- Build files have been written to: /home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/build
Scanning dependencies of target object_detection_demo
[ 50%] Building CXX object CMakeFiles/object_detection_demo.dir/object_detection_demo.cc.o
/home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/object_detection_demo.cc: In function ‘cv::Mat process(cv::Mat&, std::vector<std::__cxx11::basic_string >&, std::shared_ptr&)’:
/home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/object_detection_demo.cc:234:41: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘std::vector::size_type {aka long unsigned int}’ [-Wformat=]
 printf("results: %d\n", results.size());


[100%] Linking CXX executable object_detection_demo
[100%] Built target object_detection_demo
Rga built version:06fc7c4 
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1097 Setup] ARM multiprocessors name: 
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1098 Setup] ARM multiprocessors number: 6
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1100 Setup] ARM multiprocessors ID: 0, max freq: 1416, min freq: 1416, cluster ID: 1, CPU ARCH: A53
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1100 Setup] ARM multiprocessors ID: 1, max freq: 1416, min freq: 1416, cluster ID: 1, CPU ARCH: A53
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1100 Setup] ARM multiprocessors ID: 2, max freq: 1416, min freq: 1416, cluster ID: 1, CPU ARCH: A53
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1100 Setup] ARM multiprocessors ID: 3, max freq: 1416, min freq: 1416, cluster ID: 1, CPU ARCH: A53
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1100 Setup] ARM multiprocessors ID: 4, max freq: 1800, min freq: 1800, cluster ID: 0, CPU ARCH: A72
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1100 Setup] ARM multiprocessors ID: 5, max freq: 1800, min freq: 1800, cluster ID: 0, CPU ARCH: A72
[I  4/12  3:41:45.848 ...fly/Paddle-Lite/lite/core/device_info.cc:1106 Setup] L1 DataCache size is: 
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1108 Setup] 32 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1108 Setup] 32 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1108 Setup] 32 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1108 Setup] 32 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1108 Setup] 32 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1108 Setup] 32 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1110 Setup] L2 Cache size is: 
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1112 Setup] 512 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1112 Setup] 512 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1112 Setup] 512 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1112 Setup] 512 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1112 Setup] 512 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1112 Setup] 512 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1114 Setup] L3 Cache size is: 
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1116 Setup] 0 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1116 Setup] 0 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1116 Setup] 0 KB
[I  4/12  3:41:45.849 ...fly/Paddle-Lite/lite/core/device_info.cc:1116 Setup] 0 KB
[I  4/12  3:41:45.850 ...fly/Paddle-Lite/lite/core/device_info.cc:1116 Setup] 0 KB
[I  4/12  3:41:45.850 ...fly/Paddle-Lite/lite/core/device_info.cc:1116 Setup] 0 KB
[I  4/12  3:41:45.850 ...fly/Paddle-Lite/lite/core/device_info.cc:1118 Setup] Total memory: 2004016KB
[I  4/12  3:41:45.850 ...fly/Paddle-Lite/lite/core/device_info.cc:917 RequestPowerHighMode] Request thread num: 6, exceed the big cores size: 2, truncate thread num to 2
[W  4/12  3:41:45.851 ...e-Lite/lite/model_parser/model_parser.cc:799 LoadModelFbsFromFile] warning: the version of opt that transformed this model is not consistent with current Paddle-Lite version.
      version of opt:v2.8-rc
      version of current Paddle-Lite:6d6a6c7
[I  4/12  3:41:45.909 ...-Lite/lite/backends/opencl/cl_runtime.cc:65 Init] opencl_lib_found:1
[I  4/12  3:41:45.909 ...-Lite/lite/backends/opencl/cl_runtime.cc:73 Init] dlsym_success:1
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:533 InitializePlatform] Platform extension: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_arm_core_id cl_arm_printf cl_arm_thread_limit_hint cl_arm_non_uniform_work_group_size cl_arm_import_memory
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:81 Init] is_platform_init:1
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:615 InitializeDevice] Using device: Mali-T860
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:641 InitializeDevice] CL_DEVICE_VERSION:OpenCL 1.2 v1.r18p0-01rel0.b3168fd4917d4853d85f2a426b70bb36
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:648 InitializeDevice] device_type:GPU
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:652 InitializeDevice] The chosen device has 4 compute units.
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:656 InitializeDevice] CL_DEVICE_MAX_CLOCK_FREQUENCY:5
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:666 InitializeDevice] The local memory size of the chosen device is 32.000000 KB.
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:673 InitializeDevice] CL_DEVICE_GLOBAL_MEM_CACHE_SIZE(KB):256.000000 KB.
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:681 InitializeDevice] CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE(KB):0.062500 KB.
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:688 InitializeDevice] CL_DEVICE_GLOBAL_MEM_SIZE(KB):2002800.000000 KB.
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:696 InitializeDevice] CL_DEVICE_MAX_WORK_GROUP_SIZE:256
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:700 InitializeDevice] CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:3
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:705 InitializeDevice] max_work_item_sizes[0]:256
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:705 InitializeDevice] max_work_item_sizes[1]:256
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:705 InitializeDevice] max_work_item_sizes[2]:256
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:716 InitializeDevice] CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:64.000000
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:727 InitializeDevice] The chosen device supports image processing.
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:731 InitializeDevice] CL_DEVICE_IMAGE2D_MAX_HEIGHT:65536
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:735 InitializeDevice] CL_DEVICE_IMAGE2D_MAX_WIDTH:65536
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:749 InitializeDevice] The chosen device supports the half data type.
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:757 InitializeDevice] CL_DEVICE_ADDRESS_BITS:64
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:761 InitializeDevice] CL_DRIVER_VERSION:1.2
[I  4/12  3:41:45.911 ...-Lite/lite/backends/opencl/cl_runtime.cc:89 Init] is_device_init:1
[I  4/12  3:41:45.912 ...-Lite/lite/backends/opencl/cl_runtime.cc:101 Init] set is_cl_runtime_initialized_ = true
[I  4/12  3:41:45.916 ...refly/Paddle-Lite/lite/api/paddle_api.cc:50 IsOpenCLBackendValid] opencl_lib_found:1
[I  4/12  3:41:45.916 ...refly/Paddle-Lite/lite/api/paddle_api.cc:56 IsOpenCLBackendValid] dlsym_success:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:63 IsOpenCLBackendValid] opencl_valid:1
arm_release_ver of this libmali is 'r18p0-01rel0', rk_so_ver is '2'.is_opencl_backend_valid:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:50 IsOpenCLBackendValid] opencl_lib_found:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:56 IsOpenCLBackendValid] dlsym_success:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:63 IsOpenCLBackendValid] opencl_valid:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:278 set_opencl_binary_path_name] opencl binary path and file name:/home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/lite_opencl_kernel.bin
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:50 IsOpenCLBackendValid] opencl_lib_found:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:56 IsOpenCLBackendValid] dlsym_success:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:63 IsOpenCLBackendValid] opencl_valid:1
[I  4/12  3:41:45.917 ...-Lite/lite/backends/opencl/cl_runtime.cc:842 set_auto_tune] tuned_file.size():99, tuned_file:/home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/lite_opencl_tuned.bin
[I  4/12  3:41:45.917 ...-Lite/lite/backends/opencl/cl_runtime.cc:846 set_auto_tune] Load tuned file: /home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/lite_opencl_tuned.bin
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:296 set_opencl_tune] set opencl_tune_mode: CL_TUNE_NORMAL, lws_repeats:4
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:299 set_opencl_tune] tuned file path & name:/home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/lite_opencl_tuned.bin
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:50 IsOpenCLBackendValid] opencl_lib_found:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:56 IsOpenCLBackendValid] dlsym_success:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:63 IsOpenCLBackendValid] opencl_valid:1
[I  4/12  3:41:45.917 ...refly/Paddle-Lite/lite/api/paddle_api.cc:311 set_opencl_precision] set opencl precision: CL_PRECISION_FP16
[I  4/12  3:41:45.935 ...-Lite/lite/backends/opencl/cl_runtime.cc:218 CheckFromPrecompiledBinary] Load opencl kernel bin file: /home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/lite_opencl_kernel.bin
iter 0 cost: 451.321014 ms
iter 1 cost: 472.569000 ms
iter 2 cost: 338.707001 ms
iter 3 cost: 338.989014 ms
iter 4 cost: 339.759003 ms
warmup: 1 repeat: 5, average: 388.269006 ms, max: 472.569000 ms, min: 338.707001 ms
results: 3
[0] bicycle - 0.991699 0.133789,0.233643,0.734375,0.771484
[1] car - 0.969727 0.612305,0.138062,0.900391,0.293945
[2] dog - 0.986328 0.156860,0.331299,0.443848,0.919922
Preprocess time: 5.144000 ms
Prediction time: 388.269006 ms
Postprocess time: 0.344000 ms

[I  4/12  3:41:51.935 ...e-Lite/lite/backends/opencl/cl_context.h:43 ~CLContext] release cl::Program, cl::Kernel finished.
[I  4/12  3:41:51.941 ...-Lite/lite/backends/opencl/cl_runtime.cc:378 SaveProgram] OpenCL Program existed:/home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/lite_opencl_kernel.bin
[I  4/12  3:41:51.941 ...-Lite/lite/backends/opencl/cl_runtime.cc:403 SaveTuned] OpenCL Tuned file existed:/home/firefly/Paddle-Lite-Demo/PaddleLite-armlinux-demo/object_detection_demo/lite_opencl_tuned.bin
[I  4/12  3:41:51.941 ...-Lite/lite/backends/opencl/cl_runtime.cc:37 ~CLRuntime] is_cl_runtime_initialized_:1
Reatris commented 3 years ago

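> Why is the OpenCL build I compiled about as fast as the CPU build? The example I am running is paddle-lite-demo. I see that the models downloaded by download_models_and_libs.sh are named xxx_pascalvoc_for_cpu. Does the nb file also have to be generated for OpenCL?

> Yes. You need to convert the model yourself with opt.

Hello, I have a question. All the OpenCL examples I have seen are based on C++. My Lite inference code is written in Python; can it use OpenCL? @zhupengyang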