Inference run time (Demo image)

xjoshramos commented 6 years ago

I am not able to reach 5 fps on the demo image when running on a Titan Volta.

INFO infer_simple.py: 111: Inference time: 1.306s
INFO infer_simple.py: 113: | im_detect_bbox: 1.175s
INFO infer_simple.py: 113: | im_detect_body_uv: 0.131s
INFO infer_simple.py: 113: | misc_bbox: 0.001s

Section 3.2 of the paper reports 4-5 fps on a GTX 1080:

"During inference, our system operates at 25fps on 320x240 images and 4-5fps on 800x1100 images using a GTX1080 graphics card."

Is this timing for the full pipeline or just for im_detect_body_uv?
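To make the comparison with the paper's figure concrete, here is a small sketch that converts the logged per-stage latencies into frames per second. It only uses the four numbers from the log above; the helper name `fps` is made up for illustration. On this first image the total works out to about 0.77 fps, im_detect_body_uv alone to about 7.6 fps, and im_detect_bbox accounts for roughly 90% of the total.

```python
# Stage timings copied from the log above (seconds, single demo image).
timings = {
    "total": 1.306,
    "im_detect_bbox": 1.175,
    "im_detect_body_uv": 0.131,
    "misc_bbox": 0.001,
}


def fps(seconds):
    """Frames per second implied by a per-image latency (hypothetical helper)."""
    return 1.0 / seconds


for name, seconds in timings.items():
    # Share of the logged total spent in this stage.
    share = 100.0 * seconds / timings["total"]
    print("%-18s %.3fs  %6.2f fps  %5.1f%% of total"
          % (name, seconds, fps(seconds), share))
```

So the body UV head by itself would be above the 4-5 fps range, while the full pipeline as logged is well below it.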

Here is my configuration for caffe2 / pytorch.

-- Does not need to define long separately.
-- std::exception_ptr is supported.
-- NUMA is available
-- Current compiler supports avx2 extention. Will build perfkernels.
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:/home/joshua/pytorch/third_party/protobuf/src>$
-- The BLAS backend of choice:Eigen
-- Brace yourself, we are building NNPACK
-- Found PythonInterp: /usr/bin/python (found version "2.7.12")
-- Caffe2: Cannot find gflags automatically. Using legacy find.
-- Caffe2: Found gflags (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
-- Caffe2: Cannot find glog automatically. Using legacy find.
-- Caffe2: Found glog (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
-- Found lmdb (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/liblmdb.so)
-- Found LevelDB (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libleveldb.so)
-- Found Snappy (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libsnappy.so)
-- Found Numa (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnuma.so)
-- OpenCV found (/opt/ros/kinetic/share/OpenCV-3.3.1)
-- Found system Eigen at /usr/local/include/eigen3
-- Found PythonInterp: /usr/bin/python (found suitable version "2.7.12", minimum required is "2.7")
-- NumPy ver. 1.14.0 found (include: /usr/local/lib/python2.7/dist-packages/numpy/core/include)
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR)
-- MPI support found
-- MPI compile flags:
-- MPI include path: /usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include/usr/lib/openmpi/include/usr/lib/openmpi/include/openmpi
-- MPI LINK flags path: -Wl,-rpath -Wl,/usr/lib/openmpi/lib -Wl,--enable-new-dtags
-- MPI libraries: /usr/lib/openmpi/lib/libmpi_cxx.so/usr/lib/openmpi/lib/libmpi.so
CMake Warning at cmake/Dependencies.cmake:376 (message):
  OpenMPI found, but it is not built with CUDA support.
Call Stack (most recent call first):
  CMakeLists.txt:182 (include)

-- Caffe2: CUDA detected: 9.0
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 9.0
-- Found cuDNN: v7.0.5 (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libcudnn.so)
-- Autodetected CUDA architecture(s): 7.0
-- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70
-- Determining NCCL version from the header file: /usr/include/nccl.h
-- NCCL_MAJOR_VERSION: 2
-- Found NCCL (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnccl.so)
-- Could NOT find CUB (missing: CUB_INCLUDE_DIR)
-- Could NOT find Gloo (missing: Gloo_INCLUDE_DIR Gloo_LIBRARY)
-- MPI include path: /usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include/usr/lib/openmpi/include/usr/lib/openmpi/include/openmpi
-- MPI libraries: /usr/lib/openmpi/lib/libmpi_cxx.so/usr/lib/openmpi/lib/libmpi.so
-- CUDA detected: 9.0
-- Found libcuda: /usr/local/cuda/lib64/stubs/libcuda.so
-- Found libnvrtc: /usr/local/cuda/lib64/libnvrtc.so
-- Determining NCCL version from the header file: /usr/include/nccl.h
-- NCCL_MAJOR_VERSION: 2
-- Found NCCL (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnccl.so)
-- GCC 5.4.1: Adding gcc and gcc_s libs to link line
-- Include NCCL operators
-- Excluding ideep operators as we are not using ideep
-- Including image processing operators
-- Excluding video processing operators due to no opencv
-- Excluding mkl operators as we are not using mkl
-- Include Observer library
-- Using lib/python2.7/dist-packages as python relative installation path
-- Automatically generating missing __init__.py files.
-- A previous caffe2 cmake run already created the __init__.py files.
CMake Warning at CMakeLists.txt:338 (message):
  Generated cmake files are only fully tested if one builds with system glog, gflags, and protobuf. Other settings may generate files that are not well tested.

--
-- Summary
-- General:
-- CMake version : 3.5.2
-- CMake command : /usr/bin/cmake
-- Git version : v0.1.11-9036-g709c300-dirty
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 5.4.1
-- BLAS : Eigen
-- CXX flags : -fvisibility-inlines-hidden -DONNX_NAMESPACE=onnx_c2 -O2 -fPIC -Wno-narrowing -Wno-invalid-partial-specialization -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-typedef-redefinition -Wno-unknown-warning-option -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-private-field -Wno-unused-result -Wno-inconsistent-missing-override -Wno-aligned-allocation-unavailable -Wno-error=deprecated-declarations
-- Build type : Release
-- Compile definitions :
-- CMAKE_PREFIX_PATH :
-- CMAKE_INSTALL_PREFIX : /usr/local
--
-- BUILD_CAFFE2 : ON
-- BUILD_ATEN : OFF
-- BUILD_BINARY : ON
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 2.7.12
-- Python includes : /usr/include/python2.7
-- BUILD_SHARED_LIBS : ON
-- BUILD_TEST : OFF
-- USE_ASAN : OFF
-- USE_ATEN : OFF
-- USE_CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 9.0
-- cuDNN version : 7.0.5
-- CUDA root directory : /usr/local/cuda
-- CUDA library : /usr/local/cuda/lib64/stubs/libcuda.so
-- cudart library : /usr/local/cuda/lib64/libcudart_static.a;-pthread;dl;/usr/lib/x86_64-linux-gnu/librt.so
-- cublas library : /usr/local/cuda/lib64/libcublas.so;/usr/local/cuda/lib64/libcublas_device.a
-- cufft library : /usr/local/cuda/lib64/libcufft.so
-- curand library : /usr/local/cuda/lib64/libcurand.so
-- cuDNN library : /usr/local/cuda/lib64/libcudnn.so
-- nvrtc : /usr/local/cuda/lib64/libnvrtc.so
-- CUDA include path : /usr/local/cuda/include
-- NVCC executable : /usr/local/cuda/bin/nvcc
-- CUDA host compiler : /usr/bin/cc
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FFMPEG : OFF
-- USE_GFLAGS : ON
-- USE_GLOG : ON
-- USE_GLOO : ON
-- USE_GLOO_IBVERBS : OFF
-- USE_LEVELDB : ON
-- LevelDB version : 1.18
-- Snappy version : 1.1.3
-- USE_LITE_PROTO : OFF
-- USE_LMDB : ON
-- LMDB version : 0.9.17
-- USE_METAL : OFF
-- USE_MKL :
-- USE_MOBILE_OPENGL : OFF
-- USE_MPI : ON
-- USE_NCCL : ON
-- USE_SYSTEM_NCCL : OFF
-- USE_NERVANA_GPU : OFF
-- USE_NNPACK : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : ON
-- OpenCV version : 3.3.1
-- USE_OPENMP : OFF
-- USE_PROF : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- Public Dependencies : Threads::Threads;gflags;glog::glog
-- Private Dependencies : nnpack;cpuinfo;/usr/lib/x86_64-linux-gnu/liblmdb.so;/usr/lib/x86_64-linux-gnu/libleveldb.so;/usr/lib/x86_64-linux-gnu/libsnappy.so;/usr/lib/x86_64-linux-gnu/libnuma.so;opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs;opencv_videoio;opencv_video;/usr/lib/openmpi/lib/libmpi_cxx.so;/usr/lib/openmpi/lib/libmpi.so;gloo;gcc_s;gcc;dl

Tetsujinfr commented 6 years ago

This is the first time I have read an alternative result for inference with this model. It would be great to have an update from the team on the above question, even a quick yes/no or a "looking at it". I am surprised I could not find any alternative demo results (image or video, post inference) except those from the publication. Does it mean no one has been able to make the code work end-to-end so far? (I am still at the compiling stage, since the precompiled binaries did not work for me.)

ralpguler commented 6 years ago

As printed in infer_simple.py L116, the first image is generally slower than the rest. If you are trying out a single image, can you try a folder instead? Other factors can also affect the timings, for example 'SCALE' and 'MAX_SIZE' in the config and the number of detected persons in the image (for im_detect_body_uv).
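A minimal sketch of how one could measure steady-state time over a folder while discarding the slow first image. This is not code from infer_simple.py: `time_per_image`, `infer_fn`, and `fake_infer` are hypothetical names, and `fake_infer` just sleeps to imitate a slow first call. In practice `infer_fn` would wrap whatever runs the model on one image.

```python
import time
from timeit import default_timer as timer


def time_per_image(infer_fn, images, warmup=1):
    """Return (mean_seconds, fps) over `images`, ignoring the first `warmup` calls."""
    durations = []
    for im in images:
        start = timer()
        infer_fn(im)
        durations.append(timer() - start)
    steady = durations[warmup:]  # drop the warm-up calls (e.g. the first image)
    if not steady:
        raise ValueError("need more than `warmup` images to measure steady state")
    mean = sum(steady) / len(steady)
    return mean, (1.0 / mean if mean > 0 else float("inf"))


# Stand-in for real inference (assumption): the first call is artificially slower.
calls = []


def fake_infer(im):
    calls.append(im)
    time.sleep(0.05 if len(calls) == 1 else 0.005)


mean, rate = time_per_image(fake_infer, range(4))
print("steady-state: %.4fs per image, %.1f fps" % (mean, rate))
```

Reporting the mean after the warm-up call gives a number that is closer to what a video or folder run would see than the single-image log does.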

facebookresearch / DensePose

Inference run time (Demo image) #35