issue with darknet-nnpack python library

ligc commented 6 years ago

Hi,

I am trying to run detection on multiple images on my Raspberry Pi 3 (static images for now; later they will be snapshots from a video). The CLI works great and I get the same 1.3 s/image as claimed. However, the CLI has to load the network and metadata on every run, which slows detection down, so I wrote the following Python script to process a batch of images with a single load:

#!/usr/bin/python

import sys
import os.path

sys.path.append("python")

import darknet

yolo_net = darknet.load_net("cfg/tiny-yolo-voc.cfg",
                    "tiny-yolo-voc.weights",0)
yolo_meta = darknet.load_meta("cfg/voc.data")

list = os.listdir("data")
for i in range(0,len(list)):
    imagename = os.path.join("data",list[i])
    if os.path.isfile(imagename):
        if os.path.splitext(imagename)[1] == ".jpg":
            print "Detecting image %s" % imagename
            res = darknet.detect(yolo_net, yolo_meta, imagename)
            print res

I put the Python script in the darknet-nnpack directory and updated darknet-nnpack/python/darknet.py with the following line:

lib = CDLL("/root/darknet-nnpack/libdarknet.so", RTLD_GLOBAL)

The Python script runs correctly with upstream darknet (https://github.com/pjreddie/darknet) but, as expected, it is terribly slow (40+ s/image), which does not work for my scenario.

With darknet-nnpack, the Python script fails with the following error:

root@rpi-3:~/darknet-nnpack# ./test.py
Traceback (most recent call last):
  File "./test.py", line 8, in <module>
    import darknet
  File "python/darknet.py", line 37, in <module>
    lib = CDLL("/root/darknet-nnpack/libdarknet.so", RTLD_GLOBAL)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /root/darknet-nnpack/libdarknet.so: undefined symbol: nnp_convolution_inference
root@rpi-3:~/darknet-nnpack#

I suspect that some compile options are needed to statically link the NNPACK library into libdarknet.so, or that some environment variable needs to point to NNPACK-darknet, but I was not able to figure it out.

Any suggestion on how I could move forward will be highly appreciated.
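
One way to narrow this down from Python is to catch the loader error and, if NNPACK happens to be available as a shared object, preload it with RTLD_GLOBAL so that libdarknet.so can resolve the nnp_* symbols against it. This is only a minimal sketch: the helper name load_with_deps is made up for illustration, and the existence of a shared libnnpack.so is an assumption (the thread only shows the static libnnpack.a).

```python
import ctypes


def load_with_deps(lib_path, deps=()):
    """Load lib_path with ctypes, preloading each dependency globally first.

    Returns (handle, None) on success or (None, error_message) on failure.
    """
    try:
        for dep in deps:
            # RTLD_GLOBAL makes the dependency's symbols visible to
            # libraries that are loaded afterwards.
            ctypes.CDLL(dep, mode=ctypes.RTLD_GLOBAL)
        return ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL), None
    except OSError as exc:
        # dlopen failures (missing file, undefined symbol) surface as OSError.
        return None, str(exc)
```

A call such as load_with_deps("/root/darknet-nnpack/libdarknet.so", deps=["libnnpack.so"]) would then either return a usable handle or the same "undefined symbol" text as a string, without a traceback. The "libnnpack.so" name in that call is hypothetical.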

digitalbrain79 commented 6 years ago

I made a Python module for darknet-nnpack. See https://github.com/digitalbrain79/pyyolo

ligc commented 6 years ago

@digitalbrain79, thanks for the response. I tried pyyolo; here are my findings:

1. GPU=0 CUDNN=0: 38s/image

I set GPU=0 and CUDNN=0 in the Makefile, since I am on a Raspberry Pi 3 without a GPU.

The test result is 38 s/image.

root@rpi-3:~/pyyolo# python example.py
False
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   13 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   14 conv    425  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 425
   15 detection
mask_scale: Using default '1.000000'
Loading weights from ../tiny-yolo.weights...Done!
----- test original C using a file
./darknet/data/person.jpg: Predicted in 38.268967 seconds.
{'right': 271, 'bottom': 336, 'top': 108, 'class': 'person', 'prob': 0.5661062002182007, 'left': 185}
{'right': 188, 'bottom': 349, 'top': 266, 'class': 'dog', 'prob': 0.8325474858283997, 'left': 79}
{'right': 592, 'bottom': 344, 'top': 107, 'class': 'horse', 'prob': 0.763496458530426, 'left': 411}
----- test python API using a file
Cam frame predicted in 38.106396 seconds.
{'right': 271, 'bottom': 336, 'top': 108, 'class': 'person', 'prob': 0.5692979693412781, 'left': 185}
{'right': 188, 'bottom': 349, 'top': 266, 'class': 'dog', 'prob': 0.8323339223861694, 'left': 79}
{'right': 592, 'bottom': 344, 'top': 107, 'class': 'horse', 'prob': 0.7599188685417175, 'left': 411}
root@rpi-3:~/pyyolo#

2. NNPACK=1 and ARM_NEON=1: 16s/image

I added the NNPACK and ARM_NEON logic from darknet-nnpack/Makefile to pyyolo/Makefile, as shown below:

NNPACK=1
ARM_NEON=1
...
...
ifeq ($(NNPACK), 1)
COMMON+= -DNNPACK
CFLAGS+= -DNNPACK
LDFLAGS+= -lnnpack -lpthreadpool
endif

ifeq ($(ARM_NEON), 1)
COMMON+= -DARM_NEON
CFLAGS+= -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize
endif

I rebuilt pyyolo with the updated Makefile and reran the test. The result is 16 s/image:

root@rpi-3:~/pyyolo# python example.py
False
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   13 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   14 conv    425  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 425
   15 detection
mask_scale: Using default '1.000000'
Loading weights from ../tiny-yolo.weights...Done!
----- test original C using a file
./darknet/data/person.jpg: Predicted in 16.083889 seconds.
{'right': 271, 'bottom': 336, 'top': 108, 'class': 'person', 'prob': 0.5661063194274902, 'left': 185}
{'right': 188, 'bottom': 349, 'top': 266, 'class': 'dog', 'prob': 0.8325475454330444, 'left': 79}
{'right': 592, 'bottom': 344, 'top': 107, 'class': 'horse', 'prob': 0.7634970545768738, 'left': 411}
----- test python API using a file
Cam frame predicted in 15.980301 seconds.
{'right': 271, 'bottom': 336, 'top': 108, 'class': 'person', 'prob': 0.5692971348762512, 'left': 185}
{'right': 188, 'bottom': 349, 'top': 266, 'class': 'dog', 'prob': 0.8323339223861694, 'left': 79}
{'right': 592, 'bottom': 344, 'top': 107, 'class': 'horse', 'prob': 0.7599186301231384, 'left': 411}
root@rpi-3:~/pyyolo#
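
To put the two runs in perspective, the arithmetic can be written out as below. The timings are copied from the "test original C using a file" lines of the two logs, and the 1.3 s target is the CLI figure quoted at the top of the thread; the variable names are only illustrative.

```python
# Per-image timings taken from the two example.py runs in this comment.
baseline_s = 38.268967   # GPU=0 CUDNN=0
nnpack_s = 16.083889     # NNPACK=1 and ARM_NEON=1 added to the pyyolo Makefile
target_s = 1.3           # per-image time reported for the darknet-nnpack CLI

speedup = baseline_s / nnpack_s        # gain from the Makefile change
remaining_gap = nnpack_s / target_s    # distance from the CLI result

print("speedup from NNPACK/NEON flags: %.2fx" % speedup)
print("still slower than the CLI by:   %.1fx" % remaining_gap)
```

So the flags alone give a bit more than a 2x improvement, while the CLI figure is still roughly an order of magnitude away.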

Any advice on how I could achieve the 1.3 s/image of darknet-nnpack?

tahouse commented 6 years ago

I'm also very curious about how to use the CDLL functionality of the original libdarknet.so but with added NNPACK optimization. One option that seems reasonable is to use the latest upstream darknet commit and update it with NNPACK calls.

I also tried adding LDFLAGS to the darknet-nnpack Makefile so that the library libdarknet.so would be built, but I get the following failure when running make:

gcc -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DNNPACK -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -shared obj/gemm.o obj/utils.o obj/cuda.o obj/deconvolutional_layer.o obj/convolutional_layer.o obj/list.o obj/image.o obj/activations.o obj/im2col.o obj/col2im.o obj/blas.o obj/crop_layer.o obj/dropout_layer.o obj/maxpool_layer.o obj/softmax_layer.o obj/data.o obj/matrix.o obj/network.o obj/connected_layer.o obj/cost_layer.o obj/parser.o obj/option_list.o obj/detection_layer.o obj/route_layer.o obj/box.o obj/normalization_layer.o obj/avgpool_layer.o obj/layer.o obj/local_layer.o obj/shortcut_layer.o obj/activation_layer.o obj/rnn_layer.o obj/gru_layer.o obj/crnn_layer.o obj/demo.o obj/batchnorm_layer.o obj/region_layer.o obj/reorg_layer.o obj/tree.o obj/lstm_layer.o -o libdarknet.so -lm -pthread `pkg-config --libs opencv` -lnnpack -lpthreadpool
/usr/bin/ld: /usr/lib/gcc/arm-linux-gnueabihf/4.8/../../../libnnpack.a(softmax.c.o): relocation R_ARM_MOVW_ABS_NC against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/usr/lib/gcc/arm-linux-gnueabihf/4.8/../../../libnnpack.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:96: recipe for target 'libdarknet.so' failed
make: *** [libdarknet.so] Error 1

I will poke around a little with the error. Your suggestion about a static link makes sense considering the error I'm seeing: https://stackoverflow.com/questions/13812185/how-to-recompile-with-fpic I'm trying to recompile NNPACK-darknet with an added -fPIC flag in the build.ninja file (prior to running ninja); see https://github.com/Maratyszcza/NNPACK/issues/5

My current Makefile is:

GPU=0
CUDNN=0
OPENCV=1
NNPACK=1
ARM_NEON=1
OPENMP=0
DEBUG=0

ARCH= -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52]
#      -gencode arch=compute_20,code=[sm_20,sm_21] \ This one is deprecated?

# This is what I use, uncomment if you know your arch and want to specify
# ARCH= -gencode arch=compute_52,code=compute_52

VPATH=./src/:./examples
SLIB=libdarknet.so
ALIB=libdarknet.a
EXEC=darknet
OBJDIR=./obj/

CC=gcc
NVCC=nvcc 
AR=ar
ARFLAGS=rcs
OPTS=-Ofast
LDFLAGS= -lm -pthread 
COMMON= -Iinclude/ -Isrc/
CFLAGS=-Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC

ifeq ($(OPENMP), 1) 
CFLAGS+= -fopenmp
endif

ifeq ($(DEBUG), 1) 
OPTS=-O0 -g
endif

CFLAGS+=$(OPTS)

ifeq ($(OPENCV), 1) 
COMMON+= -DOPENCV
CFLAGS+= -DOPENCV
LDFLAGS+= `pkg-config --libs opencv` 
COMMON+= `pkg-config --cflags opencv` 
endif

ifeq ($(GPU), 1) 
COMMON+= -DGPU -I/usr/local/cuda/include/
CFLAGS+= -DGPU
LDFLAGS+= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand
endif

ifeq ($(CUDNN), 1) 
COMMON+= -DCUDNN 
CFLAGS+= -DCUDNN
LDFLAGS+= -lcudnn
endif

ifeq ($(NNPACK), 1)
COMMON+= -DNNPACK
CFLAGS+= -DNNPACK
LDFLAGS+= -lnnpack -lpthreadpool
endif

ifeq ($(ARM_NEON), 1)
COMMON+= -DARM_NEON
CFLAGS+= -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize
endif

OBJ=gemm.o utils.o cuda.o deconvolutional_layer.o convolutional_layer.o list.o image.o activations.o im2col.o col2im.o blas.o crop_layer.o dropout_layer.o maxpool_layer.o softmax_layer.o data.o matrix.o network.o connected_layer.o cost_layer.o parser.o option_list.o detection_layer.o route_layer.o box.o normalization_layer.o avgpool_layer.o layer.o local_layer.o shortcut_layer.o activation_layer.o rnn_layer.o gru_layer.o crnn_layer.o demo.o batchnorm_layer.o region_layer.o reorg_layer.o tree.o  lstm_layer.o
EXECOBJA=captcha.o lsd.o super.o art.o tag.o cifar.o go.o rnn.o segmenter.o regressor.o classifier.o coco.o yolo.o detector.o nightmare.o attention.o darknet.o

ifeq ($(GPU), 1) 
LDFLAGS+= -lstdc++ 
OBJ+=convolutional_kernels.o deconvolutional_kernels.o activation_kernels.o im2col_kernels.o col2im_kernels.o blas_kernels.o crop_layer_kernels.o dropout_layer_kernels.o maxpool_layer_kernels.o avgpool_layer_kernels.o
endif

EXECOBJ = $(addprefix $(OBJDIR), $(EXECOBJA))
OBJS = $(addprefix $(OBJDIR), $(OBJ))
DEPS = $(wildcard src/*.h) Makefile include/darknet.h

#all: obj backup results $(SLIB) $(ALIB) $(EXEC)
all: obj  results $(SLIB) $(ALIB) $(EXEC)

$(EXEC): $(EXECOBJ) $(ALIB)
    $(CC) $(COMMON) $(CFLAGS) $^ -o $@ $(LDFLAGS) $(ALIB)

$(ALIB): $(OBJS)
    $(AR) $(ARFLAGS) $@ $^

$(SLIB): $(OBJS)
    $(CC) $(CFLAGS) -shared $^ -o $@ $(LDFLAGS)

$(OBJDIR)%.o: %.c $(DEPS)
    $(CC) $(COMMON) $(CFLAGS) -c $< -o $@

$(OBJDIR)%.o: %.cu $(DEPS)
    $(NVCC) $(ARCH) $(COMMON) --compiler-options "$(CFLAGS)" -c $< -o $@

obj:
    mkdir -p obj
backup:
    mkdir -p backup
results:
    mkdir -p results

.PHONY: clean

clean:
    rm -rf $(OBJS) $(SLIB) $(ALIB) $(EXEC) $(EXECOBJ)

Beyond that, please post an update if you figure out how to get fast (1 frame/sec) detection through a Python interface (pyyolo) or are able to get the library to build with NNPACK.

==============

Update: adding the -fPIC flag to cflags in the NNPACK-darknet build.ninja options worked. Now when I run make for darknet-nnpack, I get a shared library libdarknet.so that can be loaded from Python. Detection times are around 600-900 ms. My issue now is that I'm not getting any detections, unlike with the regular darknet call: all my arrays are returned empty. Any ideas? I think I saw other people having similar issues.

Let me know if you'd like more details.
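
The build.ninja change described in the update can be scripted rather than done by hand. The following is a sketch under the assumption that the generated build.ninja defines its flags as top-level "cflags = ..." and "cxxflags = ..." lines; the function name add_fpic is made up, and indented per-target overrides are deliberately left untouched.

```python
import re


def add_fpic(ninja_text, variables=("cflags", "cxxflags")):
    """Append -fPIC to top-level `name = value` lines for the given variables."""
    pattern = re.compile(r"^(%s)\s*=(.*)$" % "|".join(map(re.escape, variables)))
    out = []
    for line in ninja_text.splitlines():
        match = pattern.match(line)
        # Only touch matching lines that do not already carry the flag.
        if match and "-fPIC" not in match.group(2).split():
            line = line.rstrip() + " -fPIC"
        out.append(line)
    return "\n".join(out) + ("\n" if ninja_text.endswith("\n") else "")
```

Reading build.ninja, passing its contents through add_fpic, writing the result back, and then rebuilding NNPACK-darknet with ninja should have the same effect as the manual edit. Running it twice is harmless because lines that already contain -fPIC are skipped.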

ligc commented 6 years ago

@tahouse I got the same result as you: Python could load libdarknet.so but could not detect any objects in the images. Here is what I did:

1. Modified NNPACK-darknet/build.ninja to add -fPIC to cflags and cxxflags, then reran the build with "ninja -t clean; ninja".
2. Modified darknet-nnpack/Makefile to link all the NNPACK-darknet object files into libdarknet.so:

NNPACKOBJS=../NNPACK-darknet/build/src/init.c.o ../NNPACK-darknet/build/src/convolution-output.c.o ../NNPACK-darknet/build/src/convolution-input-gradient.c.o ../NNPACK-darknet/build/src/convolution-kernel-gradient.c.o ../NNPACK-darknet/build/src/convolution-inference.c.o ../NNPACK-darknet/build/src/fully-connected-output.c.o ../NNPACK-darknet/build/src/fully-connected-inference.c.o ../NNPACK-darknet/build/src/pooling-output.c.o ../NNPACK-darknet/build/src/softmax-output.c.o ../NNPACK-darknet/build/src/relu-output.c.o ../NNPACK-darknet/build/src/relu-input-gradient.c.o ../NNPACK-darknet/build/src/psimd/2d-fourier-8x8.c.o ../NNPACK-darknet/build/src/psimd/2d-fourier-16x16.c.o ../NNPACK-darknet/build/src/psimd/2d-winograd-8x8-3x3.c.o ../NNPACK-darknet/build/src/psimd/relu.c.o ../NNPACK-darknet/build/src/psimd/softmax.c.o ../NNPACK-darknet/build/src/psimd/fft-block-mac.c.o ../NNPACK-darknet/build/src/neon/blas/s4gemm.c.o ../NNPACK-darknet/build/src/neon/blas/c4gemm.c.o ../NNPACK-darknet/build/src/neon/blas/s4c2gemm.c.o ../NNPACK-darknet/build/src/neon/blas/c4gemm-conjb.c.o ../NNPACK-darknet/build/src/neon/blas/s4c2gemm-conjb.c.o ../NNPACK-darknet/build/src/neon/blas/c4gemm-conjb-transc.c.o ../NNPACK-darknet/build/src/neon/blas/s4c2gemm-conjb-transc.c.o ../NNPACK-darknet/build/src/neon/blas/conv1x1.c.o ../NNPACK-darknet/build/src/neon/blas/sgemm.c.o ../NNPACK-darknet/build/src/neon/blas/sdotxf.c.o ../NNPACK-darknet/build/src/psimd/blas/shdotxf.c.o ../NNPACK-darknet/build/deps/pthreadpool/src/threadpool-pthreads.c.o

...
...

$(SLIB): $(OBJS) $(NNPACKOBJS)
        $(CC) $(CFLAGS) -shared $^ -o $@

Python could then load libdarknet.so successfully, but all the returned arrays are empty.
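
A long object list like NNPACKOBJS is easy to get wrong by hand, so it could also be generated. This is a rough sketch with made-up helper names; it assumes every file ending in ".c.o" under the build directory belongs in the library, which may pull in objects the hand-written list leaves out, so the output should be reviewed before use.

```python
import os


def collect_objects(build_dir, suffix=".c.o"):
    """Return sorted paths of all files under build_dir ending in suffix."""
    found = []
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            if name.endswith(suffix):
                found.append(os.path.join(root, name))
    return sorted(found)


def makefile_variable(name, paths):
    """Format paths as a single-line Makefile variable assignment."""
    return "%s=%s" % (name, " ".join(paths))
```

For example, print(makefile_variable("NNPACKOBJS", collect_objects("../NNPACK-darknet/build"))) emits one line that can be pasted into the Makefile.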

root@rpi-3:~/darknet-nnpack# cat test.py
#!/usr/bin/python

import sys
import os.path

sys.path.append("python")

import darknet

yolo_net = darknet.load_net("cfg/tiny-yolo-voc.cfg",
                    "tiny-yolo-voc.weights",0)
yolo_meta = darknet.load_meta("cfg/voc.data")

list = os.listdir("data")
for i in range(0,len(list)):
    imagename = os.path.join("data",list[i])
    if os.path.isfile(imagename):
        if os.path.splitext(imagename)[1] == ".jpg":
            print "Detecting image %s" % imagename
            res = darknet.detect(yolo_net, yolo_meta, imagename)
            print res
root@rpi-3:~/darknet-nnpack#

The result:

root@rpi-3:~/darknet-nnpack# ./test.py
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   13 conv   1024  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x1024
   14 conv    125  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 125
   15 detection
mask_scale: Using default '1.000000'
Loading weights from tiny-yolo-voc.weights...Done!
Detecting image data/dog.jpg
[]
Detecting image data/scream.jpg
[]
Detecting image data/giraffe.jpg
[]
Detecting image data/horses.jpg
[]
Detecting image data/eagle.jpg
[]
Detecting image data/person.jpg
[]
root@rpi-3:~/darknet-nnpack#

I also tried making the compile flags the same for NNPACK-darknet and darknet-nnpack, but it did not help.

@digitalbrain79 Could you shed some light on which direction to take? Thank you.

BogiHsu commented 6 years ago

Hi everyone. After some study and modifying darknet.py (there is a function that should be called if you are using NNPACK-built darknet), the C library can now run in Python. I followed the instructions from @ligc and @tahouse above to build darknet.so. I put the modified darknet.py in my project. Just run it under the darknet-nnpack directory.
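
The comment does not say which function is meant. One plausible candidate, offered here purely as an assumption, is NNPACK's nnp_initialize(), which the NNPACK API expects to be called before any other nnp_* function. A minimal sketch of calling it through ctypes, where `lib` is the handle that darknet.py already creates with CDLL and the helper name init_nnpack is made up:

```python
import ctypes

NNP_STATUS_SUCCESS = 0  # value of nnp_status_success in nnpack.h


def init_nnpack(lib):
    """Call nnp_initialize() on an already-loaded library handle.

    `lib` is expected to be the ctypes handle for a libdarknet.so that has
    NNPACK linked in. Raises RuntimeError if NNPACK reports a failure.
    """
    lib.nnp_initialize.argtypes = []
    lib.nnp_initialize.restype = ctypes.c_int
    status = lib.nnp_initialize()
    if status != NNP_STATUS_SUCCESS:
        raise RuntimeError("nnp_initialize failed with status %d" % status)
    return status
```

Calling init_nnpack(lib) once, right after the CDLL line and before load_net, would be the natural place to try it. Whether this alone is what fixes the empty detection arrays is not confirmed anywhere in this thread.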

amwfarid commented 6 years ago

I haven't had much luck running the Python library, so I opted for a quick workaround: I modified the source to include a TCP server (in C) that I can talk to from Python: https://github.com/amwfarid/DarkNet-NNPack-Python-Library

Mind you, this is really meant as a workaround until things are resolved, not a full replacement.

ljh14 commented 6 years ago

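Hello! Why do I get an error when I run digitalbrain79's pyyolo? I am also running it on a Raspberry Pi 3B. As soon as I compile darknet.py in the python folder it reports an error, let alone running make. What do I need to change for it to compile?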

ljh14 commented 6 years ago

Would it work if I modified the darknet with NNPACK code myself, wrapped it in Python, and then ran it?

ljh14 commented 6 years ago

@BogiHsu It still doesn't work; the problem is still "undefined symbol: nnp_convolution_inference". How did you get it to run successfully?

andresroyarce commented 6 years ago

Any solution for this? I got the same error:

OSError: "..." /libdarknet.so: undefined symbol: nnp_convolution_inference

As for pyyolo, it only works with the original darknet, not the NNPACK darknet.

hastou commented 5 years ago

Pull request #27 resolves the "undefined symbol: nnp_convolution_inference" error. You only need to have libnnpack.a and libpthreadpool.a in your LD_LIBRARY_PATH environment variable. Alternatively, you can modify the Makefile to link them statically into libdarknet.so.

shartoo commented 5 years ago

Hi @hastou, I tried to find libnnpack.a as below:

 sudo find / -name libnnpack.a 
/home/pi/workspace/NNPACK-darknet/lib/libnnpack.a
/usr/lib/libnnpack.a

 sudo find / -name libpthreadpool.a 
/home/pi/workspace/NNPACK-darknet/lib/libpthreadpool.a
/usr/lib/libpthreadpool.a

This means these two .a files are in /usr/lib, so I added this path to LD_LIBRARY_PATH in ~/.profile as below:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib

and sourced it:

source ~/.profile

but the error persists. Can you give some detail on how to do it?
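
A quick way to see what the environment variable actually exposes is to scan its directories for the files in question. Keep in mind that LD_LIBRARY_PATH is consulted by the dynamic loader for shared objects at run time, whereas static .a archives are picked up by the linker at build time, so finding a .a file on that path does not by itself make its symbols available to an already-built libdarknet.so. The helper name below is made up for illustration.

```python
import os


def find_on_library_path(names, path_value=None):
    """Map each file name to the LD_LIBRARY_PATH directories that contain it."""
    if path_value is None:
        path_value = os.environ.get("LD_LIBRARY_PATH", "")
    # Empty entries (for example from a leading ':') are skipped.
    dirs = [d for d in path_value.split(os.pathsep) if d]
    return {
        name: [d for d in dirs if os.path.isfile(os.path.join(d, name))]
        for name in names
    }
```

Calling find_on_library_path(["libnnpack.a", "libnnpack.so", "libpthreadpool.a"]) shows at a glance whether only static archives are present, which is the situation the find output above suggests.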

ayman commented 5 years ago

Just an FYI: after trying everything here exhaustively (on a new RPi), I gave up and went with OpenCV 4 Beta's YOLO implementation, which works like a charm in Py3. Yolo3-Lite averages about 1.4s/image and Yolo3 full averages around 14s/image (when compiled with processor optimizations, which takes about 16 hours if you do it on the device). I would have liked to use Darknet+NNPACK, but I doubt it would be much faster on a smaller device.

hastou commented 5 years ago

@shartoo if you get the "undefined symbol" error, try cloning the latest version of the repository and following the README tutorial. It should work.

shartoo commented 5 years ago

I did the following

pi@raspberrypi:~/workspace/darknet-nnpack $ find ~/workspace -name libnnpack.a
/home/pi/workspace/NNPACK-darknet/lib/libnnpack.a

pi@raspberrypi:~/workspace/darknet-nnpack $ sudo nano ~/.bashrc
...
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH://home/pi/workspace/NNPACK-darknet/lib

pi@raspberrypi:~/workspace/darknet-nnpack $ source ~/.bashrc

pi@raspberrypi:~/workspace/darknet-nnpack $ echo $LD_LIBRARY_PATH
:/usr/local/lib:/usr/lib://home/pi/workspace/NNPACK-darknet/lib

pi@raspberrypi:~/workspace/darknet-nnpack $ make
gcc -Iinclude/ -Isrc/ -DNNPACK -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DNNPACK -c ./src/gemm.c -o obj/gemm.o
gcc -Iinclude/ -Isrc/ -DNNPACK -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DNNPACK -c ./src/utils.c -o obj/utils.o
./src/utils.c: In function ‘fgetl’:
...
gcc -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DNNPACK -shared obj/gemm.o obj/utils.o obj/cuda.o obj/deconvolutional_layer.o obj/convolutional_layer.o obj/list.o obj/image.o obj/activations.o obj/im2col.o obj/col2im.o obj/blas.o obj/crop_layer.o obj/dropout_layer.o obj/maxpool_layer.o obj/softmax_layer.o obj/data.o obj/matrix.o obj/network.o obj/connected_layer.o obj/cost_layer.o obj/parser.o obj/option_list.o obj/detection_layer.o obj/route_layer.o obj/upsample_layer.o obj/box.o obj/normalization_layer.o obj/avgpool_layer.o obj/layer.o obj/local_layer.o obj/shortcut_layer.o obj/logistic_layer.o obj/activation_layer.o obj/rnn_layer.o obj/gru_layer.o obj/crnn_layer.o obj/demo.o obj/batchnorm_layer.o obj/region_layer.o obj/reorg_layer.o obj/tree.o obj/lstm_layer.o obj/l2norm_layer.o obj/yolo_layer.o obj/iseg_layer.o obj/image_opencv.o -o libdarknet.so -lm -pthread  -lnnpack -lpthreadpool
/usr/bin/ld: /usr/lib/gcc/arm-linux-gnueabihf/6/../../../libnnpack.a(softmax.c.o): relocation R_ARM_MOVW_ABS_NC against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/usr/lib/gcc/arm-linux-gnueabihf/6/../../../libnnpack.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:97: recipe for target 'libdarknet.so' failed
make: *** [libdarknet.so] Error 1

I did this with the latest code from this project on a Raspberry Pi 3B+.

pi@raspberrypi:~/workspace/darknet-nnpack $ clang -v
clang version 3.8.1-24+rpi1 (tags/RELEASE_381/final)
Target: armv6--linux-gnueabihf
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/4.6
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/4.6.4
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/4.7
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/4.7.3
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/4.8
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/4.8.5
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/4.9
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/4.9.3
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.1
Found candidate GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/6.3.0
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.6
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.6.4
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.7
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.7.3
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.8
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.8.5
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.9
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/4.9.3
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/5.4.1
Found candidate GCC installation: /usr/lib/gcc/arm-linux-gnueabihf/6.3.0
Selected GCC installation: /usr/bin/../lib/gcc/arm-linux-gnueabihf/6.3.0
Candidate multilib: .;@m32
Selected multilib: .;@m32

digitalbrain79 commented 5 years ago

@hastou I'm sorry. I could not solve problem #31, so I reverted your pull request.

shartoo commented 5 years ago

Seems like a big bug, then.

dan-r95 commented 5 years ago

Any updates on this?

xiaofangziLab commented 4 years ago

How can this problem be solved? Thanks.

digitalbrain79 / darknet-nnpack

issue with darknet-nnpack python library #17
