Darknet.jl

IanButterworth commented 4 years ago

I just wanted to report on some progress in putting together a Julia (julialang.org) wrapper for this branch of Darknet. It is based on pre-built binaries of this branch and will require no installation steps beyond installing Julia and typing ]add Darknet.

This is the current status of this package: https://github.com/ianshmean/Darknet.jl It has two manually built binaries, for Linux and macOS, and convenience functions for running Darknet (not training yet). We're close to releasing support for further platforms, but haven't figured out Windows yet.

Cross-compilation of binaries

Darknet.jl is based on pre-compiled binaries that are built automatically using Julia's BinaryBuilder.jl package and CI builder infrastructure. This approach requires no build on the user's machine, just an automated download and unpack. We're trying to get cross-compilation working across 13 target platforms. All targets except Windows seem to build successfully (Windows was omitted from this build because it currently fails). You can see the latest output of the builder here: https://dev.azure.com/JuliaPackaging/Yggdrasil/_build/results?buildId=224&view=results

Currently we're targeting CPU-only, as a starting place.

You can see the draft cross-compile build script here: https://github.com/JuliaPackaging/Yggdrasil/pull/202/files

It would be great to fix Windows, so any tips on how to modify our existing build script would be welcome (we've already tried applying some patches). Also, for this work, I'm keen for Darknet to start following semver, so that we can be specific and clear about which version we're building (https://github.com/AlexeyAB/darknet/issues/2671).

cc. @giordano

AlexeyAB commented 4 years ago

@ianshmean Hi, Nice work!

Try commenting out these two lines when compiling on Windows:

If that helps, you can open a PR.

giordano commented 4 years ago

That helps, thanks a lot! However, we have also set LDFLAGS="-lws2_32". Without that, I was getting this error:

g++ -shared -std=c++11 -fvisibility=hidden -DLIB_EXPORTS -Iinclude/ -I3rdparty/stb/include -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -fPIC ./obj/image_opencv.o ./obj/http_stream.o ./obj/gemm.o ./obj/utils.o ./obj/dark_cuda.o ./obj/convolutional_layer.o ./obj/list.o ./obj/image.o ./obj/activations.o ./obj/im2col.o ./obj/col2im.o ./obj/blas.o ./obj/crop_layer.o ./obj/dropout_layer.o ./obj/maxpool_layer.o ./obj/softmax_layer.o ./obj/data.o ./obj/matrix.o ./obj/network.o ./obj/connected_layer.o ./obj/cost_layer.o ./obj/parser.o ./obj/option_list.o ./obj/darknet.o ./obj/detection_layer.o ./obj/captcha.o ./obj/route_layer.o ./obj/writing.o ./obj/box.o ./obj/nightmare.o ./obj/normalization_layer.o ./obj/avgpool_layer.o ./obj/coco.o ./obj/dice.o ./obj/yolo.o ./obj/detector.o ./obj/layer.o ./obj/compare.o ./obj/classifier.o ./obj/local_layer.o ./obj/swag.o ./obj/shortcut_layer.o ./obj/activation_layer.o ./obj/rnn_layer.o ./obj/gru_layer.o ./obj/rnn.o ./obj/rnn_vid.o ./obj/crnn_layer.o ./obj/demo.o ./obj/tag.o ./obj/cifar.o ./obj/go.o ./obj/batchnorm_layer.o ./obj/art.o ./obj/region_layer.o ./obj/reorg_layer.o ./obj/reorg_old_layer.o ./obj/super.o ./obj/voxel.o ./obj/tree.o ./obj/yolo_layer.o ./obj/gaussian_yolo_layer.o ./obj/upsample_layer.o ./obj/lstm_layer.o ./obj/conv_lstm_layer.o ./obj/scale_channels_layer.o ./obj/sam_layer.o src/yolo_v2_class.cpp -o libdarknet.dll -lm -pthread
src/yolo_v2_class.cpp:1:0: warning: -fPIC ignored for target (all code is position independent) [enabled by default]
 #include "darknet.h"
 ^
src/yolo_v2_class.cpp: In member function ‘std::vector<bbox_t> Detector::tracking_id(std::vector<bbox_t>, bool, int, int)’:
src/yolo_v2_class.cpp:370:42: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         if (prev_bbox_vec_deque.size() > frames_story) prev_bbox_vec_deque.pop_back();
                                          ^
src/yolo_v2_class.cpp:385:36: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
                     if (cur_dist < max_dist && (k.track_id == 0 || dist_vec[m] > cur_dist)) {
                                    ^
src/yolo_v2_class.cpp:409:42: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         if (prev_bbox_vec_deque.size() > frames_story) prev_bbox_vec_deque.pop_back();
                                          ^
./obj/http_stream.o:http_stream.cpp:(.text+0x6d): undefined reference to `_imp__shutdown@8'
./obj/http_stream.o:http_stream.cpp:(.text+0xb3): undefined reference to `_imp__recv@16'
./obj/http_stream.o:http_stream.cpp:(.text+0xd9): undefined reference to `_imp__closesocket@4'
./obj/http_stream.o:http_stream.cpp:(.text+0x1c1): undefined reference to `_imp__shutdown@8'
./obj/http_stream.o:http_stream.cpp:(.text+0x3a1): undefined reference to `_imp__shutdown@8'
./obj/http_stream.o:http_stream.cpp:(.text+0x3e7): undefined reference to `_imp__socket@12'
./obj/http_stream.o:http_stream.cpp:(.text+0x40f): undefined reference to `_imp__htons@4'
./obj/http_stream.o:http_stream.cpp:(.text+0x44d): undefined reference to `_imp__setsockopt@20'
./obj/http_stream.o:http_stream.cpp:(.text+0x486): undefined reference to `_imp__ioctlsocket@12'
./obj/http_stream.o:http_stream.cpp:(.text+0x4ba): undefined reference to `_imp__bind@12'
./obj/http_stream.o:http_stream.cpp:(.text+0x4e6): undefined reference to `_imp__listen@8'
./obj/http_stream.o:http_stream.cpp:(.text+0x6a1): undefined reference to `_imp__shutdown@8'
./obj/http_stream.o:http_stream.cpp:(.text+0xae9): undefined reference to `_imp__shutdown@8'
./obj/http_stream.o:http_stream.cpp:(.text+0xb30): undefined reference to `_imp__socket@12'
./obj/http_stream.o:http_stream.cpp:(.text+0xb58): undefined reference to `_imp__htons@4'
./obj/http_stream.o:http_stream.cpp:(.text+0xb96): undefined reference to `_imp__setsockopt@20'
./obj/http_stream.o:http_stream.cpp:(.text+0xbcf): undefined reference to `_imp__ioctlsocket@12'
./obj/http_stream.o:http_stream.cpp:(.text+0xc07): undefined reference to `_imp__bind@12'
./obj/http_stream.o:http_stream.cpp:(.text+0xc33): undefined reference to `_imp__listen@8'
./obj/http_stream.o:http_stream.cpp:(.text+0xde8): undefined reference to `_imp__shutdown@8'
/opt/i686-w64-mingw32/bin/../lib/gcc/i686-w64-mingw32/4.8.5/../../../../i686-w64-mingw32/bin/ld: ./obj/http_stream.o: bad reloc address 0x17 in section `.text.unlikely'
collect2: error: ld returned 1 exit status
make: *** [Makefile:136: libdarknet.dll] Error 1

AlexeyAB commented 4 years ago

Yes, if you use MinGW instead of MSVS/Cygwin on Windows, it can require linking the Winsock library (libws2_32.a) by adding LDFLAGS+=-lws2_32 in the Makefile.

So it's better to add this there: https://github.com/AlexeyAB/darknet/blob/10c40551dcadec68050befa6a1cecc6f69049d0d/Makefile#L75


ifeq ($(OS),Windows_NT)
LDFLAGS+=-lws2_32
endif
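
For context, every undefined symbol in the linker log above (shutdown, recv, closesocket, socket, htons, setsockopt, ioctlsocket, bind, listen) is a Winsock function, which is why the MinGW link step needs -lws2_32. The following is a minimal C sketch, not taken from Darknet's http_stream.cpp, showing that a single socket-API call is enough to create that dependency. The function name port_to_network_order is hypothetical.

```c
/* Minimal sketch (not Darknet code): one socket-API call is enough to
   create the link-time dependency seen in the log. */
#ifdef _WIN32
#include <winsock2.h>   /* MinGW: programs using this must link with -lws2_32 */
#else
#include <arpa/inet.h>  /* POSIX: htons comes with libc, no extra library */
#endif

/* Hypothetical helper: converts a port number to network byte order. */
unsigned short port_to_network_order(unsigned short port) {
    return htons(port);
}
```

On Linux or macOS this links without extra flags; with a MinGW toolchain, any program calling it would need -lws2_32 on the link line, as in the Makefile fragment above.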

giordano commented 4 years ago

Ok, in our case uname returns MSYS_NT-6.* though.

I managed to build for i686-w64-mingw32, now we're only missing x86_64-w64-mingw32:

# make libdarknet.${dlext} LIBNAMESO="libdarknet.${dlext}" LIBSO=1 GPU=0 CUDNN=0 CUDNN_HALF=0 OPENCV=0 DEBUG=0 OPENMP=0 LIBSO=1 ZED_CAMERA=0 
gcc -Iinclude/ -I3rdparty/stb/include -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -fPIC -c ./src/gemm.c -o obj/gemm.o
./src/gemm.c:1:0: warning: -fPIC ignored for target (all code is position independent) [enabled by default]
 #include "gemm.h"
 ^
In file included from ./src/gemm.c:519:0:
/opt/x86_64-w64-mingw32/lib/gcc/x86_64-w64-mingw32/4.8.5/include/ammintrin.h:31:3: error: #error "SSE4A instruction set not enabled"
 # error "SSE4A instruction set not enabled"
   ^
compilation terminated due to -Wfatal-errors.
make: *** [Makefile:150: obj/gemm.o] Error 1

AlexeyAB commented 4 years ago

Do you use AVX=0 or AVX=1 in Makefile?

giordano commented 4 years ago

I didn't modify the Makefile apart from the LDFLAGS setting, so the default (shown above) is 0. I also tried with AVX=1:

# make libdarknet.${dlext} LIBNAMESO="libdarknet.${dlext}" LIBSO=1 GPU=0 CUDNN=0 CUDNN_HALF=0 OPENCV=0 DEBUG=0 OPENMP=0 LIBSO=1 ZED_CAMERA=0 AVX=1
gcc -Iinclude/ -I3rdparty/stb/include -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -ffp-contract=fast -mavx -mavx2 -msse3 -msse4.1 -msse4.2 -msse4a -Ofast -fPIC -c ./src/gemm.c -o obj/gemm.o
./src/gemm.c:1:0: warning: -fPIC ignored for target (all code is position independent) [enabled by default]
 #include "gemm.h"
 ^
./src/gemm.c: In function ‘_castu32_f32’:
./src/gemm.c:534:5: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     return *((float *)&a);
     ^
./src/gemm.c: In function ‘_mm256_extract_float32’:
./src/gemm.c:538:13: error: request for member ‘m256_f32’ in something not a structure or union
     return a.m256_f32[index];
             ^
compilation terminated due to -Wfatal-errors.
make: *** [Makefile:150: obj/gemm.o] Error 1

AlexeyAB commented 4 years ago

Try changing this line: https://github.com/AlexeyAB/darknet/blob/10c40551dcadec68050befa6a1cecc6f69049d0d/src/gemm.c#L515 to this: #if (defined(__AVX__) && defined(__x86_64__)) || (defined(_WIN64) && !defined(__MINGW32__))
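
To see what the suggested condition does, note that a 64-bit MinGW-w64 compiler defines both _WIN64 and __MINGW32__, so the second clause is false there and the guarded block is compiled only when AVX is enabled. The sketch below wraps the same condition in a macro so its result can be inspected; the macro name DARKNET_X86_SIMD_PATH and the function simd_path_enabled are hypothetical and do not appear in gemm.c.

```c
/* Same condition as suggested for src/gemm.c. The macro name below is
   hypothetical and exists only for this sketch. */
#if (defined(__AVX__) && defined(__x86_64__)) || (defined(_WIN64) && !defined(__MINGW32__))
#define DARKNET_X86_SIMD_PATH 1   /* AVX on x86_64, or 64-bit Windows without MinGW */
#else
#define DARKNET_X86_SIMD_PATH 0   /* e.g. MinGW-w64 built without -mavx */
#endif

/* Reports at run time which branch the preprocessor selected. */
int simd_path_enabled(void) {
    return DARKNET_X86_SIMD_PATH;
}
```

Compiled for x86_64-w64-mingw32 without -mavx, simd_path_enabled() would return 0, which matches the observation that the change fixes the build without AVX.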

giordano commented 4 years ago

Yes, this does the trick for the build without AVX (I'm not sure if @ianshmean needs AVX, though). Thank you very much.

I'll open a PR with the changes we're using.

IanButterworth commented 4 years ago

This is great. Since we're CPU-only for the moment, though, it would be nice to have AVX enabled, given:

improved performance of detection and training on Intel CPU with AVX (Yolo v3 ~85%, Yolo v2 ~10%)

If we build with AVX but run the binaries on a non-Intel processor, would errors occur? And the same question for GPU, CUDNN, CUDNN_HALF?

It would be great if we could build a single fully functional binary that could make use of whatever is on the user's machine.

AlexeyAB commented 4 years ago

"If we build with AVX, but run the binaries on a non-intel processor would errors occur?"

AVX1 and AVX2 are supported on both Intel and AMD CPUs.
If you use an old Intel/AMD CPU without AVX1/AVX2, it will still work without errors, because AVX is automatically disabled at run time: https://github.com/AlexeyAB/darknet/blob/10c40551dcadec68050befa6a1cecc6f69049d0d/src/gemm.c#L684-L704
Code compiled for x86_64 will not work on non-x86_64 CPUs such as ARM, so for those you should compile it with an ARM compiler.
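
As an illustration of run-time AVX detection in general, here is a minimal sketch. It is not the code at the linked gemm.c lines; it swaps in the GCC/Clang builtin __builtin_cpu_supports, which is only available for x86 targets with those compilers, and reports "no AVX2" everywhere else. The function names cpu_has_avx2 and select_gemm_kernel are hypothetical.

```c
/* Returns 1 if the running CPU reports AVX2, 0 otherwise.
   Uses the GCC/Clang x86 builtin; on other compilers or architectures
   this sketch conservatively returns 0. */
int cpu_has_avx2(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                           /* harmless if already initialised */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical dispatcher: names the kernel a library could pick at run time,
   so one binary can serve CPUs with and without AVX2. */
const char *select_gemm_kernel(void) {
    return cpu_has_avx2() ? "avx2" : "generic";
}
```

The design point is that the check happens once at run time rather than at compile time, so the same x86_64 binary can fall back to a generic path on older CPUs.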

"Also the same question for GPU, CUDNN, CUDNN_HALF..?"

If you compile it with GPU=1 CUDNN=1, then CUDA and cuDNN must be installed and an NVIDIA GPU must be present; otherwise it will not work.

IanButterworth commented 4 years ago

Ok, how about CUDNN_HALF? If that binary is run on a non-Tesla GPU, would it fail?

I think 3 sets of binaries makes sense:

CPU-only, (AVX=1, but AVX=0 on ARM)
GPU=1 CUDNN=1
GPU=1 CUDNN=1 CUDNN_HALF=1

AlexeyAB commented 4 years ago

CUDNN_HALF=1 will be checked at run time too.

ARM: OPENMP=1
x86_64: OPENMP=1 AVX=1
NVIDIA GPU: GPU=1 CUDNN=1 CUDNN_HALF=1

Also, I'm not sure about OpenCV.

Can you show how Darknet can be used from the Julia language?

IanButterworth commented 4 years ago

Great! ARM can have CUDA, e.g. the Jetson boards, so that can be included in the last group.

With this grouping we can serve every platform with two binary releases:

cpu-only: OPENMP=1 on all, AVX=1 on all except arm, powerpc, windows
gpu: GPU=1 CUDNN=1 CUDNN_HALF=1 on all
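
Reading the exception list above as ARM, PowerPC, and Windows, the two-release plan can be sketched as a small lookup from target triple to make variables. This is only an illustration: the function name darknet_make_flags, the substring matching on triples, the treatment of aarch64 as ARM, and the explicit GPU=0 for the CPU-only release are all assumptions, not part of the actual build script.

```c
#include <string.h>

/* Sketch of the two-release plan. The function name and the substring
   matching on target triples are assumptions made for illustration. */
const char *darknet_make_flags(const char *triple, int gpu) {
    if (gpu) {
        return "GPU=1 CUDNN=1 CUDNN_HALF=1";   /* same flags on every platform */
    }
    /* CPU-only: AVX is switched off on ARM, PowerPC and Windows (MinGW) targets. */
    int no_avx = strstr(triple, "arm") != NULL
              || strstr(triple, "aarch64") != NULL   /* assumption: 64-bit ARM counts as ARM */
              || strstr(triple, "powerpc") != NULL
              || strstr(triple, "mingw") != NULL;
    return no_avx ? "GPU=0 OPENMP=1 AVX=0" : "GPU=0 OPENMP=1 AVX=1";
}
```

For example, darknet_make_flags("x86_64-w64-mingw32", 0) selects the AVX=0 variant, while a 64-bit Linux x86 triple selects AVX=1.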

As for how to run it on Julia:

Install Julia (I recommend v1.3-rc5, which is about to be released): https://julialang.org/downloads/
Run julia and type ] to enter pkg mode
Type add Darknet to install Darknet
Press backspace to return to the main mode

From there, you can use the examples in the README here.

For instance:

using Darknet, FileIO
d = "/path/to/weights_and_config_files/"
weightsfile = "yolov3-tiny.weights"
cfgfile = "yolov3-tiny.cfg"
datafile = "coco.data"

imagefile = "/path/to/images/test.jpg"

net = Darknet.load_network(joinpath(d, cfgfile), joinpath(d, weightsfile), 1)
meta = Darknet.get_metadata(joinpath(d, datafile));

img_d = Darknet.load_image_color(imagefile, 0, 0);  

results = Darknet.detect(net, meta, img_d, thresh=0.1, nms=0.3)

Currently it's limited to detection only, but all of the exposed methods are wrapped and waiting for convenience functions to be written around them.

Edit: Simplified example

AlexeyAB / darknet

Darknet.jl - A Julia wrapper with cross-compiled binaries through Julia's BinaryBuilder #4323
