AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Multiple Detector instances using Yolo as DLL #73

Closed fjoanr closed 7 years ago

fjoanr commented 7 years ago

Hello Alex,

First of all, thank you for this repository; it works wonders so far!

I would like to know how to correctly delete a Detector instance when using YOLO as a DLL in my own VS project. What I have at the moment is:

Detector detector1("yolo-voc.cfg", "yolo-voc.weights");
Detector detector2("yolo-voc.cfg", "yolo-voc.weights");

... some implementation to train an external Net ...

detector1.~Detector();
detector2.~Detector();

The problem is that when the program runs, I get an error saying "yolo_console_dll.exe has triggered a breakpoint" (screenshot attached),

which is in yolo_v2_class.cpp line 78: free(detector_gpu.avg);

Do you have any idea how to properly destroy a Detector variable, or how this could be solved using the ~Detector() destructor?

thank you very much for your time,

best regards, Francesc.

AlexeyAB commented 7 years ago

@fjoanr Hi,

This is because the destructor is called twice.

If you create a Detector object with automatic storage, then you should not call the destructor yourself; it runs automatically when the object goes out of scope.
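
For example, a minimal sketch (not from the repository, but assuming the yolo_v2_class.hpp wrapper) of scope-based destruction:

#include <opencv2/opencv.hpp>
#include "yolo_v2_class.hpp"    // Detector and bbox_t from this repository

{
    Detector detector1("yolo-voc.cfg", "yolo-voc.weights");
    Detector detector2("yolo-voc.cfg", "yolo-voc.weights");

    cv::Mat mat_img = cv::imread("x64/data/dog.jpg");
    std::vector<bbox_t> result_vec = detector1.detect(mat_img, 0.2);

    // do NOT call detector1.~Detector() or detector2.~Detector() here
}   // destructors run automatically here, exactly once per object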


If you want to destroy Detector manually, then use std::shared_ptr<>

#include <memory>
#include <vector>
#include <opencv2/opencv.hpp>
#include "yolo_v2_class.hpp"   // Detector and bbox_t

{
    std::shared_ptr<Detector> detector1 = std::make_shared<Detector>("yolo-voc.cfg", "yolo-voc.weights");
    std::shared_ptr<Detector> detector2 = std::make_shared<Detector>("yolo-voc.cfg", "yolo-voc.weights");
    std::shared_ptr<Detector> detector3 = std::make_shared<Detector>("yolo-voc.cfg", "yolo-voc.weights");

    cv::Mat mat_img = cv::imread("x64/data/dog.jpg");
    std::vector<bbox_t> result_vec = detector1->detect(mat_img, 0.2);

    detector1.reset();  // destructor called manually
    detector2.reset();  // destructor called manually
    detector3.reset();  // destructor called manually
}   // any shared_ptr not reset above releases its Detector here; an already-destroyed object is not destroyed again

But note that there is also a bug in Yolo: when a detector is repeatedly created and deleted many times, a memory leak occurs. I have not fixed this yet.
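
A possible workaround sketch (not a fix in the repository): create one long-lived Detector and reuse it for all images instead of repeatedly creating and deleting detectors. The second image path below is only a hypothetical example:

#include <opencv2/opencv.hpp>
#include <vector>
#include "yolo_v2_class.hpp"

int main()
{
    Detector detector("yolo-voc.cfg", "yolo-voc.weights");   // created once

    const char *images[] = { "x64/data/dog.jpg", "my_second_image.jpg" };
    for (const char *path : images) {
        cv::Mat img = cv::imread(path);
        std::vector<bbox_t> result_vec = detector.detect(img, 0.2);
        // ... use result_vec ...
    }
    return 0;
}   // the single Detector is destroyed here, only once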

fjoanr commented 7 years ago

Hello @AlexeyAB

Thank you for the fast response. Since my program was running out of memory, I decided to run the two YOLOs plus the neural net in a separate project, save the neural network's weights, and then load them in the main project. That solved the issue of multiple detectors not being able to be created and deleted.

I now have a new issue. For the correct implementation of the fusionNet, I need to remove the last layer of Yolo so that it ends with the convolutional layer:

[convolutional]
size=1
stride=1
pad=1
filters=125
activation=linear

If I run the detector, it returns a vector of detections, as the function implies; however, the last convolutional layer is supposed to return a 13x13x125 feature map. Is there any way to save the feature map from the last layer, or will I have to create an independent detection function?

thank you again for your help, it is much appreciated.

Francesc.

AlexeyAB commented 7 years ago

@fjoanr Hi,

If you didn't remove the last region-layer from the .cfg file, then you should implement your own detect() function, for example by reading the raw values from the layer output:


    int i, j, n;
                                     // 13*13*(20+5)*5 = 21125
    float *predictions = l.output;   // 21125 values for 13x13 WxH, 20 classes and 5 anchors

    for (i = 0; i < l.w*l.h; ++i) {          // 13x13  (W x H)
        int row = i / l.w;
        int col = i % l.w;
        for (n = 0; n < l.n; ++n) {          // 5  (anchors)
            int index = i*l.n + n;

            int pred_index = index * (l.classes + 5);   // each anchor stores 20(classes) + 4(coords) + 1(To) = 25 values
            for (j = 0; j < l.classes + 5; ++j) {
                float val = predictions[pred_index + j];   // j-th value for cell (row, col), anchor n
                // ... use val ...
            }
        }
    }
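
If you also need to save the raw feature map to disk, a minimal sketch (a hypothetical helper, not part of darknet) that dumps the layer output to a binary file, assuming you already have the layer l from the loaded network as above:

#include <stdio.h>

// l is darknet's layer struct (requires darknet's headers)
void save_feature_map(layer l, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f) return;
    // l.outputs is the number of output values of this layer
    // (13*13*125 = 21125 for this .cfg)
    fwrite(l.output, sizeof(float), (size_t)l.outputs, f);
    fclose(f);
}
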
AlexeyAB commented 7 years ago

@fjoanr Which of the 3 FusionNets do you use? https://arxiv.org/find/all/1/all:+FusionNet/0/1/0/all/0/1

fjoanr commented 7 years ago

@AlexeyAB Thank you for the info, I will check it out today to see if I can manage it!

About the fusionNet, I am using https://arxiv.org/abs/1507.06821, but with the two streams being ConvNets on RGB images. I am also investigating using an extra convolutional layer instead of the fully connected layers, but we will see about it :D

AlexeyAB commented 7 years ago

@fjoanr Yes, this is an interesting topic: how much the accuracy of top CNNs can be improved by using RGB-D compared with RGB.

What do you use to get depth for the image?


Earlier I thought that a depth-map could only be helpful if it was obtained from active cameras (ToF, Lidar, ...). But now there is reason to believe that a depth-map obtained from passive stereo cameras also has enough accuracy to help in the recognition of objects. The Autopilot on Tesla cars should use 3D reconstruction and a depth-map to help detect objects, and it uses passive optical cameras (these should be very good cameras, with excellent optics and high resolution): http://blog.ted.com/what-will-the-future-look-like-elon-musk-speaks-at-ted2017/

What’s happening at Tesla? Tesla Model 3 is coming in July, Musk says, and it’ll have a special feature: autopilot. Using only passive optical cameras and GPS, no LIDAR, the Model 3 will be capable of autonomous driving. “Once you solve cameras for vision, autonomy is solved; if you don’t solve vision, it’s not solved.”

Previously, the good quality of the depth-map could be obtained only by using active cameras, such on Kinect:

(image: example of a depth-map from Kinect)

fjoanr commented 7 years ago

Hey @AlexeyAB

It is indeed a very interesting approach, and with the right testing it might be able to improve the quality of the current state-of-the-art object detectors. In my case, I am not using a depth stream but 2 RGB streams. The idea of using footage from ToF or stereo cameras is interesting, but don't you think the computational and hardware costs of implementing a surveillance system would increase too much?

At one point this year I was using the Kinect to obtain depth images and worked on another human tracker, but the quality of the depth images from Kinect is (right now) far from ideal, i.e., it needs a lot of pre-processing to be useful in any kind of application.

I have been able to obtain and save the features from the last convolutional layer of YOLO. Now, when I try to train the new fusion-stream (with one convolutional layer and the region layer), I get an error in the console as in the image below:

(screenshot of the console error)

The fusionnet.cfg file is:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8

height=13
width=13
channels=1
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 80200
policy=steps
steps=40000,60000
scales=.1,.1

[convolutional]
size=1
stride=1
pad=0
filters=125
activation=linear

[region]
anchors =  1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
bias_match=1
classes=20
coords=4
num=5
softmax=1
jitter=.3
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1

And the images are stored in darknet-master/CONVdevkit/conv and conv_test, with images named 00_0x (x from 0 to 62500). Do you have any idea or suggestion on where I should look in the main code to understand the issue?

Thank you!

AlexeyAB commented 7 years ago

@fjoanr Hi,


fjoanr commented 7 years ago

Hello,

classes= 20
train  = data/conv/conv.txt
valid  = data/conv/conv_test.txt
names = data/obj.names
backup = backup/

.

AlexeyAB commented 7 years ago

@fjoanr Hi,

  1. Regarding the console error screenshot:

    So your error occurs because you do not have .txt files with labels. I.e., if you initially have dog.jpg and dog.txt with labels, and from this image you produced 125 images dog_01.jpg, dog_02.jpg, ... dog_125.jpg, then you should copy dog.txt to dog_01.txt, dog_02.txt, ... dog_125.txt (a small copy sketch follows after this list).

  2. The last conv layer outputs a 13x13x125 feature map. Since I want to store them as images, I divide each map into 125 outputs. Thus, the input to the network is 13x13x1 representing each depth slice of the feature map of 125 slices (using 500 train images I would input 62500 13x13 images to the fusion layer). ...

    I'm not sure if this is the best way, but it may be.

  3. So, which 2 RGB images do you want to merge via FusionNet? Are these two RGB images of the same object on the same background, but from different points of view - i.e. 2 images from stereo-cameras?

  4. And how will you merge these two RGB images?
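
A minimal copy sketch for item 1 (hypothetical file names, not part of darknet): replicate the original label file for every slice image.

#include <fstream>
#include <iomanip>
#include <sstream>

int main()
{
    for (int i = 1; i <= 125; ++i) {
        std::ifstream src("dog.txt", std::ios::binary);           // original labels
        std::ostringstream name;
        name << "dog_" << std::setw(2) << std::setfill('0') << i << ".txt";
        std::ofstream dst(name.str(), std::ios::binary);
        dst << src.rdbuf();                                        // dog_01.txt ... dog_125.txt
    }
    return 0;
}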


Yes it was Kinect 2.0. And Kinect 1.0 is certainly worse, but it's still much better than stereo-cameras :) https://www.youtube.com/watch?v=Zx2E19IV2zs

fjoanr commented 7 years ago

Hello @AlexeyAB, sorry for the delay on answering.

Do you think I need to train the fusion layer 2000*classes times? Since the network is just 3 layers?

.

AlexeyAB commented 7 years ago

@fjoanr Hi,

the AVG loss is going down, but I have noticed that the values for Obj, No Obj and Avg Recall are most of the time 0.0000... Is it because I am in an early training stage, or might something be wrong?

Do you think I need to train the fusion layer 2000*classes times? Since the network is just 3 layers?

Yes, I think it is necessary to train about 2000 iterations per class (i.e. about 40000 iterations for 20 classes) and then test several of the previous weights.

fjoanr commented 7 years ago

Hi @AlexeyAB ,

I've been training the fusionnet as we've discussed here, but in the end, when running darknet.exe detector recall ..., all the values of IOU and Recall are 0%, so I guess my approach was not correct.

I will try a new approach, where I gather the convolutional features (13x13x125) split into 125 images of 13x13 (as before), but in the darknet.sln code I would like to merge the 125 channels together again, because OpenCV cannot save an image with 125 channels and darknet.exe detector is unable to read the 13x13x125 matrix from OpenCV, as this error pops up:

(screenshot of the error)

The problem is that I am unable to find the place where the code reads the images. In void train_detector( ... ) from detector.c, lines 54-56, the code reads:

    list *plist = get_paths(train_images);
    //int N = plist->size;
    char **paths = (char **)list_to_array(plist);

I suppose this is where it reads the paths from the train.txt file. Where should I modify the code to add merging of single-channel images into a multidimensional matrix?

Thank you again!

AlexeyAB commented 7 years ago

@fjoanr Hi,

During training, this code loads the images:

  1. The first load, before the loop, is here: pthread_t load_thread = load_data(args); https://github.com/AlexeyAB/darknet/blob/d8bafc728478e5cba9cf41eca01d66a38800eddd/src/detector.c#L76

  2. The next loads happen inside the training loop:


So I think you should probably change load_data_region(): https://github.com/AlexeyAB/darknet/blob/d8bafc728478e5cba9cf41eca01d66a38800eddd/src/data.c#L500
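
If it helps on the OpenCV side, a minimal sketch (hypothetical slice file names, outside of darknet) of merging the 125 single-channel 13x13 slices back into one multi-channel cv::Mat:

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

cv::Mat merge_slices(const std::string &prefix)
{
    std::vector<cv::Mat> slices;
    for (int i = 0; i < 125; ++i) {
        // hypothetical slice names: <prefix>0.png ... <prefix>124.png
        cv::Mat s = cv::imread(prefix + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
        s.convertTo(s, CV_32F);              // keep the values as floats
        slices.push_back(s);
    }
    cv::Mat merged;
    cv::merge(slices, merged);               // one 13x13 cv::Mat with 125 channels
    return merged;                           // cannot be saved with imwrite, but can be used in memory
}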

fjoanr commented 7 years ago

Hello @AlexeyAB

Thank you for the reply once again! I will try to sort the issues out, and I will mark this issue as closed.

Best regards, Francesc.