Also, Yolo calculates IoU more correctly than Recall: https://github.com/AlexeyAB/darknet#when-should-i-stop-training
- IoU - the bigger, the better (indicates accuracy) - better to use
- Recall - the bigger, the better (indicates accuracy) - but Yolo actually just counts true positives here, so it shouldn't be used
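For reference, a minimal sketch (not darknet's actual code) of how IoU between a predicted box and a ground-truth box is computed, and the IoU-threshold test that turns a detection into a true positive for recall:

```c
/* Minimal sketch, not darknet's actual code: IoU of two axis-aligned boxes,
   and the true-positive test (IoU above a threshold) that recall counts. */
#include <math.h>
#include <stdio.h>

typedef struct { float left, top, right, bottom; } box_ltrb;

static float box_iou(box_ltrb a, box_ltrb b)
{
    float iw = fminf(a.right, b.right) - fmaxf(a.left, b.left);   /* overlap width  */
    float ih = fminf(a.bottom, b.bottom) - fmaxf(a.top, b.top);   /* overlap height */
    if (iw <= 0 || ih <= 0) return 0;                             /* no overlap     */
    float inter  = iw * ih;
    float union_ = (a.right - a.left) * (a.bottom - a.top)
                 + (b.right - b.left) * (b.bottom - b.top) - inter;
    return inter / union_;                                        /* intersection over union */
}

int main(void)
{
    box_ltrb pred  = {10, 10, 60, 60};
    box_ltrb truth = {15, 15, 60, 60};
    float iou = box_iou(pred, truth);
    printf("IoU = %.3f, true positive = %d\n", iou, iou > 0.5f);
    return 0;
}
```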
Hi, is it possible to change the training snapshot interval? I mean that after passing 1000 iterations, it only saves snapshots every 1000th iteration. I want to change this, for example to every 100th iteration, as it does from the 1st to the 1000th.
Hi,
Just change this line as you need: https://github.com/AlexeyAB/darknet/blob/195ba5dee284d41dfbb8022d025ae77b75e1b00c/src/detector.c#L150
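Roughly, the saving schedule behaves like the sketch below (an approximation of the logic at that line, not the exact source); lowering the 1000 there to 100 keeps the every-100-iterations snapshots going after iteration 1000:

```c
/* Approximate sketch of the snapshot schedule in train_detector()
   (src/detector.c); not the exact source, just the behaviour described above. */
#include <stdio.h>

int main(void)
{
    for (int i = 1; i <= 3000; ++i) {
        /* every 100 iterations up to 1000, then every 1000 iterations;
           change the first 1000 to 100 to keep saving every 100 iterations */
        if (i % 1000 == 0 || (i < 1000 && i % 100 == 0)) {
            printf("would save backup/yolo-voc_%d.weights\n", i);
        }
    }
    return 0;
}
```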
Hello Alex,
I could fix the problem on the old commit with this sequence:
Compiling the repo with OpenCV 2.4.9 and CUDA 8.0.61 (without applying Patch-2) and using cuDNN 5.1; also, the display drivers must be installed with the CUDA package and not installed separately from NVIDIA's standalone driver installer. I could reach a new record of IoU=77.46%, Recall=94.23%.
I could replicate these results on the old commit, but I have to restart the system before training! I don't know why, but if I don't, the same results are not achieved. I should mention that the same results do not necessarily come at the same iteration (for example 2000), but if we train longer, everything eventually converges to almost the same numbers.
But I still cannot get these results with the latest commit. Do you think this is related to memory-releasing issues?
I must say I have already set threshold=0.001 on the latest commit and tested the Recall function with the pre-trained weights (from the old commit), and I see the exact same numbers, so there is no issue with that function.
I see you have modified some parts related to memory after the old commit. Could those modifications cause such effects? I mean memory being released at the wrong time, or not released in time, something like that.
@VanitarNordic Maybe.
Just let me do my final test today; if it does not help, I'll ask you to have a look at those memory-related issues.
Hello,
I tested the latest commit again, but unfortunately it shows no progress. If you have any suggestion based on this evidence, just tell me and I'll test it, because the results are important to me.
I should also mention that in the meantime I trained and tested SSD on this dataset, but because SSD ignores background-only images, I could not compare the two models. The author of SSD has already mentioned where to change (which function) to include these background-only images in training, but because it is in C++, nobody has done it so far. If you like, you can make a new repo based on this idea. I think it would be a piece of cake for you.
Hi, thank you. When I have time, I'll look into the SSD source code.
Do you have any suggestion for the latest commit of YOLO?
No ideas.
You can make a test yourself if you have any doubt.
Just help me figure out which commits are related to the DLL part and which ones are related to the main code, because some memory leaks were related to the DLL section.
I have no doubt that some changes in the last commits affect the IoU.
By these links, in the left part, are the commits related to the DLL:
The DLL code is used only when yolo_cpp_dll.sln is compiled. If darknet.sln is compiled, then nothing related to the DLL source code is used.
Thank you very much.
May I ask how you noticed the memory leaks in the main code? Besides, I see several modifications inside image.c, network.c, and layer.c.
And one more thing: is there any C function which might work well on Linux but not behave the same on Windows? I mean something like the rand() issue. That one was right in front of our eyes and we detected it, but maybe there are other functions which show this kind of behavior.
fopen(filename, "w") and fopen(filename, "r") were changed to fopen(filename, "wb") and fopen(filename, "rb") to work on Windows, and also the rand() and srand(time(0)) functions in darknet.
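A minimal sketch of why the binary flags matter on Windows (an illustration, not darknet code):

```c
/* Minimal sketch, not darknet code: writing/reading binary data on Windows.
   With "w"/"r" (text mode) the Microsoft CRT translates '\n' <-> "\r\n" and
   treats 0x1A as end-of-file, which corrupts binary payloads such as .weights
   files; "wb"/"rb" read and write the bytes verbatim. */
#include <stdio.h>

int main(void)
{
    float weights[3] = {0.5f, -1.25f, 3.0f};

    FILE *fp = fopen("demo.weights", "wb");      /* binary write */
    if (!fp) return 1;
    fwrite(weights, sizeof(float), 3, fp);
    fclose(fp);

    float loaded[3];
    fp = fopen("demo.weights", "rb");            /* binary read  */
    if (!fp) return 1;
    fread(loaded, sizeof(float), 3, fp);
    fclose(fp);

    printf("%f %f %f\n", loaded[0], loaded[1], loaded[2]);
    return 0;
}
```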
Thank you. I have played with those rand() functions before, but it did not help. I mean I changed them all to random_gen(), except in the function float rand_uniform(float min, float max), where it was not possible to change rand() to random_gen(). So as a result there is no remaining conflict between Windows and Linux.
Now I'm looking into the warnings; there are more than 1600 of them, which might give a clue.
Alright, I'll list the interesting warnings for you:
- Function dim3 cuda_gridsize(size_t n); inside cuda.h - Warning:
- This code: l.update = update_deconvolutional_layer; inside deconvolutional_layer.c - Warning:
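For context, a rough sketch (an assumption, not copied from the source) of what cuda_gridsize() looks like in darknet; the conversion warning most likely comes from narrowing size_t values into the unsigned int fields of dim3, which explicit casts would silence:

```c
/* Approximate sketch of dim3 cuda_gridsize(size_t n) from darknet (cuda.h/.c);
   not the exact code. The narrowing from size_t to the unsigned int members
   of dim3 is what typically triggers the MSVC conversion warning. */
#include <cuda_runtime.h>
#include <math.h>

#define BLOCK 512   /* threads per block, as used by darknet */

dim3 cuda_gridsize(size_t n)
{
    size_t k = (n - 1) / BLOCK + 1;           /* blocks needed for n threads */
    size_t x = k;
    size_t y = 1;
    if (x > 65535) {                          /* stay within the grid x limit */
        x = (size_t)ceil(sqrt((double)k));
        y = (n - 1) / (x * BLOCK) + 1;
    }
    dim3 d = { (unsigned int)x, (unsigned int)y, 1 };  /* explicit casts avoid the warning */
    return d;
}
```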
Alright, I could solve the case. At least now it is like the Linux repo, with 2% more Recall. Again, I should congratulate you on making the complex original Darknet even better.
First, I did the same steps as I did for the old commit.
Then I changed all rand() calls to random_gen(), except in float rand_uniform(float min, float max). (I had done this before for the old commit as well; in general it helps the model converge faster and gives better results.)
I also copied opencv_imgproc249.dll next to darknet.exe.
I also suppressed some warnings. The majority of them are just type-conversion warnings and easy to suppress. That may have had some effect as well, but I'm not quite sure; in any case, suppressing them has no side effects and is good to do.
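For reference, a minimal sketch (MSVC-specific, not taken from the darknet sources) of silencing such type-conversion warnings with a pragma instead of per-site casts:

```c
/* Minimal sketch, MSVC only: silencing common type-conversion warnings.
   C4244: conversion to a smaller type, possible loss of data.
   C4305: truncation from 'double' to 'float'. */
#ifdef _MSC_VER
#pragma warning(disable: 4244 4305)
#endif

#include <stdio.h>

int main(void)
{
    double d = 3.14159;
    float  f = d;        /* would normally raise a conversion warning */
    printf("%f\n", f);
    return 0;
}
```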
Regarding SSD, according to the author this is the part of the code that must be modified to include background-only images in training; otherwise it just ignores them:
https://github.com/weiliu89/caffe/blob/ssd/src/caffe/util/bbox_util.cpp#L850
This issue talks about it: https://github.com/weiliu89/caffe/issues/146
When you have time, just introduce your repository. I don't think it would take you long, because as I see it, you write complex C/C++ code as easily as breathing.
I have one more comment regarding this minor issue:
I think it is reproduced randomly when random_gen()%m generates 0, because the range of that expression is 0 to m-1. Here count=0 has no meaning; it should be at least 1.
Do you agree that it should be modified like this to be in the range from 1 to m?
(random_gen() % m) + 1
I think, apart from this, we also don't need zero from the random generator anywhere in the code. This issue also exists in the original Darknet repo, which uses the same approach with rand().
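A small illustration of the proposed change (random_gen() is stubbed with rand() here just for the demo):

```c
/* Illustration of the range issue discussed above:
   random_gen() % m        yields 0 .. m-1  (so count can become 0),
   (random_gen() % m) + 1  yields 1 .. m.   */
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for darknet's random_gen(); plain rand() is enough for the demo. */
static unsigned int random_gen(void) { return (unsigned int)rand(); }

int main(void)
{
    int m = 5;
    for (int i = 0; i < 10; ++i) {
        int count = (int)(random_gen() % m) + 1;   /* proposed: 1..m instead of 0..m-1 */
        printf("%d ", count);
    }
    printf("\n");
    return 0;
}
```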
I think the cubin files should also be updated for the newer GPU architectures and CUDA versions. NVIDIA told me something like this:
From CUDA 8.0 Toolkit, nvcc can generate cubin files native to the Pascal architectures (compute capability 6.0 and 6.1). When using CUDA Toolkit 8.0, to ensure that nvcc will generate cubin files for all recent GPU architectures as well as a PTX version for forward compatibility with future GPU architectures, specify the appropriate -gencode= parameters on the nvcc command line as shown in the examples below.
nvcc.exe -ccbin "C:\vs2010\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT" -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61 --compile -o "Release\mykernel.cu.obj" "mykernel.cu"
Maybe the source files are not compatible with the latest version of CUDA and Pascal GPUs.
Hello,
One question: why don't you use random_gen() everywhere instead of rand()? Is there any reason behind it?
Hi, To make a replacement everywhere, you need to test all the functionality of darknet (classifier, rnn, go, cifar, ...) - I have not done it yet.
Okay, so you mean it is not as easy as just replacing the functions. In other words, these two are not totally identical, yes?
At least rand_s, which random_gen uses, is much slower than rand. So when I replaced many rand calls with random_gen, darknet took much longer to initialize the detector.
The maximum possible value also differs.
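A minimal sketch of both points (Windows/MSVC CRT assumed): rand_s() goes through the OS cryptographic generator, so it is much slower, and its range is the full unsigned int instead of 0..RAND_MAX (32767 on MSVC):

```c
/* Minimal sketch, Windows/MSVC only: rand() vs rand_s().
   rand()   - fast, returns 0..RAND_MAX (32767 on MSVC).
   rand_s() - much slower (OS cryptographic generator), returns 0..UINT_MAX. */
#define _CRT_RAND_S
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int secure = 0;
    if (rand_s(&secure) != 0) return 1;    /* slow, full unsigned range */
    int fast = rand();                     /* fast, limited range       */
    printf("rand_s: %u (max %u), rand: %d (max %d)\n",
           secure, UINT_MAX, fast, RAND_MAX);
    return 0;
}
```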
Yes, you are right. good to know that. Thank you.
Are the currently used instances of random_gen() in the commit okay with you?
Yes.
Hello,
The YOLO authors have changed the YOLO-VOC CFG files on their original website, I mean here:
Training with these new CFG files (with either thresh = 0.2 or 0.001) does not give the results I got when I trained with the older repo and the older YOLO-VOC CFG file. I can say the results are far worse than before.
Besides, training with YOLO-VOC-2.0 gives slightly better results, but it still cannot compete with the older results (with either thresh = 0.2 or 0.001).
I have used the latest commit of the repo here (4892071). What is the problem: the new CFG files or the repo code?