Also, Yolo calculates IoU more correctly than Recall: https://github.com/AlexeyAB/darknet#when-should-i-stop-training
- IoU - the bigger, the better (indicates accuracy) - better to use
- Recall - the bigger, the better (indicates accuracy) - but Yolo actually just counts true positives here, so it shouldn't be used
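For reference, a minimal sketch (not darknet's actual code) of how IoU between a predicted box and a ground-truth box is computed, and the IoU-threshold test that turns a detection into a true positive for recall:

```c
/* Minimal sketch, not darknet's actual code: IoU of two axis-aligned boxes,
   and the true-positive test (IoU above a threshold) that recall counts. */
#include <math.h>
#include <stdio.h>

typedef struct { float left, top, right, bottom; } box_ltrb;

static float box_iou(box_ltrb a, box_ltrb b)
{
    float iw = fminf(a.right, b.right) - fmaxf(a.left, b.left);   /* overlap width  */
    float ih = fminf(a.bottom, b.bottom) - fmaxf(a.top, b.top);   /* overlap height */
    if (iw <= 0 || ih <= 0) return 0;                             /* no overlap     */
    float inter  = iw * ih;
    float union_ = (a.right - a.left) * (a.bottom - a.top)
                 + (b.right - b.left) * (b.bottom - b.top) - inter;
    return inter / union_;                                        /* intersection over union */
}

int main(void)
{
    box_ltrb pred  = {10, 10, 60, 60};
    box_ltrb truth = {15, 15, 60, 60};
    float iou = box_iou(pred, truth);
    printf("IoU = %.3f, true positive = %d\n", iou, iou > 0.5f);
    return 0;
}
```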
Hi, is it possible to change the training snapshot interval? I mean that after passing 1000 iterations, it only saves snapshots every 1000th iteration. I want to change this, for example to every 100th iteration, as it does from the 1st to the 1000th.
Hi,
Just change this line as you need: https://github.com/AlexeyAB/darknet/blob/195ba5dee284d41dfbb8022d025ae77b75e1b00c/src/detector.c#L150
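Roughly, the saving schedule behaves like the sketch below (an approximation of the logic at that line, not the exact source); lowering the 1000 there to 100 keeps the every-100-iterations snapshots going after iteration 1000:

```c
/* Approximate sketch of the snapshot schedule in train_detector()
   (src/detector.c); not the exact source, just the behaviour described above. */
#include <stdio.h>

int main(void)
{
    for (int i = 1; i <= 3000; ++i) {
        /* every 100 iterations up to 1000, then every 1000 iterations;
           change the first 1000 to 100 to keep saving every 100 iterations */
        if (i % 1000 == 0 || (i < 1000 && i % 100 == 0)) {
            printf("would save backup/yolo-voc_%d.weights\n", i);
        }
    }
    return 0;
}
```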
Hello Alex,
I could fix the problem on the old commit with this sequence:
Compiling the repo with OpenCV 2.4.9 and CUDA 8.0.61 (without applying Patch-2) and using cuDNN 5.1; also, the display drivers must be installed with the CUDA package and not installed separately from NVIDIA's standalone driver installer. I could reach a new record of IoU=77.46%, Recall=94.23%.
I could replicate these results on the old commit, but I have to restart the system before training! I don't know why, but if I don't, the same results are not achieved. I should mention that the same results do not necessarily come at the same iteration (for example 2000), but if we train longer, everything eventually converges to almost the same numbers.
But I still cannot get these results with the latest commit. Do you think this is related to memory-releasing issues?
I must say I have already set threshold=0.001 on the latest commit and tested the Recall function with the pre-trained weights (from the old commit), and I see the exact same numbers, so there is no issue with that function.
I see you have modified some parts related to memory after the old commit. Could those modifications cause such effects? I mean memory being released at the wrong time, or not released in time, something like that.
@VanitarNordic Maybe.
Just let me do my final test today; if it does not help, I'll ask you to have a look at those memory-related issues.
Hello,
I tested the latest commit again, but unfortunately it shows no progress. If you have any suggestion based on this evidence, just tell me and I'll test it, because the results are important to me.
I should also mention that in the meantime I trained and tested SSD on this dataset, but because SSD ignores background-only images, I could not compare the two models. The author of SSD has already mentioned where to change (which function) to include these background-only images in training, but because it is in C++, nobody has done it so far. If you like, you can make a new repo based on this idea. I think it would be a piece of cake for you.
Hi, thank you. When I have time, I'll look into the SSD source code.
Do you have any suggestion for the latest commit of YOLO?
No ideas.
You can make a test yourself if you have any doubt.
Just help me figure out which commits are related to the DLL part and which ones are related to the main code, because some memory leaks were related to the DLL section.
I have no doubt that some changes in the last commits affect the IoU.
By these links, in the left part, are the commits related to the DLL:
The DLL code is used only when yolo_cpp_dll.sln is compiled. If darknet.sln is compiled, then nothing related to the DLL source code is used.
Thank you very much.
May I ask how you noticed the memory leaks in the main code? Besides, I see several modifications inside image.c, network.c, and layer.c.
And one more thing: is there any C function which might work well on Linux but not behave the same on Windows? I mean something like the rand() issue. That one was right in front of our eyes and we detected it, but maybe there are other functions which show this kind of behavior.
fopen(filename, "w") and fopen(filename, "r") were changed to fopen(filename, "wb") and fopen(filename, "rb") to work on Windows, and also the rand() and srand(time(0)) functions in darknet.
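A minimal sketch of why the binary flags matter on Windows (an illustration, not darknet code):

```c
/* Minimal sketch, not darknet code: writing/reading binary data on Windows.
   With "w"/"r" (text mode) the Microsoft CRT translates '\n' <-> "\r\n" and
   treats 0x1A as end-of-file, which corrupts binary payloads such as .weights
   files; "wb"/"rb" read and write the bytes verbatim. */
#include <stdio.h>

int main(void)
{
    float weights[3] = {0.5f, -1.25f, 3.0f};

    FILE *fp = fopen("demo.weights", "wb");      /* binary write */
    if (!fp) return 1;
    fwrite(weights, sizeof(float), 3, fp);
    fclose(fp);

    float loaded[3];
    fp = fopen("demo.weights", "rb");            /* binary read  */
    if (!fp) return 1;
    fread(loaded, sizeof(float), 3, fp);
    fclose(fp);

    printf("%f %f %f\n", loaded[0], loaded[1], loaded[2]);
    return 0;
}
```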
Thank you. I have played with those rand() functions before, but it did not help. I mean I changed them all to random_gen(), except in the function float rand_uniform(float min, float max), where it was not possible to change rand() to random_gen(). So as a result there is no remaining conflict between Windows and Linux.
Now I'm looking into the warnings; there are more than 1600 of them, which might give a clue.
Alright, I'll list the interesting warnings for you:
- Function dim3 cuda_gridsize(size_t n); inside cuda.h - Warning:
- This code: l.update = update_deconvolutional_layer; inside deconvolutional_layer.c - Warning:
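For context, a rough sketch (an assumption, not copied from the source) of what cuda_gridsize() looks like in darknet; the conversion warning most likely comes from narrowing size_t values into the unsigned int fields of dim3, which explicit casts would silence:

```c
/* Approximate sketch of dim3 cuda_gridsize(size_t n) from darknet (cuda.h/.c);
   not the exact code. The narrowing from size_t to the unsigned int members
   of dim3 is what typically triggers the MSVC conversion warning. */
#include <cuda_runtime.h>
#include <math.h>

#define BLOCK 512   /* threads per block, as used by darknet */

dim3 cuda_gridsize(size_t n)
{
    size_t k = (n - 1) / BLOCK + 1;           /* blocks needed for n threads */
    size_t x = k;
    size_t y = 1;
    if (x > 65535) {                          /* stay within the grid x limit */
        x = (size_t)ceil(sqrt((double)k));
        y = (n - 1) / (x * BLOCK) + 1;
    }
    dim3 d = { (unsigned int)x, (unsigned int)y, 1 };  /* explicit casts avoid the warning */
    return d;
}
```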
Alright, I could solve the case. At least now it is like the Linux repo, with 2% more Recall. Again, I should congratulate you on making the complex original Darknet even better.
First, I did the same steps as I did for the old commit.
Then I changed all rand() calls to random_gen(), except in float rand_uniform(float min, float max). (I had done this before for the old commit as well; in general it helps the model converge faster and gives better results.)
I also copied opencv_imgproc249.dll next to darknet.exe.
I also suppressed some warnings. The majority of them are just type-conversion warnings and easy to suppress. That may have had some effect as well, but I'm not quite sure; in any case, suppressing them has no side effects and is good to do.
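For reference, a minimal sketch (MSVC-specific, not taken from the darknet sources) of silencing such type-conversion warnings with a pragma instead of per-site casts:

```c
/* Minimal sketch, MSVC only: silencing common type-conversion warnings.
   C4244: conversion to a smaller type, possible loss of data.
   C4305: truncation from 'double' to 'float'. */
#ifdef _MSC_VER
#pragma warning(disable: 4244 4305)
#endif

#include <stdio.h>

int main(void)
{
    double d = 3.14159;
    float  f = d;        /* would normally raise a conversion warning */
    printf("%f\n", f);
    return 0;
}
```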
Regarding SSD, according to the author this is the part of the code that must be modified to include background-only images in training; otherwise it just ignores them:
https://github.com/weiliu89/caffe/blob/ssd/src/caffe/util/bbox_util.cpp#L850
This issue talks about it: https://github.com/weiliu89/caffe/issues/146
When you have time, just introduce your repository. I don't think it would take you long, because as I see it, you write complex C/C++ code as easily as breathing.
I have one more comment regarding this minor issue:
I think it is reproduced randomly when random_gen()%m generates 0, because the range of that expression is 0 to m-1. Here count=0 has no meaning; it should be at least 1.
Do you agree that it should be modified like this to be in the range from 1 to m?
(random_gen() % m) + 1
I think, apart from this, we also don't need zero from the random generator anywhere in the code. This issue also exists in the original Darknet repo, which uses the same approach with rand().
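A small illustration of the proposed change (random_gen() is stubbed with rand() here just for the demo):

```c
/* Illustration of the range issue discussed above:
   random_gen() % m        yields 0 .. m-1  (so count can become 0),
   (random_gen() % m) + 1  yields 1 .. m.   */
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for darknet's random_gen(); plain rand() is enough for the demo. */
static unsigned int random_gen(void) { return (unsigned int)rand(); }

int main(void)
{
    int m = 5;
    for (int i = 0; i < 10; ++i) {
        int count = (int)(random_gen() % m) + 1;   /* proposed: 1..m instead of 0..m-1 */
        printf("%d ", count);
    }
    printf("\n");
    return 0;
}
```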
I think the cubin files should also be updated for the newer GPU architectures and CUDA versions. NVIDIA told me something like this:
From CUDA 8.0 Toolkit, nvcc can generate cubin files native to the Pascal architectures (compute capability 6.0 and 6.1). When using CUDA Toolkit 8.0, to ensure that nvcc will generate cubin files for all recent GPU architectures as well as a PTX version for forward compatibility with future GPU architectures, specify the appropriate -gencode= parameters on the nvcc command line as shown in the examples below.
nvcc.exe -ccbin "C:\vs2010\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT" -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61 --compile -o "Release\mykernel.cu.obj" "mykernel.cu"
Maybe the source files are not compatible with the latest version of CUDA and Pascal GPUs.
Hello,
One question: why don't you use random_gen() everywhere instead of rand()? Is there any reason behind it?
Hi, To make a replacement everywhere, you need to test all the functionality of darknet (classifier, rnn, go, cifar, ...) - I have not done it yet.
Okay, so you mean it is not as easy as just replacing the functions. In other words, these two are not totally identical, yes?
At least rand_s, which random_gen uses, is much slower than rand. So when I replaced many rand calls with random_gen, darknet took much longer to initialize the detector.
The maximum possible value also differs.
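A minimal sketch of both points (Windows/MSVC CRT assumed): rand_s() goes through the OS cryptographic generator, so it is much slower, and its range is the full unsigned int instead of 0..RAND_MAX (32767 on MSVC):

```c
/* Minimal sketch, Windows/MSVC only: rand() vs rand_s().
   rand()   - fast, returns 0..RAND_MAX (32767 on MSVC).
   rand_s() - much slower (OS cryptographic generator), returns 0..UINT_MAX. */
#define _CRT_RAND_S
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int secure = 0;
    if (rand_s(&secure) != 0) return 1;    /* slow, full unsigned range */
    int fast = rand();                     /* fast, limited range       */
    printf("rand_s: %u (max %u), rand: %d (max %d)\n",
           secure, UINT_MAX, fast, RAND_MAX);
    return 0;
}
```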
Yes, you are right. good to know that. Thank you.
Are the currently used instances of random_gen() in the commit okay with you?
Yes.
Hello,
The YOLO authors have changed the YOLO-VOC CFG files on their original website, I mean here:
Training with these new CFG files (with either thresh = 0.2 or 0.001) does not give the results I got when I trained with the older repo and the older YOLO-VOC CFG file. I can say the results are far worse than before.
Besides, training with YOLO-VOC-2.0 gives slightly better results, but it still cannot compete with the older results (with either thresh = 0.2 or 0.001).
I have used the latest commit of the repo here (4892071). What is the problem: the new CFG files or the repo code?