AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Detector not training faster, why? #96

Closed visha-l closed 7 years ago

visha-l commented 7 years ago

I am building a detector by following the instructions in this link (build detector from scratch). I followed all the steps and am now training on a p2.xlarge (an AWS EC2 instance) so that a GPU speeds up training, but it trains at the same speed as it did without a GPU. The p2.xlarge has one GPU, so it should run faster. Why is this not happening?

I have set GPU=1 in the Makefile, as instructed in the link. What else needs to be done?

AlexeyAB commented 7 years ago

First, make sure that everything compiles without errors.

visha-l commented 7 years ago

This version of cuda is installed on my system. CUDA Version 8.0.61.

This version of cuDNN is installed on my system.

#define CUDNN_MAJOR      6
#define CUDNN_MINOR      0
#define CUDNN_PATCHLEVEL 21
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

I think it means cuDNN version 6.0.21

The nvidia-smi command gives me:

Fri Jun  2 10:16:08 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   50C    P0    56W / 149W |      0MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

make gives me this:

gcc  -DGPU -I/usr/local/cuda-8.0/include/ -DCUDNN  -Wall -Wfatal-errors  -Ofast -DGPU -DCUDNN -c ./src/gemm.c -o obj/gemm.o
In file included from ./src/gemm.c:3:0:
./src/cuda.h:10:26: fatal error: cuda_runtime.h: No such file or directory
 #include "cuda_runtime.h"
                          ^
compilation terminated.
make: *** [obj/gemm.o] Error 1

AlexeyAB commented 7 years ago

Try to solve this make-error: https://www.google.ru/search?q=fatal+error%3A+cuda_runtime.h%3A+No+such+file+or+directory&rlz=1C1MSIM_enRU714RU714&oq=fatal+error%3A+cuda_runtime.h%3A+No+such+file+or+directory&aqs=chrome..69i57.199j0j7&sourceid=chrome&ie=UTF-8

visha-l commented 7 years ago

In file included from ./src/gemm.c:3:0:
./src/cuda.h:15:19: fatal error: cudnn.h: No such file or directory
 #include "cudnn.h"
                   ^
compilation terminated.
make: *** [obj/gemm.o] Error 1

Now I am getting this error.

My Makefile is:


GPU=1
CUDNN=1
OPENCV=0
DEBUG=0

ARCH= -gencode arch=compute_20,code=[sm_20,sm_21] \
      -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52]

# This is what I use, uncomment if you know your arch and want to specify
# ARCH=  -gencode arch=compute_52,code=compute_52

VPATH=./src/
EXEC=darknet
OBJDIR=./obj/

CC=gcc
NVCC=nvcc 
OPTS=-Ofast
LDFLAGS= -lm -pthread 
COMMON= 
CFLAGS=-Wall -Wfatal-errors 

ifeq ($(DEBUG), 1) 
OPTS=-O0 -g
endif

CFLAGS+=$(OPTS)

ifeq ($(OPENCV), 1) 
COMMON+= -DOPENCV
CFLAGS+= -DOPENCV
LDFLAGS+= `pkg-config --libs opencv` 
COMMON+= `pkg-config --cflags opencv` 
endif

ifeq ($(GPU), 1) 
COMMON+= -DGPU -I/usr/local/cuda-7.0/include/
CFLAGS+= -DGPU
LDFLAGS+= -L/usr/local/cuda-7.0/lib64 -lcuda -lcudart -lcublas -lcurand
endif

ifeq ($(CUDNN), 1) 
COMMON+= -DCUDNN 
CFLAGS+= -DCUDNN
LDFLAGS+= -lcudnn
endif

OBJ=gemm.o utils.o cuda.o deconvolutional_layer.o convolutional_layer.o list.o image.o activations.o im2col.o col2im.o blas.o crop_layer.o dropout_layer.o maxpool_layer.o softmax_layer.o data.o matrix.o network.o connected_layer.o cost_layer.o parser.o option_list.o darknet.o detection_layer.o captcha.o route_layer.o writing.o box.o nightmare.o normalization_layer.o avgpool_layer.o coco.o dice.o yolo.o detector.o layer.o compare.o regressor.o classifier.o local_layer.o swag.o shortcut_layer.o activation_layer.o rnn_layer.o gru_layer.o rnn.o rnn_vid.o crnn_layer.o demo.o tag.o cifar.o go.o batchnorm_layer.o art.o region_layer.o reorg_layer.o lsd.o super.o voxel.o tree.o
ifeq ($(GPU), 1) 
LDFLAGS+= -lstdc++ 
OBJ+=convolutional_kernels.o deconvolutional_kernels.o activation_kernels.o im2col_kernels.o col2im_kernels.o blas_kernels.o crop_layer_kernels.o dropout_layer_kernels.o maxpool_layer_kernels.o network_kernels.o avgpool_layer_kernels.o
endif

OBJS = $(addprefix $(OBJDIR), $(OBJ))
DEPS = $(wildcard src/*.h) Makefile

all: obj backup results $(EXEC)

$(EXEC): $(OBJS)
    $(CC) $(COMMON) $(CFLAGS) $^ -o $@ $(LDFLAGS)

$(OBJDIR)%.o: %.c $(DEPS)
    $(CC) $(COMMON) $(CFLAGS) -c $< -o $@

$(OBJDIR)%.o: %.cu $(DEPS)
    $(NVCC) $(ARCH) $(COMMON) --compiler-options "$(CFLAGS)" -c $< -o $@

obj:
    mkdir -p obj
backup:
    mkdir -p backup
results:
    mkdir -p results

.PHONY: clean

clean:
    rm -rf $(OBJS) $(EXEC)

Am I supposed to change the Makefile to give the path to cuDNN?
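
One way to narrow down "No such file or directory" errors like the two above is to check whether the headers actually exist in the directory that the Makefile passes with -I. The following is a minimal Python sketch, not part of Darknet; the helper name `missing_headers` and the example path are assumptions for illustration only.

```python
import os

def missing_headers(include_dir, headers=("cuda_runtime.h", "cudnn.h")):
    """Return the headers from `headers` that are not present in include_dir."""
    return [h for h in headers
            if not os.path.isfile(os.path.join(include_dir, h))]

if __name__ == "__main__":
    # Assumed location; use whatever directory follows -I in your Makefile.
    include_dir = "/usr/local/cuda/include"
    missing = missing_headers(include_dir)
    if missing:
        print("Missing in", include_dir, ":", ", ".join(missing))
    else:
        print("Both headers found in", include_dir)
```

If cudnn.h is reported missing, the cuDNN header is either not installed or lives in a different directory from the one the Makefile points at. Note that the Makefile above uses /usr/local/cuda-7.0/ paths, while the installed version reported earlier is CUDA 8.0.61.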

visha-l commented 7 years ago

nvcc  -gencode arch=compute_20,code=[sm_20,sm_21] -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=[sm_50,compute_50] -gencode arch=compute_52,code=[sm_52,compute_52]  -DGPU -I/usr/local/cuda-7.0/include/ --compiler-options "-Wall -Wfatal-errors  -Ofast -DGPU" -c ./src/convolutional_kernels.cu -o obj/convolutional_kernels.o
nvcc fatal   : 'sm_21]' is not in 'keyword=value' format
make: *** [obj/convolutional_kernels.o] Error 255

Why am I getting this error?

ubuntu@ip-10-0-0-226:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Wed_Jul_17_18:36:13_PDT_2013
Cuda compilation tools, release 5.5, V5.5.0
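
The nvcc -V output above reports release 5.5, while the installed toolkit reported earlier is CUDA 8.0.61. That suggests the nvcc found on PATH belongs to a different, older installation, which may be related to the error. A minimal Python sketch for comparing the two versions follows; the function name `parse_nvcc_release` is hypothetical and the sample string is copied from the output above.

```python
import re

def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

sample = "Cuda compilation tools, release 5.5, V5.5.0"
installed = (8, 0)  # from "CUDA Version 8.0.61" reported earlier in the thread
found = parse_nvcc_release(sample)
if found is not None and found != installed:
    # Prints: nvcc on PATH is release 5.5, expected 8.0
    print("nvcc on PATH is release %d.%d, expected %d.%d" % (found + installed))
```

Running `which nvcc` shows which binary is being picked up; setting NVCC in the Makefile to the full path of the intended toolkit's nvcc is one possible way to avoid the mismatch.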

visha-l commented 7 years ago

Please help me out.

ubuntu@ip-10-0-0-226:~/darknet$ ./darknet detector train Yolo_mark-master/x64/Release/data/obj.data Yolo_mark-master/x64/Release/yolo-obj.cfg darknet19_448.conv.23
yolo-obj
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1  1088 x1088 x   3   ->  1088 x1088 x  32
    1 max          2 x 2 / 2  1088 x1088 x  32   ->   544 x 544 x  32
    2 conv     64  3 x 3 / 1   544 x 544 x  32   ->   544 x 544 x  64
    3 max          2 x 2 / 2   544 x 544 x  64   ->   272 x 272 x  64
    4 conv    128  3 x 3 / 1   272 x 272 x  64   ->   272 x 272 x 128
    5 conv     64  1 x 1 / 1   272 x 272 x 128   ->   272 x 272 x  64
    6 conv    128  3 x 3 / 1   272 x 272 x  64   ->   272 x 272 x 128
    7 max          2 x 2 / 2   272 x 272 x 128   ->   136 x 136 x 128
    8 CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)

AlexeyAB commented 7 years ago

Show first 10 lines of Yolo_mark-master/x64/Release/yolo-obj.cfg

And try to set subdivisions=16 or 32 in yolo-obj.cfg: https://github.com/AlexeyAB/darknet/blob/master/cfg/yolo-voc.2.0.cfg#L3

Also show values of parameters from yolo-obj.cfg:

visha-l commented 7 years ago

[net]
batch=1
subdivisions=8
height=1088
width=1088
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

filters=125
classes=20
random=1
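
The filters=125 and classes=20 values above are consistent with the rule filters=(classes + 5)*5 used for YOLO v2 cfg files. A small Python sketch of that arithmetic follows, assuming 5 anchors and 4 box coordinates per anchor; the function name is made up for illustration.

```python
def last_conv_filters(classes, num_anchors=5, coords=4):
    """Filters for the conv layer before the region layer: each anchor
    predicts `coords` box values, 1 objectness score and `classes` scores."""
    return (classes + coords + 1) * num_anchors

print(last_conv_filters(20))  # 20 classes, as in the cfg above -> 125
```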

Now it is giving this error:

ubuntu@ip-10-0-0-226:~/darknet$ ./darknet detector train Yolo_mark-master/x64/Release/data/obj.data Yolo_mark-master/x64/Release/yolo-obj.cfg darknet19_448.conv.23
./darknet: error while loading shared libraries: libcudart.so.7.0: cannot open shared object file: No such file or directory

AlexeyAB commented 7 years ago

As said here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

  • change line batch to batch=64
  • change line subdivisions to subdivisions=8
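
As a rough illustration of how these two settings interact: each batch is split into `subdivisions` smaller groups, and only one group is on the GPU at a time. The Python sketch below shows that arithmetic. The function name and the multiple-of check are assumptions for illustration, not Darknet's own code.

```python
def images_per_step(batch, subdivisions):
    """Images processed at once: the batch is split into `subdivisions` parts."""
    if batch % subdivisions != 0:
        # Assumption: batch is expected to divide evenly by subdivisions.
        raise ValueError("batch should be a multiple of subdivisions")
    return batch // subdivisions

print(images_per_step(64, 8))  # settings recommended above -> 8
```

With the batch=1 subdivisions=8 values shown earlier, this check fails, since one image cannot be split into eight groups.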

https://www.google.ru/search?q=libcudart.so.7.0%3A+cannot+open+shared+object+file&rlz=1C1MSIM_enRU714RU714&oq=libcudart.so.7.0%3A+cannot+open+shared+object+file&aqs=chrome..69i57.680j0j7&sourceid=chrome&ie=UTF-8

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
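
To see whether the export above actually makes the requested library visible, the directories in LD_LIBRARY_PATH can be checked for the exact file name from the error message. This is a minimal Python sketch with a hypothetical helper name. It only inspects LD_LIBRARY_PATH, while the dynamic loader also searches other locations.

```python
import os

def find_library(lib_name, search_path):
    """Return the directories in a colon-separated path that contain lib_name."""
    dirs = [d for d in search_path.split(":") if d]
    return [d for d in dirs if os.path.exists(os.path.join(d, lib_name))]

if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = find_library("libcudart.so.7.0", path)
    print(hits if hits else "libcudart.so.7.0 not found in LD_LIBRARY_PATH")
```

The name libcudart.so.7.0 matches the cuda-7.0 paths in the Makefile shown earlier, so if only CUDA 8.0 is installed, updating those Makefile paths and rebuilding may be needed in addition to the export.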

visha-l commented 7 years ago

Sir, it is still giving the previous error.

ubuntu@ip-10-0-0-226:~/darknet$ ./darknet detector train Yolo_mark-master/x64/Release/data/obj.data Yolo_mark-master/x64/Release/yolo-obj.cfg darknet19_448.conv.23
yolo-obj
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1  1088 x1088 x   3   ->  1088 x1088 x  32
    1 max          2 x 2 / 2  1088 x1088 x  32   ->   544 x 544 x  32
    2 conv     64  3 x 3 / 1   544 x 544 x  32   ->   544 x 544 x  64
    3 max          2 x 2 / 2   544 x 544 x  64   ->   272 x 272 x  64
    4 conv    128  3 x 3 / 1   272 x 272 x  64   ->   272 x 272 x 128
    5 conv     64  1 x 1 / 1   272 x 272 x 128   ->   272 x 272 x  64
    6 conv    128  3 x 3 / 1   272 x 272 x  64   ->   272 x 272 x 128
    7 max          2 x 2 / 2   272 x 272 x 128   ->   136 x 136 x 128
    8 CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)

I changed my Yolo_mark-master/x64/Release/yolo-obj.cfg to:

[net]
batch=64
subdivisions=8
height=1088
width=1088
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,25000,35000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2

Even when I set batch=32, it gave the same error.
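
For reference, the learning_rate, policy, steps and scales lines in the cfg above describe a stepwise schedule. The Python sketch below assumes the steps policy multiplies the rate by scales[i] once iteration steps[i] is reached, and it ignores any warm-up. It is an illustration of the arithmetic, not Darknet's implementation.

```python
def learning_rate_at(iteration, base_rate, steps, scales):
    """Learning rate under a 'steps' policy: multiply by scales[i]
    once steps[i] has been reached."""
    rate = base_rate
    for step, scale in zip(steps, scales):
        if iteration < step:
            break
        rate *= scale
    return rate

# Values from the cfg above:
# learning_rate=0.0001, steps=100,25000,35000, scales=10,.1,.1
for it in (0, 100, 25000, 35000):
    print(it, learning_rate_at(it, 0.0001, [100, 25000, 35000], [10, 0.1, 0.1]))
```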

visha-l commented 7 years ago

If I understand correctly, batch in the .cfg file is the number of images it trains on in each iteration. I am using a p2.xlarge, which has one GPU. When I start training with batch=32 and subdivisions=8, it core dumps. When I decreased batch to 16, it ran for 579 iterations and then stopped with an error saying it cannot load backup//yolo-obj_580.weights. When I checked the weight files, yolo-obj_300.weights contains data (256 MB), but the files above 300 contain no data; even yolo-obj_350.weights is empty.

And sir, when I tested an image with this file, it gave the following output.


 vishal@user756:~/darknet$ ./darknet detect Yolo_mark-master/x64/Release/yolo-obj.cfg /home/vishal/Desktop/yolo-obj_300.weights /home/vishal/CARS/Audi/audi_1.jpg -thres 0
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1  1088 x1088 x   3   ->  1088 x1088 x  32
    1 max          2 x 2 / 2  1088 x1088 x  32   ->   544 x 544 x  32
    2 conv     64  3 x 3 / 1   544 x 544 x  32   ->   544 x 544 x  64
    3 max          2 x 2 / 2   544 x 544 x  64   ->   272 x 272 x  64
    4 conv    128  3 x 3 / 1   272 x 272 x  64   ->   272 x 272 x 128
    5 conv     64  1 x 1 / 1   272 x 272 x 128   ->   272 x 272 x  64
    6 conv    128  3 x 3 / 1   272 x 272 x  64   ->   272 x 272 x 128
    7 max          2 x 2 / 2   272 x 272 x 128   ->   136 x 136 x 128
    8 conv    256  3 x 3 / 1   136 x 136 x 128   ->   136 x 136 x 256
    9 conv    128  1 x 1 / 1   136 x 136 x 256   ->   136 x 136 x 128
   10 conv    256  3 x 3 / 1   136 x 136 x 128   ->   136 x 136 x 256
   11 max          2 x 2 / 2   136 x 136 x 256   ->    68 x  68 x 256
   12 conv    512  3 x 3 / 1    68 x  68 x 256   ->    68 x  68 x 512
   13 conv    256  1 x 1 / 1    68 x  68 x 512   ->    68 x  68 x 256
   14 conv    512  3 x 3 / 1    68 x  68 x 256   ->    68 x  68 x 512
   15 conv    256  1 x 1 / 1    68 x  68 x 512   ->    68 x  68 x 256
   16 conv    512  3 x 3 / 1    68 x  68 x 256   ->    68 x  68 x 512
   17 max          2 x 2 / 2    68 x  68 x 512   ->    34 x  34 x 512
   18 conv   1024  3 x 3 / 1    34 x  34 x 512   ->    34 x  34 x1024
   19 conv    512  1 x 1 / 1    34 x  34 x1024   ->    34 x  34 x 512
   20 conv   1024  3 x 3 / 1    34 x  34 x 512   ->    34 x  34 x1024
   21 conv    512  1 x 1 / 1    34 x  34 x1024   ->    34 x  34 x 512
   22 conv   1024  3 x 3 / 1    34 x  34 x 512   ->    34 x  34 x1024
   23 conv   1024  3 x 3 / 1    34 x  34 x1024   ->    34 x  34 x1024
   24 conv   1024  3 x 3 / 1    34 x  34 x1024   ->    34 x  34 x1024
   25 route  16
   26 reorg              / 2    68 x  68 x 512   ->    34 x  34 x2048
   27 route  26 24
   28 conv   1024  3 x 3 / 1    34 x  34 x3072   ->    34 x  34 x1024
   29 conv    125  1 x 1 / 1    34 x  34 x1024   ->    34 x  34 x 125
   30 detection
Loading weights from /home/vishal/Desktop/yolo-obj_300.weights...Done!
Segmentation fault (core dumped)

I am trying to detect the logo on a car, so I gathered cars of 20 different makes, took around 25 images per class, and used yolo_mark to create a txt file for each image file.

I created an obj.names file, which contains one class (the make of the car) per line.

chevrolet
honda
hyundai
mahindra
nissan
skoda
tata
toyota
audi
bmw
datsun
fiat
ford
jaguar
maruti-suzuki
mercedes
range-rover
renault
volkswagen
volvo

I created an obj.data file, which contains:

classes= 20
train  = data/train.txt
valid  = data/train.txt
names = data/obj.names
backup = backup/
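
The obj.data file is a list of key = value lines. The Python sketch below reads it into a dictionary. It is an illustrative parser with a hypothetical function name, not the one Darknet uses, and it flags one thing visible in the file above: train and valid point to the same list.

```python
def parse_data_file(text):
    """Parse 'key = value' lines (as in obj.data) into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options

obj_data = """classes= 20
train  = data/train.txt
valid  = data/train.txt
names = data/obj.names
backup = backup/
"""
opts = parse_data_file(obj_data)
if opts["train"] == opts["valid"]:
    print("note: train and valid point to the same list:", opts["train"])
```

The trailing slash in backup = backup/ is probably where the double slash in backup//yolo-obj_580.weights comes from; a double slash is harmless in POSIX paths.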

The start of my Makefile:

GPU=1
CUDNN=1
OPENCV=1
DEBUG=0

ARCH= -gencode arch=compute_20,code=[sm_20,sm_21] \
      -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52]
# This is what I use, uncomment if you know your arch and want to specify
# ARCH=  -gencode arch=compute_52,code=compute_52

VPATH=./src/

Sir, I have shown you my .cfg file in the comment above.

So please tell me how I can solve the segmentation fault, which occurs during training as well as during testing with the early weight files. One more thing I want to know: why are all these multiple weight files generated, and what is the significance of generating a weight file for each iteration?

Also, after that error during training it did not save any backup file, so how can I use a saved weight file to resume training?

It would be a great help, sir.

Thanks a lot for all the help you have given so far; please help me a little more.

visha-l commented 7 years ago

Please help me out.

One more thing I want to ask, sir. If a single weight file, created by a single iteration, takes 256 MB, and we are supposed to run at least 1000 iterations, what is the storage requirement for that many weight files?

I was using 60 GB of storage for this. It filled up and training stopped at the 579th iteration with an error saying it can't load yolo-obj_580.weights and that no storage was left.

So, how much storage is required to store these weight files?
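
A back-of-the-envelope estimate for the numbers in this question, written as a small Python sketch. It assumes every weight file is the same size and treats MB and GB as binary units; the function names are made up for illustration.

```python
def total_storage_gib(file_size_mb, num_files):
    """Total size in GiB of num_files files of file_size_mb MiB each."""
    return file_size_mb * num_files / 1024.0

def files_that_fit(disk_gb, file_size_mb):
    """How many whole files of file_size_mb MiB fit into disk_gb GiB."""
    return int(disk_gb * 1024 // file_size_mb)

print(total_storage_gib(256, 1000))  # one 256 MB file per iteration, 1000 iterations -> 250.0
print(files_that_fit(60, 256))       # 60 GB volume -> 240
```

Under these assumptions, saving a 256 MB file on every iteration fills a 60 GB volume after a few hundred iterations, so saving less often or deleting older files is needed for longer runs.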

AlexeyAB commented 7 years ago

If you use height=1088 and width=1088 then set: batch=16 subdivisions=16 random=0
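
To give a feel for why a 1088 x 1088 input needs these smaller settings, the Python sketch below adds up the output sizes of layers 0 to 7 from the training log earlier in the thread. It assumes 4-byte floats and counts layer outputs only, so it is a lower bound: gradients, other buffers and the remaining layers are not included.

```python
# Output shapes (width, height, channels) of layers 0-7, copied from the log.
LAYER_OUTPUTS = [
    (1088, 1088, 32), (544, 544, 32), (544, 544, 64), (272, 272, 64),
    (272, 272, 128), (272, 272, 64), (272, 272, 128), (136, 136, 128),
]

def output_bytes(layers, images_per_step, bytes_per_value=4):
    """Bytes needed to hold the float outputs of `layers` for one forward pass."""
    values = sum(w * h * c for (w, h, c) in layers)
    return values * bytes_per_value * images_per_step

for batch, subdivisions in [(64, 8), (16, 16)]:
    images = batch // subdivisions  # images on the GPU at once
    gib = output_bytes(LAYER_OUTPUTS, images) / 2.0**30
    print("batch=%d subdivisions=%d -> %d image(s) per step, %.2f GiB"
          % (batch, subdivisions, images, gib))
```

The first eight layers alone need roughly 0.36 GiB per image for their outputs, so eight images per step already take a large share of the 11439 MiB shown by nvidia-smi before anything else is allocated, while one image per step leaves far more headroom.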

vg123 commented 7 years ago

Sir, but in my case the weight files are generated with a numeric suffix from 1 to 500 in increments of 1, so basically from 0 to 1000 it generates 1000 weight files.

What do all these terms stand for? Is there any documentation that explains the meaning of the terms in the .cfg file?

AlexeyAB commented 7 years ago

@visha-l @vg123 You should use: