Cuda memory error - Githubissues

bit-scientist commented 6 years ago

This question might be a bit different from previous Cuda error ones.

I trained YOLOv2 on my data and got good accuracy. I am trying to deploy it in real time, but my system can only run detection on 7 cameras, and I plan to use 8. As for PC specs, I have two GTX 1080 GPUs; 8 cameras are connected to the PC, and I can test the detection system in almost real time. The .cfg file I am testing with is as follows:

[net]
batch=1
subdivisions=1
height=448
width=640
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.00000001
max_batches = 62300
policy=steps
steps=100,5000,65000,72000
scales=10,.1,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=128
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=3
activation=linear

[patch_region]
classes=3
softmax=1
rescore=1
class_scales=31.74,14.32,1

When I halve the image size (to 224x320) in the .cfg file I can enable all 8 cameras, but then the system detects too many false positives. I believe one camera needs no more than 2 GB of GPU memory for detection, so with two GTX 1080 GPUs I should be able to get all 8 cameras working. How can I reduce the memory requirement and get the same results as with 7 cameras? I don't understand the VMS (video management system) part of the system, and I can only change the .cfg file. Could anyone suggest a reasonable solution? Thank you!
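To make the effect of the input size concrete, here is a rough sketch that sums the layer output sizes for the cfg above at both resolutions. It assumes float32 values and batch=1, and it counts only layer outputs: weights, cuDNN workspace, the CUDA context and video decoding are all ignored, so it is not a prediction of real GPU usage. The `LAYERS` list and `activation_bytes` helper are made up for this illustration.

```python
# (kind, filters) for each layer in the cfg above; "max" is a 2x2 stride-2 maxpool.
LAYERS = [
    ("conv", 32), ("max", 0),
    ("conv", 64), ("max", 0),
    ("conv", 128), ("conv", 64), ("conv", 128), ("max", 0),
    ("conv", 256), ("conv", 128), ("conv", 256), ("max", 0),
    ("conv", 512), ("conv", 256), ("conv", 512), ("conv", 256), ("conv", 512), ("max", 0),
    ("conv", 1024), ("conv", 512), ("conv", 1024), ("conv", 512), ("conv", 1024),
    ("conv", 1024), ("conv", 512), ("conv", 256), ("conv", 128), ("conv", 3),
]


def activation_bytes(height, width, layers, in_channels=3, bytes_per_value=4):
    """Sum the output tensor sizes of all layers for one image (assumed float32)."""
    h, w, c = height, width, in_channels
    total = 0
    for kind, filters in layers:
        if kind == "max":
            h, w = h // 2, w // 2   # maxpool halves both spatial dimensions
        else:
            c = filters             # stride-1 conv keeps h and w, changes channels
        total += h * w * c * bytes_per_value
    return total


full = activation_bytes(448, 640, LAYERS)
half = activation_bytes(224, 320, LAYERS)
print(f"448x640: {full / 2**20:.1f} MiB, 224x320: {half / 2**20:.1f} MiB, ratio {full / half:.1f}")
```

Halving both height and width quarters every layer's output, which is why the 224x320 network fits where the 448x640 one does not. The parts of the memory footprint that do not depend on the input size are not captured here.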

kmsravindra commented 6 years ago

@git-sohib, why do you have a non-square image size in the .cfg file? I think it's recommended to use a square resolution, something like 320x320 or 224x224.

AlexeyAB commented 6 years ago

@git-sohib You should either

use the low-resolution 224x320 network, as you have done,
or use yolov3-tiny.cfg instead of yolov2.cfg: https://github.com/AlexeyAB/darknet#how-to-train-tiny-yolo-to-detect-your-custom-objects

AlexeyAB commented 6 years ago

@kmsravindra Square resolution is only a recommendation. If you can train with a non-square size and get good accuracy, you can use it.

bit-scientist commented 6 years ago

@AlexeyAB Thank you for the reply. As you can see, my cfg file differs from the stock yolov2 one; for example, it has no anchors, jitter, or random arguments. How would you change yolov3-tiny.cfg to match my cfg shown above? I would change it myself, but it would take me 3-4 days since I am not very familiar with these files, and I am running out of time. I will definitely learn it myself one day. Thanks for your efforts.

AlexeyAB commented 6 years ago

@git-sohib

Which GitHub repository do you use?
And what does [patch_region] mean?
The simple way: change each of the 2 [yolo] layers in yolov3-tiny.cfg to your:
```
[patch_region]
classes=3
softmax=1
rescore=1
class_scales=31.74,14.32,1
```
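If editing the file by hand is error-prone, the swap can be scripted. The following is only a sketch: `replace_yolo_sections` is a hypothetical helper, and `SAMPLE` is an abbreviated stand-in for yolov3-tiny.cfg rather than the real file. It replaces each `[yolo]` header together with its key=value lines, and leaves every other section untouched.

```python
PATCH_REGION = """[patch_region]
classes=3
softmax=1
rescore=1
class_scales=31.74,14.32,1
"""


def replace_yolo_sections(cfg_text, replacement=PATCH_REGION):
    """Return (new_cfg_text, count) with every [yolo] section swapped for `replacement`."""
    out = []
    skipping = False   # True while inside a [yolo] section being dropped
    replaced = 0
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if stripped == "[yolo]":
                skipping = True
                out.extend(replacement.rstrip("\n").splitlines())
                out.append("")
                replaced += 1
                continue
            skipping = False   # any other section header ends the skipped region
        if not skipping:
            out.append(line)
    return "\n".join(out) + "\n", replaced


# Abbreviated, illustrative input; not the full yolov3-tiny.cfg.
SAMPLE = """[convolutional]
filters=255
activation=linear

[yolo]
mask = 3,4,5
classes=80

[route]
layers = -4

[yolo]
mask = 0,1,2
classes=80
"""

new_cfg, count = replace_yolo_sections(SAMPLE)
print(count)
print(new_cfg)
```

Note that this touches only the `[yolo]` sections. The cfg posted at the top of the thread uses `filters=3` in the convolutional layer right before `[patch_region]`, while yolov3-tiny.cfg has `filters=255` there; whether that value also has to change depends on how `[patch_region]` is implemented, which the thread does not say.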

bit-scientist commented 6 years ago

@AlexeyAB Someone else set it up, so I don't know much about it. Thank you, I am going to try it and come back with feedback.

bit-scientist commented 6 years ago

@AlexeyAB I got an error that says:

layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
   15 conv    255  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 255
   16 patch_region
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128
   19 Type not recognized: [upsample]
Unused field: 'stride = 2'
   20 route  19 8
   21 Layer before convolutional layer must output image.: No error

Apparently it does not recognize the [upsample] layer. What should I do about that?
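For longer cfg files it can help to pull every unsupported layer type out of the startup log in one pass. This is a small sketch that assumes the log wording matches the output above ("Type not recognized: [name]"); `unrecognized_layer_types` is a hypothetical helper, not part of Darknet.

```python
import re

# Matches the message format seen in the log above.
UNRECOGNIZED = re.compile(r"Type not recognized:\s*\[(\w+)\]")


def unrecognized_layer_types(log_text):
    """Return the layer types reported as unrecognized, in first-seen order."""
    seen = []
    for name in UNRECOGNIZED.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


LOG = """   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128
   19 Type not recognized: [upsample]
Unused field: 'stride = 2'
   20 route  19 8
"""

print(unrecognized_layer_types(LOG))  # ['upsample']
```

Any type listed this way is missing from the Darknet build that produced the log, so the cfg needs either a different build or a different layer.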

AlexeyAB commented 6 years ago

@git-sohib You should use the latest code of this GitHub repository.

AlexeyAB / darknet

Cuda memory error #854