AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Detections changing based on portion of the image being exposed #383

Open harish-khollam opened 6 years ago

harish-khollam commented 6 years ago

Hello @AlexeyAB, I have successfully built a pipe detection model using YOLOv2 on high-resolution images.

The cfg file contains the following parameters: height: 640, width: 640, random: 1, threshold: 0.6

The model does pretty well on large pipes, with 100% accuracy, but if I test it on a cluster of small pipes the accuracy decreases. As an experiment I masked certain portions of the image and used them for testing. The model did much better on the masked images than on the unmasked ones.

Following are two images:

  1. Unmasked: as you can see in the attached image, the model is able to detect only 1 pipe in the middle cluster.

  2. Masked: as soon as I blur the surrounding area, it is able to detect more pipes.

A similar phenomenon is observed when different areas of the same image are masked. Why does the model perform poorly when exposed to the full image compared to only a part of it? Are there any parameters I can tweak for such a use case?

AlexeyAB commented 6 years ago

@harish-khollam Hi,


P.S. I edited the post - don't change the maxpool layers in tiny-yolo-voc.cfg.

AlexeyAB commented 6 years ago

Or you can base your cfg-file on this:

tiny_yolo_custom.zip

harish-khollam commented 6 years ago

I used tiny-yolo-voc.cfg, which looks like this:

[net]
batch=64
subdivisions=8
width=640
height=640
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.01
max_batches = 100000
policy=steps
steps=100,1000,20000,30000
scales=.1,10,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors = 0.38,0.38, 0.63,0.63, 0.93,0.93, 1.61,1.61, 3.33,3.33
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1

Try to increase width=832 height=832, can you see more pipes? I tried it, but then it doesn't detect anything.

Following are a few sample images in the size range I am trying to work on. Image size: 2422 × 2422.

Image size: 1726 × 1727.

Image size: 1008 × 1008.

Did you have images in the training dataset, which contain more than 30 labeled pipes? Yes, all of these images were labeled; you can see that every single pipe was labeled, and there were more images like these.

It would be great if you could help me tackle this issue. I have been stuck on it for a very long time; I even tried more iterations, but the average loss won't go below 11.

AlexeyAB commented 6 years ago

@harish-khollam

I have made some fixes in the code, so try to update your code from this repo. Then add the parameter max=300 to the [region] layer in your cfg-file:

[region]
anchors = 0.38,0.38, 0.63,0.63, 0.93,0.93, 1.61,1.61, 3.33,3.33
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1
max=300

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1

Then set width=832 and height=832 and train your model.


If the previous fixes work well, then try to train this model: tiny_yolo_custom.zip. I've made some fixes there.

harish-khollam commented 6 years ago

@AlexeyAB I made all the changes you suggested. This time I was able to reduce the average loss (before, it never went lower than 11, but after these modifications it reached 2.4), but as soon as it crossed 100 iterations the average loss started increasing, and it ultimately started generating -NaN within the next 2 iterations.

A few threads suggested that it may be an issue with the labeling, so I removed all the labels with negative values (crossing the image resolution bounds) and re-ran the model. Even after these changes it generated -NaN at 101 iterations. I tried this with 832x832 as well as 640x640 in the .cfg.

As an experiment, I used the same tiny_yolo_custom.cfg with tiny-yolo-voc.weights, but the training started at 40100 iterations. It hasn't generated any NaN so far. I don't know if using tiny-yolo-voc.weights is the right choice over darknet19_448.conv.23.

What do you suggest? How can I train the model without generating NaN, using the right .weights file? What may be the reason for getting -NaN?
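
For reference, a minimal sketch, assuming the standard darknet label format (one "class x_center y_center width height" line per box, all values normalized to [0,1]), of how such out-of-range labels can be flagged before training; the file name labels.txt is just a placeholder:

#include <stdio.h>

/* Sketch: scan one darknet label file and report boxes whose normalized
 * coordinates fall outside [0,1], which can lead to NaN during training.
 * Assumes the usual "class x y w h" format; labels.txt is a placeholder. */
int main(void)
{
    FILE *f = fopen("labels.txt", "r");
    if (!f) { perror("labels.txt"); return 1; }
    int cls, line = 0;
    float x, y, w, h;
    while (fscanf(f, "%d %f %f %f %f", &cls, &x, &y, &w, &h) == 5) {
        ++line;
        if (x < 0 || x > 1 || y < 0 || y > 1 || w <= 0 || w > 1 || h <= 0 || h > 1)
            printf("line %d: suspicious box %f %f %f %f\n", line, x, y, w, h);
    }
    fclose(f);
    return 0;
}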

AlexeyAB commented 6 years ago

@harish-khollam


NaN during training can be due to:

harish-khollam commented 6 years ago

Thanks for your guidance @AlexeyAB.

harish-khollam commented 6 years ago

Hello @AlexeyAB The suggestions and modifications you prescribed worked very well; I got much better accuracy than with the previous model.

Thanks a lot for this. And it's still getting better.

But I am facing some difficulty porting this model to mobile. While running it in Darkflow I get the following error: AssertionError: Over-read repo/tiny_yolo_custom_8900.weights. When I searched for this error, the explanation was that there is a mismatch between the cfg and the weights. I wonder why this would happen. I want to test this on my iPhone. I am using the same code that was working for the previous model trained from darknet19_448.

TheMikeyR commented 6 years ago

@harish-khollam Looks great! Could you share the modifications and the cfg file you are using? Also, how big is your dataset?

AlexeyAB commented 6 years ago

@harish-khollam

  1. What command do you use to run Darkflow with your tiny_yolo_custom model?

  2. Did you train your model using this repository or the original repository? https://github.com/pjreddie/darknet

  3. Can you successfully use the default yolo-voc.weights downloaded from the pjreddie site https://pjreddie.com/media/files/yolo-voc.weights (not from Google Drive) with Darkflow?

  4. If you can use https://pjreddie.com/media/files/yolo-voc.weights then:

  5. As I see it, this is a problem with Darkflow, which can't work with original Darknet weights: https://github.com/thtrieu/darkflow/issues/372#issuecomment-322009000

Hi, try it again, but follow instructions in readme and use weights uploaded on google drive only, the ones from darknet are not working with darkflow cfgs now.

  1. It looks like the error is here: https://github.com/thtrieu/darkflow/blob/479c83e14559fd5eceb9a9f612503b29a67fac5c/darkflow/utils/loader.py#L126-L127 - a dirty hack is to just comment out these 2 lines.

Also, if you want to use Yolo on mobile devices, you can try to use Yolo from the dnn module that is built into OpenCV (version >= 3.4.0) for Android or iOS: https://opencv.org/releases.html Examples:

Darkflow uses TensorFlow, which as far as I can see doesn't use the GPU on iOS yet. OpenCV-dnn-Yolo doesn't use the GPU either, but it is highly optimized for the CPU, so OpenCV-yolo can be faster than Darkflow.

http://machinethink.net/blog/tensorflow-on-ios/

Limitations of TensorFlow on iOS:

Currently there is no GPU support. TensorFlow does use the Accelerate framework for taking advantage of CPU vector instructions, but when it comes to raw speed you can’t beat Metal.

TheMikeyR commented 6 years ago

@AlexeyAB I'm training on a P2.xlarge AWS server with 11439 MiB of available memory (Tesla K80). I've tried to use your custom cfg (the most recently linked one), but I get the CUDA Error: out of memory error when it resizes to 896x896. It's funny because it initialised to 1024x1024 and trained fine. I believe I'm using the correct tiny-yolo-voc.conv.13. My image size is 2208x1242.

Also, I have a laptop with 4 GB of GPU memory (M1200), and I get out of memory even when setting batch and subdivisions to 1. Is the network you linked just really heavy? I don't have any issues with the original tiny-yolo-voc.cfg.

My cfg:

[net]
batch=64
subdivisions=16
width=832
height=832
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0001
max_batches = 45000
policy=steps
steps=100,1000,4000
scales=10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors = 0.47,0.77, 0.55,0.98, 0.70,1.15, 0.74,0.79, 1.06,1.04
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1
small_object=1
max=400

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1

Start of training:

./darknet detector train blip_both.data tiny_yolo_custom_alexey.cfg tiny-yolo-voc.conv.13
tiny_yolo_custom_alexey
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   832 x 832 x   3   ->   832 x 832 x  32
    1 max          2 x 2 / 2   832 x 832 x  32   ->   416 x 416 x  32
    2 conv     64  3 x 3 / 1   416 x 416 x  32   ->   416 x 416 x  64
    3 max          2 x 2 / 2   416 x 416 x  64   ->   208 x 208 x  64
    4 conv    128  3 x 3 / 1   208 x 208 x  64   ->   208 x 208 x 128
    5 conv     64  1 x 1 / 1   208 x 208 x 128   ->   208 x 208 x  64
    6 conv    128  3 x 3 / 1   208 x 208 x  64   ->   208 x 208 x 128
    7 max          2 x 2 / 2   208 x 208 x 128   ->   104 x 104 x 128
    8 conv    256  3 x 3 / 1   104 x 104 x 128   ->   104 x 104 x 256
    9 conv    128  1 x 1 / 1   104 x 104 x 256   ->   104 x 104 x 128
   10 conv    256  3 x 3 / 1   104 x 104 x 128   ->   104 x 104 x 256
   11 max          2 x 2 / 2   104 x 104 x 256   ->    52 x  52 x 256
   12 conv    512  3 x 3 / 1    52 x  52 x 256   ->    52 x  52 x 512
   13 conv    512  3 x 3 / 1    52 x  52 x 512   ->    52 x  52 x 512
   14 conv     30  1 x 1 / 1    52 x  52 x 512   ->    52 x  52 x  30
   15 detection
Loading weights from tiny-yolo-voc.conv.13...
 seen 64 
Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Resizing
1024
Loaded: 45.075714 seconds
Region Avg IOU: 0.079985, Class: 1.000000, Obj: 0.498549, No Obj: 0.500154, Avg Recall: 0.000000,  count: 88
Region Avg IOU: 0.090546, Class: 1.000000, Obj: 0.498464, No Obj: 0.500142, Avg Recall: 0.000000,  count: 60
Region Avg IOU: 0.120469, Class: 1.000000, Obj: 0.498516, No Obj: 0.500144, Avg Recall: 0.000000,  count: 76

End of training due to error:

9: 77.087448, 273.408600 avg, 0.000100 rate, 68.094284 seconds, 576 images
Loaded: 0.000052 seconds
Region Avg IOU: 0.131572, Class: 1.000000, Obj: 0.165749, No Obj: 0.167225, Avg Recall: 0.011765,  count: 85
Region Avg IOU: 0.140189, Class: 1.000000, Obj: 0.165928, No Obj: 0.167228, Avg Recall: 0.000000,  count: 44
Region Avg IOU: 0.142925, Class: 1.000000, Obj: 0.165996, No Obj: 0.167226, Avg Recall: 0.025316,  count: 79
Region Avg IOU: 0.103536, Class: 1.000000, Obj: 0.166065, No Obj: 0.167232, Avg Recall: 0.000000,  count: 24
Region Avg IOU: 0.113124, Class: 1.000000, Obj: 0.166023, No Obj: 0.167230, Avg Recall: 0.000000,  count: 38
Region Avg IOU: 0.140396, Class: 1.000000, Obj: 0.164977, No Obj: 0.167221, Avg Recall: 0.000000,  count: 73
Region Avg IOU: 0.131264, Class: 1.000000, Obj: 0.165162, No Obj: 0.167234, Avg Recall: 0.000000,  count: 134
Region Avg IOU: 0.149730, Class: 1.000000, Obj: 0.165761, No Obj: 0.167245, Avg Recall: 0.000000,  count: 94
Region Avg IOU: 0.148957, Class: 1.000000, Obj: 0.165370, No Obj: 0.167223, Avg Recall: 0.000000,  count: 68
Region Avg IOU: 0.130500, Class: 1.000000, Obj: 0.165454, No Obj: 0.167231, Avg Recall: 0.000000,  count: 78
Region Avg IOU: 0.126774, Class: 1.000000, Obj: 0.165550, No Obj: 0.167231, Avg Recall: 0.000000,  count: 110
Region Avg IOU: 0.125435, Class: 1.000000, Obj: 0.165480, No Obj: 0.167242, Avg Recall: 0.000000,  count: 39
Region Avg IOU: 0.113545, Class: 1.000000, Obj: 0.165656, No Obj: 0.167226, Avg Recall: 0.000000,  count: 118
Region Avg IOU: 0.122508, Class: 1.000000, Obj: 0.165301, No Obj: 0.167229, Avg Recall: 0.000000,  count: 70
Region Avg IOU: 0.134069, Class: 1.000000, Obj: 0.165200, No Obj: 0.167213, Avg Recall: 0.000000,  count: 73
Region Avg IOU: 0.129905, Class: 1.000000, Obj: 0.165304, No Obj: 0.167223, Avg Recall: 0.000000,  count: 65
10: 57.855984, 251.853333 avg, 0.000100 rate, 67.484894 seconds, 640 images
Resizing
896
CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)
AlexeyAB commented 6 years ago

@TheMikeyR Yes, this model tiny_yolo_custom.zip is ~2-3 times more expensive than the default tiny-yolo-voc.cfg. This model can detect 4x more objects in each image than tiny-yolo-voc.

I made some fixes; try to update your code from this repo and train again with random=1.
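
For a rough sense of where the 4x figure comes from, assuming the custom model keeps the stride-16 output shown in the training log above (52 x 52 x 30) while the default tiny-yolo-voc downsamples by 32: at an 832x832 input the custom cfg has a 52x52 output grid versus 26x26, i.e. four times as many grid cells and region predictions. A small sketch of that arithmetic:

#include <stdio.h>

/* Sketch: compare output-grid sizes of default tiny-yolo-voc (stride 32)
 * and the custom cfg above (stride 16, see the 52x52x30 output in the log)
 * for an 832x832 network input. */
int main(void)
{
    int input = 832, num_anchors = 5;
    int default_grid = input / 32;   /* 26 */
    int custom_grid  = input / 16;   /* 52 */
    printf("default: %dx%d cells, %d boxes\n",
           default_grid, default_grid, default_grid * default_grid * num_anchors);
    printf("custom:  %dx%d cells, %d boxes (%.0fx more cells)\n",
           custom_grid, custom_grid, custom_grid * custom_grid * num_anchors,
           (double)(custom_grid * custom_grid) / (default_grid * default_grid));
    return 0;
}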

TheMikeyR commented 6 years ago

@AlexeyAB Thanks for the quick update. I've just restarted the training and will report back; it didn't crash on resizing to 800 and 704, so far so good.

TheMikeyR commented 6 years ago

@AlexeyAB It crashed on 928; I've included the log. I'm watching nvidia-smi meanwhile, and it peaked at 4980MiB / 11439MiB when it crashed. It used 4003MiB / 11439MiB of memory at size 704. It just seems odd that it maxes out the GPU at resolution 928?

Region Avg IOU: 0.265726, Class: 1.000000, Obj: 0.071476, No Obj: 0.071878, Avg Recall: 0.035714,  count: 56
20: 15.578996, 76.250389 avg, 0.000100 rate, 37.661903 seconds, 1280 images
Resizing
928
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, 5: layer = 0, 6: layer = 0, 7: layer = 3, 8: layer = 0, 9: layer = 0, 10: layer = 0, 11: layer = 3, 12: layer = 0, 13: layer = 0, 14: layer = 0, 15: layer = 21,CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)
AlexeyAB commented 6 years ago

@TheMikeyR Ok, I just added one more fix.

TheMikeyR commented 6 years ago

@AlexeyAB the commit https://github.com/AlexeyAB/darknet/commit/e4ab47dfcedb4c87e5eddf484caa4ac0c020fc9b didn't work; it crashes at 768:

Resizing
768
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, 5: layer = 0, 6: layer = 0, 7: layer = 3, 8: layer = 0, 9: layer = 0, 10: layer = 0, 11: layer = 3, 12: layer = 0, 13: layer = 0, 14: layer = 0, 15: layer = 21, try to allocate workspace, CUDA Error: out of memory

I've tried to modify the default tiny-yolo-voc.cfg with the same resolution, max=400, and small_object=1, and there are no issues with that.

The cfg that works fine is:

[net]
batch=64
subdivisions=16
width=816
height=816
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0001
max_batches = 40200
policy=steps
steps=-1,100,20000,30000
scales=.1,10,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors = 0.47,0.77, 0.55,0.98, 0.70,1.15, 0.74,0.79, 1.06,1.04
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1
small_object=1
max=400

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1
TheMikeyR commented 6 years ago

I'm trying to replicate what @harish-khollam managed to do on images of size 2208x1242 (all of my pictures), and I have the same issue with detecting objects close to each other. When that happens, the model puts an averaged box in between the objects, even though my ground-truth labeling separates them without any issues. I think the K80 with 12 GB of memory is not strong enough for the cfg you've uploaded; do you know how much is required? It seems to crash when the size is above 704. The odd thing is that I never see it max out my memory, but then again it is only reported every 1 second.

Edit: If the custom cfg is 2-3 times more expensive, it should still fit. The cfg I posted uses 3396MiB / 11439MiB at size 896.

Edit2: Never mind, it just resized itself to 768 and is now using 10389MiB / 11439MiB. I can't seem to figure out a reason for this?

Edit3: At 864 it goes to 2065MiB / 11439MiB.

TheMikeyR commented 6 years ago

@AlexeyAB I've posted some updates to my last comment which might help debugging, please let me know if you need any additional info or help from me.

AlexeyAB commented 6 years ago

@TheMikeyR So if you set width=768 height=768 random=0 then memory usage is 10389MiB / 11439MiB, but if you set width=864 height=864 random=0 then memory usage is 2065MiB / 11439MiB, is that right?

TheMikeyR commented 6 years ago

@AlexeyAB Correct, I've just tested with the cfg I posted above. If I do the same test with the custom cfg, it will run with 864, but crash with 768 due to out of memory.

AlexeyAB commented 6 years ago

@TheMikeyR Can you try this with CUDNN=0 in the Makefile?

TheMikeyR commented 6 years ago

CUDNN=0 gives 3667MiB / 11439MiB at 768; should I just continue training with CUDNN=0? Edit: I'm using cuDNN v5.0 and CUDA 8.0.

AlexeyAB commented 6 years ago

@TheMikeyR

Try to change each of these 3 lines to this: if (s > most) { most = s; printf(" most = %zu ", most); } (this will show which layer and which algorithm take too much GPU-RAM):

  1. https://github.com/AlexeyAB/darknet/blob/e4ab47dfcedb4c87e5eddf484caa4ac0c020fc9b/src/convolutional_layer.c#L115

  2. https://github.com/AlexeyAB/darknet/blob/e4ab47dfcedb4c87e5eddf484caa4ac0c020fc9b/src/convolutional_layer.c#L123

  3. https://github.com/AlexeyAB/darknet/blob/e4ab47dfcedb4c87e5eddf484caa4ac0c020fc9b/src/convolutional_layer.c#L131


So the problem is here - for some layer sizes cuDNN takes too much GPU-RAM due to alignment for maximum performance: https://github.com/AlexeyAB/darknet/blob/e4ab47dfcedb4c87e5eddf484caa4ac0c020fc9b/src/convolutional_layer.c#L106-L132

I will check; maybe I should add code to switch to another, not the fastest, algorithm for some layer sizes, for example to use CUDNN_CONVOLUTION_BWD_DATA_NO_WORKSPACE or CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE: https://github.com/AlexeyAB/darknet/blob/e4ab47dfcedb4c87e5eddf484caa4ac0c020fc9b/src/convolutional_layer.c#L162-L177
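
A minimal sketch of that kind of switch, using the cuDNN v5-v7 algorithm-selection API; the helper name pick_bwd_data_algo and the 512 MiB budget are illustrative assumptions rather than darknet code:

#include <cudnn.h>

/* Sketch (assumed, not darknet code): ask cuDNN for the fastest
 * backward-data algorithm that fits into a fixed workspace budget,
 * instead of the unconditionally fastest one, which for some layer
 * sizes requests a multi-GB workspace. */
static cudnnConvolutionBwdDataAlgo_t pick_bwd_data_algo(
        cudnnHandle_t handle,
        cudnnFilterDescriptor_t wDesc,
        cudnnTensorDescriptor_t dyDesc,
        cudnnConvolutionDescriptor_t convDesc,
        cudnnTensorDescriptor_t dxDesc)
{
    const size_t workspace_limit = (size_t)512 * 1024 * 1024;  /* 512 MiB budget (assumed) */
    cudnnConvolutionBwdDataAlgo_t algo;
    /* CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT caps the workspace;
     * CUDNN_CONVOLUTION_BWD_DATA_NO_WORKSPACE would forbid one entirely. */
    cudnnGetConvolutionBackwardDataAlgorithm(handle, wDesc, dyDesc, convDesc, dxDesc,
            CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT,
            workspace_limit, &algo);
    return algo;
}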

TheMikeyR commented 6 years ago

Here is logs from the different sizes ordered from small to big

No error

Resizing
672
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4431413248  9: layer = 0, most = 1024  10: layer = 0, most = 4431413248  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 18432  14: layer = 0, most = 2048  15: layer = 21, try to allocate workspace,  CUDA allocate done! 

Error

Loading weights from tiny-yolo-voc.conv.13... most = 2304  most = 42680320  most = 2304  most = 42680320  most = 4608  most = 157204480  most = 1024  most = 4608  most = 157204480  most = 9216  most = 18432  most = 2048 
 seen 64 
Done!
Learning Rate: 0.0001, Momentum: 0.9, Decay: 0.0005
Resizing
736
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4608  9: layer = 0, most = 1024  10: layer = 0, most = 4608  11: layer = 3, 12: layer = 0, most = 9216  most = 4503109632  13: layer = 0, most = 18432  most = 8937013248  14: layer = 0, most = 2048  15: layer = 21, try to allocate workspace, CUDA Error: out of memory

No error

Resizing
800
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4608  most = 157204480  9: layer = 0, most = 1024  10: layer = 0, most = 4608  most = 157204480  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 18432  14: layer = 0, most = 2048  15: layer = 21, try to allocate workspace,  CUDA allocate done!

No error

Resizing
832
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4608  most = 157204480  9: layer = 0, most = 1024  10: layer = 0, most = 4608  most = 157204480  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 18432  14: layer = 0, most = 2048  15: layer = 21, try to allocate workspace,  CUDA allocate done!

Error

Resizing
864
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4431413248  most = 4499570688  9: layer = 0, most = 1024  10: layer = 0, most = 4431413248  most = 4499570688  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 8937013248  14: layer = 0, 15: layer = 21, try to allocate workspace, CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.

Error

Resizing
896
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4431413248  most = 4499570688  9: layer = 0, most = 1024  10: layer = 0, most = 4431413248  most = 4499570688  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 8937013248  14: layer = 0, 15: layer = 21, try to allocate workspace, CUDA Error: out of memory

Error

Resizing
928
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4431413248  most = 4499570688  9: layer = 0, most = 1024  10: layer = 0, most = 4431413248  most = 4499570688  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 8937013248  14: layer = 0, 15: layer = 21, try to allocate workspace, CUDA Error: out of memory

No error

960
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4608  most = 157204480  9: layer = 0, most = 1024  10: layer = 0, most = 4608  most = 157204480  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 18432  14: layer = 0, most = 2048  15: layer = 21, try to allocate workspace,  CUDA allocate done! 

No error

Resizing
992
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4608  most = 157204480  9: layer = 0, most = 1024  10: layer = 0, most = 4608  most = 157204480  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 18432  14: layer = 0, most = 2048  15: layer = 21, try to allocate workspace,  CUDA allocate done!

No error

Resizing
1024
 0: layer = 0, 1: layer = 3, 2: layer = 0, 3: layer = 3, 4: layer = 0, most = 2304  most = 42680320  5: layer = 0, 6: layer = 0, most = 2304  most = 42680320  7: layer = 3, 8: layer = 0, most = 4608  most = 157204480  9: layer = 0, most = 1024  10: layer = 0, most = 4608  most = 157204480  11: layer = 3, 12: layer = 0, most = 9216  13: layer = 0, most = 18432  14: layer = 0, most = 2048  15: layer = 21, try to allocate workspace,  CUDA allocate done!
AlexeyAB commented 6 years ago

@TheMikeyR Thanks! I added some fixes for cuDNN; update the code and try again.

TheMikeyR commented 6 years ago

Thanks @AlexeyAB, it seems to work now; at least when it gets to a resolution where it crashed before, it prints out that it is using the slow cuDNN algorithm without a workspace.

I can see I'm running out of memory at 1056, so I reduced the scale steps from 10 to 6, which I assume means it can go at most to 832 +- (32*6) in resolution (640 min and 1024 max); please correct me if I'm wrong. The parameter is set like this: scales=6,.1,.1.

Thanks for your help, I will let a training go for some time and return back if it helps with my issue regarding detecting objects close to each other. If you want me to run some tests on a linux system please let me know.

AlexeyAB commented 6 years ago

@TheMikeyR Thanks for the tests.


TheMikeyR commented 6 years ago

@AlexeyAB ah okay, thanks! I can't seem to figure out the formula: rand() = integer between 0 and 32767, % = modulo, init_w = 832 (in the custom cfg case).

I can't seem to get it to produce anything higher than 992, but I've seen it go up to 1056 (where my AWS server runs out of memory, which I would like to prevent, while still keeping the extra advantage of random=1). Do you have any suggestions on how to limit the resolution jumps?

Again thanks a lot for your time and effort!

AlexeyAB commented 6 years ago

@TheMikeyR int dim = (rand() % 12 + (init_w/32 - 5)) * 32;

I can't seem to get it to produce anything higher than 992, but I've seen it go up to 1056

Check that you have set correct values, multiples of 32, width=832 height=832 in the cfg.
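
Working through the formula above for init_w=832: rand() % 12 gives 0..11 and 832/32 - 5 = 21, so dim ranges from 21*32 = 672 up to 32*32 = 1024 in steps of 32. A small sketch that enumerates those values:

#include <stdio.h>

/* Sketch: enumerate the network sizes the resize formula above can pick
 * for init_w = 832: dim = (r + init_w/32 - 5) * 32 for r = rand() % 12. */
int main(void)
{
    int init_w = 832;
    for (int r = 0; r < 12; ++r) {
        int dim = (r + (init_w / 32 - 5)) * 32;
        printf("%d ", dim);   /* prints 672 704 ... 1024 */
    }
    printf("\n");
    return 0;
}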

TheMikeyR commented 6 years ago

Thanks for the detailed description. It is set to 832 in the config currently, so hopefully it shouldn't go above 1024 and crash. I will let it run and report back tomorrow.

bit-scientist commented 6 years ago

Hi @harish-khollam, do you mind sharing your .cfg and weights files? I need to detect fire/smoke in my images, and my labeling is quite similar to yours.

abdulkalam1233 commented 6 years ago

@AlexeyAB You have used max in the [region] layer, but what is max?

AlexeyAB commented 6 years ago

@abdulkalam1233 max=300 in the [region] layer is the maximum number of truths (bounding boxes) that will be used from one txt-label-file. The remaining truth boxes will be rejected.
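
For illustration (assuming the usual darknet label format of one normalized "class x_center y_center width height" line per box; the values below are made up), a txt-label-file with 305 lines like these would have only its first 300 boxes loaded when max=300:

0 0.4812 0.5203 0.0310 0.0310
0 0.5134 0.5198 0.0305 0.0312
0 0.5449 0.5210 0.0308 0.0309
... (302 more lines; the last 5 would be rejected)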

abdulkalam1233 commented 6 years ago

@AlexeyAB That's great. I need your help with my application; can I have your Skype ID? Mail me at kalama449@gmail.com.