Hishok commented 4 years ago

I am training YOLOv4 with adversarial training, but I keep getting the error 'cannot open display' even though I am using the -dont_show flag.

I have tried setting adversarial_lr to 1 and to 0.05, but I still get the same error.

Below is a small section of the config file I am using, showing where I have put adversarial_lr. I managed to get YOLOv4 to work without adversarial_lr, but with it I get this error. I am using the BDD100K dataset.

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=32
width=416
height=416
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
adversarial_lr=0.05
attention=1

learning_rate=0.001
burn_in=1000
max_batches = 20000
policy=steps
steps=16000,18000
scales=.1,.1

#cutmix=1
mosaic=1

#:104x104 54:52x52 85:26x26 104:13x13 for 416

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=mish

!./darknet detector train data/obj.data cfg/yolov4-obj.cfg yolov4.conv.137 -dont_show -map

The line above is what I am using to train.

Thank you in advance!

@AlexeyAB @lukeAI


AlexeyAB commented 4 years ago

Is training stopping or moving on?

AlexeyAB commented 4 years ago

Also, you can try commenting out these 3 lines: https://github.com/AlexeyAB/darknet/blob/fdb1841eb1699cf24a193787c2da7b2d9bd10070/src/network_kernels.cu#L390-L392

Hishok commented 4 years ago

Training stopped at that point. The error mentions attention_img; does this have something to do with the attention flag? I am running this in Google Colab. @AlexeyAB

Hishok commented 4 years ago

I have used that in the cfg-file, but I am getting the error. Would I have to comment out the lines you mentioned? @AlexeyAB

AlexeyAB commented 4 years ago

Use attention=0 in cfg-file
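For anyone editing the cfg from a Colab cell, here is a minimal sketch of one way to set an option such as attention=0 in the [net] section. The helper name `set_net_option` is hypothetical and not part of darknet. It only rewrites active (uncommented) lines inside [net], and it appends the key at the end of that section if it is missing.

```python
import re


def set_net_option(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with key=value set in the [net] section only."""
    # Match active lines such as "attention=1" or "adversarial_lr= 1",
    # but not commented ones such as "#attention=1".
    pattern = re.compile(rf"\s*{re.escape(key)}\s*=")
    out, in_net, done = [], False, False
    for line in cfg_text.splitlines():
        if line.strip().startswith("["):
            # Leaving [net] without having seen the key: add it before the next section.
            if in_net and not done:
                out.append(f"{key}={value}")
                done = True
            in_net = line.strip().lower() == "[net]"
        elif in_net and pattern.match(line):
            line = f"{key}={value}"
            done = True
        out.append(line)
    # [net] was the last section in the file and the key never appeared.
    if in_net and not done:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


sample = "[net]\nbatch=64\nattention=1\n\n[convolutional]\nfilters=32\n"
print(set_net_option(sample, "attention", "0"))
```

To apply it to a real file, read the cfg (for example cfg/yolov4-obj.cfg, the path used in the training command above), pass its text through the function, and write the result back. The same helper can be used to change adversarial_lr.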

Hishok commented 4 years ago

@AlexeyAB Getting a different error now when using attention=0 in the config file:

AlexeyAB commented 4 years ago

Did you comment out these 3 lines? https://github.com/AlexeyAB/darknet/blob/fdb1841eb1699cf24a193787c2da7b2d9bd10070/src/network_kernels.cu#L390-L392

Hishok commented 4 years ago

I tried making that change in a fork of darknet, but when I follow your Google Colab tutorial with my fork, I can't get the darknet environment to build. @AlexeyAB

AlexeyAB commented 4 years ago

Can you give a link to colab with error?

Hishok commented 4 years ago

The link to colab: https://colab.research.google.com/drive/1sA1o2HzV-_10bQqN8t64wlfGSv2Mi0pB?usp=sharing

The error I am getting is :

8 errors detected in the compilation of "/tmp/tmpxft_0000033f_00000000-7_network_kernels.compute_70.cpp1.ii".
Makefile:168: recipe for target 'obj/network_kernels.o' failed
make: *** [obj/network_kernels.o] Error 1

You can see this in cell 4.

In cell 21 I get the error: /bin/bash: ./darknet: No such file or directory

I must be doing something wrong because I managed to get a number of YOLOv3 and v4 models working without forking darknet.

@AlexeyAB
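The `./darknet: No such file or directory` error in cell 21 is most likely a consequence of the failed `make` in cell 4: if the build stops at `obj/network_kernels.o`, no binary is produced. Below is a minimal sketch of a fail-fast check for a notebook. The helper name `build_and_check` is hypothetical; it runs a build command and confirms the expected binary exists before any later cell tries to use it.

```python
import subprocess
from pathlib import Path


def build_and_check(build_cmd, binary, cwd="."):
    """Run build_cmd in cwd and return the path to binary, or raise if the build failed."""
    result = subprocess.run(
        build_cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge compiler errors into one stream
        text=True,
    )
    if result.returncode != 0:
        # Show only the tail of the output, where the failing target is reported.
        tail = "\n".join(result.stdout.splitlines()[-20:])
        raise RuntimeError(f"build failed with exit code {result.returncode}:\n{tail}")
    path = Path(cwd) / binary
    if not path.is_file():
        raise FileNotFoundError(f"build finished but {path} is missing")
    return path
```

In the notebook this could be called as `build_and_check(["make"], "darknet", cwd="darknet")`, assuming the repo was cloned into a `darknet` directory, so that a compile error stops the run immediately instead of surfacing later as a missing `./darknet`.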

Hishok commented 4 years ago

Hi @AlexeyAB do you know where this is going wrong?

AlexeyAB commented 4 years ago

Everything works well. I just opened your colab link and clicked Run All, and I didn't get any errors: https://colab.research.google.com/gist/AlexeyAB/5f1434d054d5d704806461612bc8e93c/yolov4-sat.ipynb

Hishok commented 4 years ago

@AlexeyAB Apologies, I sent the wrong Colab link. This is the correct one: https://colab.research.google.com/drive/1sA1o2HzV-_10bQqN8t64wlfGSv2Mi0pB?usp=sharing

The first cell shows where I forked darknet and I can't create the darknet environment.

AlexeyAB commented 4 years ago

This is the same link. Open your link -> press Runtime -> press Restart and run all

Hishok commented 4 years ago

@AlexeyAB Sorry I sent you the exact same link. The actual link is https://colab.research.google.com/drive/1CDN0nBRkraLknTmBQX2i7C5mv1PkK4HX?usp=sharing

In cell 1, when I change the clone step to

# clone darknet repo
!git clone https://github.com/Hishok/darknet

to pick up the change to the 3 lines of code, I can't build darknet.

If I use:

# clone darknet repo
!git clone https://github.com/AlexeyAB/darknet

I can build darknet; however, I get the errors mentioned in my original post, because it does not have the change to the 3 lines of code that you mentioned.

AlexeyAB commented 4 years ago

Remove your repo, fork it from https://github.com/AlexeyAB/darknet, and comment out these 3 lines.
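As an alternative to maintaining a fork, the same change could be applied to a fresh clone from a Colab cell before running make. This is a minimal sketch. The helper name `comment_out_lines` is hypothetical, and lines 390-392 refer to the specific commit linked earlier in this thread; on a different commit those line numbers may point at different code, so inspect the lines before patching.

```python
def comment_out_lines(source: str, first: int, last: int, prefix: str = "// ") -> str:
    """Return source with 1-indexed lines first..last (inclusive) commented out."""
    lines = source.splitlines(keepends=True)
    if not (1 <= first <= last <= len(lines)):
        raise ValueError(f"line range {first}-{last} is outside the file (1-{len(lines)})")
    for i in range(first - 1, last):
        # Skip lines that are already commented so the patch can be re-run safely.
        if not lines[i].lstrip().startswith(prefix.strip()):
            lines[i] = prefix + lines[i]
    return "".join(lines)


demo = "a();\nb();\nc();\nd();\n"
print(comment_out_lines(demo, 2, 3))
```

For the real file, read `darknet/src/network_kernels.cu` (assuming the repo was cloned into `darknet`), call `comment_out_lines(text, 390, 392)`, write the result back, and then run make.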

Hishok commented 4 years ago

Thank you @AlexeyAB, it looks like it is working! Training hasn't stopped.

Hishok commented 4 years ago

Hi @AlexeyAB, I have trained using SAT on the BDD100K dataset; however, after 9000 iterations the mAP score falls to 0. I have been running the Colab cell with %%capture, because otherwise my laptop crashes.

Below is how the mAP score varies over the iterations:

The cfg file:


[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=32
width=416
height=416
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
adversarial_lr= 1
#attention=1

learning_rate=0.001
burn_in=1000
max_batches = 20000
policy=steps
steps=16000,18000
scales=.1,.1

#cutmix=1
mosaic=1

@lukeai

AlexeyAB commented 4 years ago

Try a lower value: adversarial_lr=0.1 or 0.01

Hishok commented 4 years ago

Thank you @AlexeyAB. I have tried both, and at around 4000 and 5000 iterations the AP and mAP go to 0 and the precision is -nan. Would you recommend going lower than 0.01?

AlexeyAB / darknet

Adversarial Training BDD100K #6705
