Deadmin1 opened this issue 5 years ago
@Deadmin1 Hi,
Such an issue can occur if you use width= and height= values that are not multiples of 32.
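As a quick sanity check, a small script along these lines (the cfg path below is just a placeholder) can confirm that the width= and height= values in the [net] section are multiples of 32:

```python
# Sketch: verify that width= and height= in a Darknet cfg are multiples of 32.
import re

def check_net_dims(cfg_path):
    text = open(cfg_path).read()
    for key in ("width", "height"):
        m = re.search(rf"^\s*{key}\s*=\s*(\d+)", text, re.MULTILINE)
        value = int(m.group(1)) if m else None
        ok = value is not None and value % 32 == 0
        print(f"{key}={value} -> {'OK' if ok else 'not a multiple of 32'}")

check_net_dims("yolov3-tiny.cfg")  # placeholder path
```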
Hey @AlexeyAB, thanks again for your reply.
Which date (version) of the Darknet code do you use?
Can you attach your cfg-file to your message?
[net]
# Testing
batch=64
subdivisions=4
# Training
# batch=64
# subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
max=200
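For what it's worth, the filters=18 before each [yolo] layer is consistent with classes=1 here, since for YOLOv3-style layers filters = (classes + 5) * masks_per_layer. A one-liner to recheck it:

```python
# Expected filters= for the [convolutional] layer right before a [yolo] layer:
# filters = (classes + 5) * masks_per_layer (3 masks per [yolo] layer in this cfg).
def yolo_filters(classes, masks_per_layer=3):
    return (classes + 5) * masks_per_layer

print(yolo_filters(1))  # prints 18, matching filters=18 above
```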
Did you change anything in the source code?
What command do you use for testing?
What mAP can you get?
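(For reference, testing and mAP evaluation in the AlexeyAB fork are typically run with `./darknet detector test ...` and `./darknet detector map ...`; the sketch below just wraps those calls, and every file path in it is a placeholder, not the reporter's actual files.)

```python
# Sketch: invoke the darknet CLI (AlexeyAB fork) for a single-image test and
# for mAP evaluation; all paths below are placeholders.
import subprocess

data_file = "obj.data"
cfg_file = "yolov3-tiny.cfg"
weights = "backup/yolov3-tiny_final.weights"

# Detect on one image
subprocess.run(["./darknet", "detector", "test", data_file, cfg_file, weights, "test.jpg"], check=True)

# mAP over the valid= set listed in the .data file
subprocess.run(["./darknet", "detector", "map", data_file, cfg_file, weights], check=True)
```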
But like I said, I trained it with the same .cfg and the same dataset on my computer, and it works fine.
Thanks in advance
What mAP can you get? 0.03% on the training machine (with the problem), 83% on my computer (working fine).
This is very strange.
What Makefile do you use on the training machine and on your computer?
If you use CUDA 10.0, download cuDNN v7.4.2 (Dec 14, 2018) for CUDA 10.0 instead of cuDNN v7.4.2 (Dec 14, 2018) for CUDA 9.2: https://developer.nvidia.com/rdp/cudnn-download
Sometimes you have installed cuDNN for CUDA 9.2 and then installed cuDNN for CUDA 10.0, but the paths still refer to the old cuDNN for CUDA 9.2.
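A quick way to see which cuDNN each CUDA installation actually provides is to read the CUDNN_MAJOR/MINOR/PATCHLEVEL defines from cudnn.h. A rough sketch (the install paths are assumptions; adjust them to the machine):

```python
# Sketch: print the cuDNN version found under each CUDA install path, to catch
# a build that still picks up the old cuDNN for CUDA 9.2 (paths are assumptions).
import re
import pathlib

def cudnn_version(cuda_root):
    header = pathlib.Path(cuda_root) / "include" / "cudnn.h"
    if not header.exists():
        return "no cudnn.h found"
    text = header.read_text()
    parts = [re.search(rf"#define CUDNN_{p}\s+(\d+)", text) for p in ("MAJOR", "MINOR", "PATCHLEVEL")]
    return ".".join(m.group(1) for m in parts if m)

for root in ("/usr/local/cuda", "/usr/local/cuda-9.2", "/usr/local/cuda-10.0"):
    print(root, "->", cudnn_version(root))
```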
Hey Alexey, I have to ask you and the community again. I now have access to a powerful machine (4x GTX 1080) and started training. My configuration is identical to the one on my computer, but I have problems with the bounding boxes: they are shifted. I tested with tiny and tiny_3l, both also at higher resolutions, but it's the same. The testing is done on my computer. I compiled Darknet again and it's still the same. Do you know where this comes from? Training machine: 4x GTX 1080, CUDA 10, cuDNN.
I am training multiple versions at the same time on different GPUs, with unique folders for the dataset, cfg, .data and .names files.
But the problem still exists when I train just 1 model on 1 GPU.
Edits:
1: Done:
- Downloaded and compiled Darknet on the training machine from scratch.
- Trained again with tiny-yolo.
- Checked the dataset with yolo-mark (looks good; see the label-check sketch after this list).
The problem still exists.
2: Trained a tiny-yolo on my machine and tested it. No shifting here. But I used the same training set and config file as on the more powerful machine. I can't understand where the problem could be here.
3: Compiled Darknet on the training machine without cuDNN and trained; the result still looks the same.
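On top of the yolo-mark check mentioned in edit 1, one more label-side sanity check for shifted boxes is to make sure every annotation line has five fields with the four coordinates normalized to [0,1]. A rough sketch, with a placeholder label directory:

```python
# Sketch: sanity-check YOLO-format label files (each line: class x_center y_center
# width height, coordinates normalized to [0,1]); the directory is a placeholder.
import glob

def check_labels(label_dir="data/obj"):
    for path in glob.glob(f"{label_dir}/*.txt"):
        for i, line in enumerate(open(path), start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 5:
                print(f"{path}:{i}: expected 5 fields, got {len(fields)}")
                continue
            coords = [float(x) for x in fields[1:]]
            if not all(0.0 <= c <= 1.0 for c in coords):
                print(f"{path}:{i}: coordinates outside [0,1]: {coords}")

check_labels()
```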