AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Image with 2 channels #1162

Open ycui123 opened 6 years ago

ycui123 commented 6 years ago

Can I use this network with images that contain only 2 channels? I'm dealing with x-ray images. The first channel is the raw image (16-bit grayscale) and the second channel is the log-transformed image. Does that work? And could you tell me which files I should modify? Thank you!

AlexeyAB commented 6 years ago

> The first channel is raw image(16bit grayscale) and the second channel is log transformed image.

If you can convert these 2 channels (1st 16-bit + 2nd 8-bit) to an 8-bit 3-channel image (24 bits total), then just use such images for training and detection as usual.

If you can't convert them in such a way, then you will need to change the source code to support this. Look at the changes that were made to support 1-channel 8-bit images: https://github.com/AlexeyAB/darknet/pull/936/files
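One possible reading of "16-bit + 8-bit = 24-bit" is to split the 16-bit raw channel into its high and low bytes and put the 8-bit log channel in the third slot. The sketch below assumes NumPy arrays as input; the function name `pack_24bit` and the channel order are illustrative choices, not something Darknet prescribes.

```python
import numpy as np

def pack_24bit(raw16: np.ndarray, log8: np.ndarray) -> np.ndarray:
    """Pack a uint16 channel and a uint8 channel into an H x W x 3 uint8 image."""
    if raw16.dtype != np.uint16 or log8.dtype != np.uint8:
        raise TypeError("expected a uint16 raw channel and a uint8 log channel")
    if raw16.shape != log8.shape:
        raise ValueError("both channels must have the same height and width")
    high = (raw16 >> 8).astype(np.uint8)    # most significant byte of the raw channel
    low = (raw16 & 0xFF).astype(np.uint8)   # least significant byte of the raw channel
    return np.stack([high, low, log8], axis=-1)

# Tiny hypothetical 2x2 example
raw = np.array([[0, 255], [256, 65535]], dtype=np.uint16)
log = np.array([[0, 10], [20, 255]], dtype=np.uint8)
packed = pack_24bit(raw, log)
```

Images packed this way would need to be saved in a lossless format such as PNG, since lossy compression would corrupt the low byte. Whether the network can make good use of the low-byte channel is an open question; it is only one way to keep all 24 bits.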

You should change these functions:

  1. https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/src/image.c#L936-L954

Also if OpenCV is used:

  1. https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/src/image.c#L956-L986

  2. https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/src/data.c#L723-L789

  3. And maybe this: https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/src/http_stream.cpp#L272-L328


If OpenCV isn't used:

  1. https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/src/image.c#L1810-L1841

  2. https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/src/data.c#L791-L842
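For orientation, here is a rough Python sketch of the data layout such loaders are assumed to produce: Darknet's `image` holds `w`, `h`, `c` and a flat float buffer in planar order (all of channel 0, then channel 1, and so on), with values scaled to [0, 1]. The function name and the 16-bit scaling factor below are assumptions for a 2-channel 16-bit input; check them against the linked C code before relying on them.

```python
import numpy as np

def to_planar_float(interleaved16: np.ndarray) -> np.ndarray:
    """Convert an H x W x C uint16 array to a flat planar float32 buffer in [0, 1]."""
    if interleaved16.ndim != 3:
        raise ValueError("expected an H x W x C array")
    # Reorder to (C, H, W) and scale by the 16-bit maximum (assumed normalisation).
    planar = interleaved16.astype(np.float32).transpose(2, 0, 1) / 65535.0
    # Flat index of pixel (x, y) in channel k is x + w*y + w*h*k.
    return np.ascontiguousarray(planar).reshape(-1)

# Hypothetical 2-channel image, h=2, w=3
pixels = np.zeros((2, 3, 2), dtype=np.uint16)
pixels[1, 2, 0] = 65535  # y=1, x=2, channel 0
pixels[0, 1, 1] = 65535  # y=0, x=1, channel 1
flat = to_planar_float(pixels)
```

With a 2-channel buffer like this, the `channels` value in the network's cfg would presumably also have to match, as it does for the 1-channel case in the pull request above.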

ycui123 commented 6 years ago

Thanks for the quick reply. I could convert the two channels to 8-bit and zero-pad the 3rd channel. Would that work?

AlexeyAB commented 6 years ago

> The first channel is raw image(16bit grayscale) and the second channel is log transformed image.

ycui123 commented 6 years ago

Yes. And the 2nd channel is also 16-bit, since I computed it from the first channel.

AlexeyAB commented 6 years ago

So you can convert it to the usual 8-bit 3-channel format in any way you want, and it will work.

Just make sure that Training and Detection both use the same type of conversion.
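As an illustration of one such conversion, the sketch below rescales two 16-bit channels to 8-bit and fills the third channel with zeros. The function names and the fixed full-range scaling are assumptions; a fixed scale is used here (rather than per-image min/max) so that exactly the same mapping applies at training and detection time.

```python
import numpy as np

def to_8bit(channel16: np.ndarray) -> np.ndarray:
    """Linearly rescale a uint16 channel to uint8 using the full 16-bit range."""
    scaled = channel16.astype(np.float32) / 65535.0 * 255.0
    return scaled.round().astype(np.uint8)

def two_to_three_channels(raw16: np.ndarray, log16: np.ndarray) -> np.ndarray:
    """Build an H x W x 3 uint8 image: raw, log, and an all-zero third channel."""
    if raw16.shape != log16.shape:
        raise ValueError("both channels must have the same height and width")
    raw8, log8 = to_8bit(raw16), to_8bit(log16)
    zeros = np.zeros_like(raw8)  # padding channel, carries no information
    return np.stack([raw8, log8, zeros], axis=-1)

# Tiny hypothetical 1x2 example
raw = np.array([[0, 65535]], dtype=np.uint16)
log = np.array([[65535, 0]], dtype=np.uint16)
img = two_to_three_channels(raw, log)
```

The resulting array can be written out with any image library and then used like an ordinary 3-channel image.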


Also you should disable some types of color data augmentation, i.e. set

saturation = 1.0
exposure = 1.5 
hue=0

instead of: https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/cfg/yolov3.cfg#L14-L16

ycui123 commented 6 years ago

Thank you! I'll try and let you know!

ycui123 commented 6 years ago

Hi @AlexeyAB,

> or convert two 16-bit channels to the two 8-bit channels, and set all zeros in the 3rd channel.

I used the above method and trained for 8000 iterations. I only have one class. I found that the model didn't overfit the data as the number of iterations increased.

Here's what I got for 8000 iterations:

for thresh = 0.25, precision = 0.86, recall = 0.65, F1-score = 0.74
for thresh = 0.25, TP = 652, FP = 103, FN = 348, average IoU = 62.57 %
mean average precision (mAP) = 0.677931, or 67.79 %
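As a quick consistency check, the precision, recall and F1-score above can be recomputed from the TP/FP/FN counts using the standard definitions. A minimal Python sketch (the function name is just for illustration):

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)   # fraction of detections that are correct
    recall = tp / (tp + fn)      # fraction of ground-truth objects that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = detection_metrics(tp=652, fp=103, fn=348)
print(f"precision = {precision:.2f}, recall = {recall:.2f}, F1-score = {f1:.2f}")
# prints: precision = 0.86, recall = 0.65, F1-score = 0.74
```

So lowering FN (348 missed objects out of 1000) would move recall the most, while FP (103) is what limits precision.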

I followed all instructions you gave in https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

Are there any other ways to improve my performance? I want to lower FP as well as FN as much as possible. Should I train for more iterations?

Thank you

EDITED: My objects are very small (usually within 100×100) and the images are big (around 1200×4000).

popper0912 commented 6 years ago

Can we use the route layer to concatenate the two images? But I don't know how to write that for the data layer in the .cfg file.