ycui123 opened this issue 6 years ago
The first channel is the raw image (16-bit grayscale) and the second channel is a log-transformed image.
If you can convert these 2 channels (1st 16-bit + 2nd 8-bit) to 8-bit 3-channel images (24-bit total), then just use such images for training and detection as usual.
If you can't convert them in such a way, then you should change the source code to do this. Look at the changes that were made to support 1-channel 8-bit images: https://github.com/AlexeyAB/darknet/pull/936/files
You should change these functions:
Also if OpenCV is used:
And maybe this: https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/src/http_stream.cpp#L272-L328
If OpenCV isn't used:
Thanks for the quick reply. I could convert the two channels to 8-bit and zero-pad the 3rd channel? I wonder if that works.
The first channel is the raw image (16-bit grayscale) and the second channel is a log-transformed image.
Yes. And the 2nd channel is also 16-bit, since I transformed it from the first channel.
So you can convert it to the common 8-bit 3-channel format in any way you want and it will work:
or convert the two 16-bit channels to two 8-bit channels and set the 3rd channel to all zeros.
or convert the first 16-bit channel to two 8-bit channels, and the second 16-bit channel to one 8-bit channel.
Just make sure you use the same type of conversion for both training and detection. (A minimal conversion sketch is shown below.)
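For illustration, here is a minimal sketch of the first option using Python with OpenCV and NumPy. The file names, the min-max scaling, and the to_8bit helper are my own assumptions for the example, not anything from darknet itself:

```python
# Minimal sketch: merge two 16-bit channels into an 8-bit 3-channel image
# suitable for darknet training. File names and scaling are assumptions.
import cv2
import numpy as np

def to_8bit(ch16):
    # Scale a 16-bit channel to the 0-255 range (simple min-max normalization).
    ch = ch16.astype(np.float32)
    ch -= ch.min()
    peak = ch.max()
    if peak > 0:
        ch /= peak
    return (ch * 255).astype(np.uint8)

raw16 = cv2.imread("image_raw.tif", cv2.IMREAD_UNCHANGED)  # 1st channel, 16-bit
log16 = cv2.imread("image_log.tif", cv2.IMREAD_UNCHANGED)  # 2nd channel, 16-bit

# Option 1: two 8-bit channels plus an all-zero 3rd channel.
zeros = np.zeros(raw16.shape, dtype=np.uint8)
merged = cv2.merge([to_8bit(raw16), to_8bit(log16), zeros])
cv2.imwrite("image_train.png", merged)  # 8-bit 3-channel image for training/detection
```

The same conversion (or the equivalent script for the second option) has to be applied both to the training images and to the images used at detection time.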
Also you should disable some types of color data augmentation, i.e. set
saturation = 1.0
exposure = 1.5
hue=0
instead of: https://github.com/AlexeyAB/darknet/blob/e301fee8a0d1343824dd8038bc051f728b93bc57/cfg/yolov3.cfg#L14-L16
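If I recall correctly, the linked lines in the [net] section of yolov3.cfg hold the default color-augmentation values (saturation = 1.5, exposure = 1.5, hue = .1), so the change amounts to roughly this cfg fragment:

```
# [net] section of yolov3.cfg: replace the default color augmentation
# (saturation=1.5, exposure=1.5, hue=.1) with:
saturation = 1.0
exposure = 1.5
hue=0
```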
Thank you! I'll try and let you know!
Hi @AlexeyAB,
or convert the two 16-bit channels to two 8-bit channels and set the 3rd channel to all zeros.
I used the above method and trained for 8000 iterations; I only have one class. I found that the model didn't overfit the data even with more and more iterations.
Here's what I got after 8000 iterations:
for thresh = 0.25, precision = 0.86, recall = 0.65, F1-score = 0.74
for thresh = 0.25, TP = 652, FP = 103, FN = 348, average IoU = 62.57 %
mean average precision (mAP) = 0.677931, or 67.79 %
I followed all instructions you gave in https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
Are there any other ways to improve my performance? I want to lower FP as well as FN as much as possible. Should I train for more iterations?
Thank you
EDITED: My objects are very small (usually within 100x100 pixels) and the images are big (around 1200x4000).
Can we use the route function to concat the two images? But I don't know how to write that in the .cfg file at the data layer.
Can I use this network with images that only contain 2 channels? I'm dealing with X-ray images. The first channel is the raw image (16-bit grayscale) and the second channel is a log-transformed image. Does that work? And could you tell me which file I should modify? Thank you!