AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

training on 4 channel (RGB-D) images. #7367

Open hdaniel0707 opened 3 years ago

hdaniel0707 commented 3 years ago

Hello guys, I really like this repository. I have been using it for a while and it works perfectly for RGB images. I wanted to try it with 4-channel RGB-D images, so I created images with 4 channels and set the channel count to 4 in the config file. When I run the training, it prints the following right after initializing the network and then quits: "Done! Loaded 136 layers from weights-file Create 6 permanent cpu-threads Segmentation fault". The same setup worked with RGB images (I only changed the channel count and the images). I am training on GPU, but I don't think that has anything to do with it.

My questions:
1.) Is there a way to train on 4-channel RGB-D images in this repository (I don't really want to use another one)?
2.) If so, besides changing the channel number at the beginning of the config file (1 place), do I have to do anything else?
3.) If it is not possible in this repository, could you recommend another repo with a similar structure, so I can port my code easily?

Thank you very much for your help!

blackwool commented 3 years ago

By default, OpenCV loads a color picture as 3 channels, so you would have to write the image-loading function yourself. You can also take compare.c as an example of loading pictures with more channels as input for training and testing.
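A custom loader mostly has to deliver the pixel data in the memory layout the network expects. As far as Darknet's image struct goes, the data is a planar float array (all values of channel 0, then channel 1, and so on), usually scaled to [0, 1], while most decoders return interleaved 8-bit samples. The following is only a minimal Python sketch of that conversion for an arbitrary channel count; the function name interleaved_to_planar is made up for illustration and is not part of Darknet or OpenCV.

```python
def interleaved_to_planar(pixels, width, height, channels):
    """Convert interleaved 8-bit samples (HWC) to planar floats (CHW) in [0, 1]."""
    expected = width * height * channels
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} samples, got {len(pixels)}")
    plane = width * height
    planar = [0.0] * expected
    for y in range(height):
        for x in range(width):
            src = (y * width + x) * channels
            for c in range(channels):
                # Planar index: channel plane first, then row, then column.
                planar[c * plane + y * width + x] = pixels[src + c] / 255.0
    return planar


# A 2x1 RGB-D image: two pixels with four samples each (R, G, B, D).
rgbd = bytes([255, 0, 0, 51, 0, 255, 0, 102])
planar = interleaved_to_planar(rgbd, width=2, height=1, channels=4)
```

Here planar holds the R plane first, then G, B and finally the depth plane. A real loader in C would do the same index arithmetic on the decoded buffer.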

CRIGIM commented 3 years ago

Hi, I have the same issue. I created my 4-channel images as .tif, but I get the error: "Error in load_data_detection() - OpenCV Cannot load image data/obj/image1.tif"

@hdaniel0707: Did you manage to load 4-channel RGB-D images? @blackwool: Which file do I need to modify to include my custom loading function?

Thank you!

EyGy commented 3 years ago

Hi, is there an update on this? @hdaniel0707, @CRIGIM, did you make any progress? I am currently trying to get it to work with 4 channels and would really appreciate any kind of help.

I customized the data loader for my 4-channel pictures, but I am experiencing crashes without any error message when trying to train yolov4.

Thank you in advance!

mjack3 commented 3 years ago

Hello @EyGy @hdaniel0707 @CRIGIM, I am working on the same thing. Could you please help me? :)

P.S.: Python programmer here (not C, so...)

EyGy commented 3 years ago

@mjack3 For now I have implemented a workaround that fuses my 4 channels into 3 before passing them to YOLO. However, I plan on diving deeper into the code soon to get it to work with 4 channels.

If my current understanding is correct, 4-channel input won't work with at least some (possibly most or all) of the implemented data augmentation techniques. I will post a short update here as soon as I have made progress.
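The thread does not say how the four channels are fused, so the following is only one hypothetical way to do it: keep R and G unchanged and blend the depth value into the blue sample. The function name fuse_rgbd_to_rgb and the depth_weight parameter are assumptions made for this sketch, and 8-bit samples are assumed throughout.

```python
def fuse_rgbd_to_rgb(pixel, depth_weight=0.5):
    """Blend depth into the blue sample of one (R, G, B, D) pixel.

    Hypothetical fusion rule: returns (R, G, B') where B' is a weighted
    mix of blue and depth. depth_weight=1.0 replaces blue with depth.
    """
    if not 0.0 <= depth_weight <= 1.0:
        raise ValueError("depth_weight must be in [0, 1]")
    r, g, b, d = pixel
    fused = round((1.0 - depth_weight) * b + depth_weight * d)
    return (r, g, fused)


# Two RGB-D pixels reduced to three channels each.
rgbd_pixels = [(10, 20, 100, 200), (30, 40, 50, 150)]
rgb_pixels = [fuse_rgbd_to_rgb(p) for p in rgbd_pixels]
```

Any such fusion discards information, so which channel to sacrifice or how to weight the blend depends on the data. The result can be saved as an ordinary 3-channel image and fed to an unmodified network.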

mjack3 commented 3 years ago

@EyGy I am still working on it, but my case is a little different. I have an RGB image whose last channel is a zero matrix, which means I don't need it.

I set channels=2 in the .cfg file and then I get an error, maybe because the network is built with 2 channels while the images are loaded as RGB. Do you know at which line of code the network ingests the image? If so, I think that deleting the last channel to match the shape expected by YOLO would solve my problem.

Since I don't know a C programmer, this task is hard for me (Python coder).
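For a planar float buffer (all values of channel 0, then channel 1, and so on, which is how Darknet's image struct appears to store its data), deleting trailing channels amounts to keeping the first planes. The following is a minimal Python sketch of that idea with a shape check; keep_first_channels is a hypothetical helper, not a Darknet function, and the same check would have to live in the C loader.

```python
def keep_first_channels(planar, width, height, src_channels, dst_channels):
    """Return the first dst_channels planes of a planar (CHW) buffer."""
    plane = width * height
    if len(planar) != plane * src_channels:
        raise ValueError("buffer size does not match width*height*src_channels")
    if not 0 < dst_channels <= src_channels:
        raise ValueError("dst_channels must be between 1 and src_channels")
    # Channels are stored as consecutive planes, so dropping the
    # trailing ones is a plain slice.
    return planar[: plane * dst_channels]


# A 2x1 image with three planes; the last plane is all zeros.
three_planes = [0.1, 0.2, 0.3, 0.4, 0.0, 0.0]
two_planes = keep_first_channels(three_planes, 2, 1, src_channels=3, dst_channels=2)
```

The important part is that the number of planes handed to the network equals the channels value in the .cfg file. A mismatch between the two is a plausible cause of the error described above, although the thread does not confirm it.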

EyGy commented 3 years ago

@mjack3 So you have pictures encoded with 3 channels, where one of the channels is all zeros, e.g. one pixel is R G B = 54 254 0? Then you should just load the image as a three-channel image. YOLO will learn by itself that the B channel contains no information to extract features from. Alternatively, you could consider filling the empty third channel with additional information (e.g. duplicating the most important channel or applying domain-related transformations). Beware of distortions introduced by the augmentation pipeline!

The location of the image-loading function depends on whether your build uses OpenCV. With OpenCV enabled it should be extern "C" mat_cv *load_image_mat_cv(const char *filename, int flag) in the file image_opencv.cpp. Without OpenCV it should be image load_image_stb(char *filename, int channels) in image.c. If you are used to Python, you should easily be able to code your transformations in Python using OpenCV and then just "translate" them to the corresponding OpenCV C++ functions.