krishneel opened this issue 7 years ago
Hello, sorry for the late response. Are you able to sanity-check your labels and confirm they are correct? I would also expect your network to learn well, in principle. That said, it does sound like image segmentation would be more appropriate for your task, right?
Have a look at this example: https://github.com/NVIDIA/DIGITS/tree/digits-5.0/examples/semantic-segmentation
This article might also be helpful: https://devblogs.nvidia.com/parallelforall/image-segmentation-using-digits-5/
Finally, it doesn't sound like you're using DIGITS, correct?
I have managed to train DetectNet on my custom data to detect trucks in aerial images. I used 600 images for training and 200 for validation, all at a resolution of 1280 x 720. The detection results give bounding boxes as required.
Now I want to do something a little different: I want to predict the probability that an image region contains an object, and then use a segmentation method such as min-max cuts to segment the actual object.
If I understand correctly, DetectNet overlays the image with a specified grid, and each grid cell then predicts whether an object, or part of an object, is present in that cell. I am not interested in the object bounding boxes, just a heatmap or probability map over the grid cells. I think #L2484 should give this heatmap. I have removed all bounding-box layers from the DetectNet prototxt; my sample prototxt file is attached below.
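To be clear about what I mean by a heatmap, here is a minimal numpy sketch (the 38 x 38 / 16 px grid numbers match the setup I describe below; the flat vector is just a stand-in for the sigmoid output, not code from the prototxt):

```python
import numpy as np

# Fold the network's flat per-cell probabilities back into the grid,
# then blow each cell up to its 16 x 16 pixel footprint so the map
# can be overlaid on the input image.
flat = np.random.rand(1, 38 * 38)              # stand-in for the sigmoid output
heatmap = flat.reshape(38, 38)                 # one probability per grid cell
overlay = np.kron(heatmap, np.ones((16, 16)))  # 608 x 608 per-pixel map
```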
Note that I did not use the Nvidia Data Augmentation Layer; I wrote my own, slightly different augmentation. I divide the image into grid cells and label each cell as an object, part of an object (IOU > 0.7), or background. Labels are either `0` for background or `1` for foreground. I write the labels to HDF5, read them back, and use a Reshape layer to change them into 2D blobs. For example, for an image of `610 x 610` with a grid size of `16 x 16`, I get `38 x 38` regions, which are read from HDF5 as a `1 x 1444` vector. Finally, just like DetectNet, I use EuclideanLoss to compute the loss.

Unfortunately, the aforementioned network does not seem to converge at all. I am using 600 training images in which objects cover roughly 30-60% of each image. I have spent a lot of time debugging this, and I realized that after the first iteration the output of the Sigmoid layer #L2399 is all zeros. I ran it for over 200 epochs and it stays the same: only in the first iteration does the output contain real probabilities; thereafter it is all zero.

My question is:
why does the output of `cvg/classifier` collapse to zero after the first iteration?

Thanks
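For concreteness, here is roughly how I generate the grid labels described above (a sketch: the box coordinates and file name are made up for illustration, and I am approximating the IOU > 0.7 test as intersection-over-cell-area, which is what I actually threshold):

```python
import numpy as np
import h5py

def grid_labels(boxes, img_w=610, img_h=610, cell=16, thresh=0.7):
    """Label each grid cell 1 (foreground) or 0 (background).

    A cell is foreground when more than `thresh` of its area is
    covered by a ground-truth box (my approximation of IOU > 0.7).
    """
    gw, gh = img_w // cell, img_h // cell          # 38 x 38 for 610 / 16
    labels = np.zeros((gh, gw), dtype=np.float32)
    for (x1, y1, x2, y2) in boxes:
        for gy in range(gh):
            for gx in range(gw):
                cx1, cy1 = gx * cell, gy * cell
                iw = max(0, min(x2, cx1 + cell) - max(x1, cx1))
                ih = max(0, min(y2, cy1 + cell) - max(y1, cy1))
                if iw * ih / float(cell * cell) > thresh:
                    labels[gy, gx] = 1.0
    return labels

# Flatten to the 1 x 1444 vector stored in HDF5; the Reshape layer in
# the prototxt restores the 38 x 38 layout at training time.
labels = grid_labels([(100, 100, 300, 260)])     # one example truck box
with h5py.File("train_labels.h5", "w") as f:
    f.create_dataset("label", data=labels.reshape(1, -1))
```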