Seanlinx / mtcnn


Issues in making data for R_Net #4

Closed mychina75 closed 7 years ago

mychina75 commented 7 years ago

Dear Lin, thank you for your great work. It is very helpful.

I'm having trouble preparing the training data for R-Net, specifically with the usage of gen_hard_example.py.

Line 153 reads: imdb = IMDB("wider", image_set, root_path, dataset_path, 'test'). But there is no ground-truth info (the anno.txt file) for the WIDER test dataset, so I switched to the 'train' dataset. This time a CUDA out-of-memory error occurred after some images had been processed.

... 2200 images done
[11:29:17] /data/code/mxnet/dmlc-core/include/dmlc/./logging.h:235: [11:29:17] src/storage/./pooled_storage_manager.h:79: cudaMalloc failed: out of memory
Traceback (most recent call last):
  File "/data/code/mtcnn/prepare_data/gen_hard_example.py", line 228, in <module>
    args.slide_window, args.shuffle, args.vis)
  File "/data/code/mtcnn/prepare_data/gen_hard_example.py", line 165, in test_net
    detections = mtcnn_detector.detect_face(imdb, test_data, vis=vis)
  File "/data/code/mtcnn/core/MtcnnDetector.py", line 457, in detect_face
    boxes, boxes_c = self.detect_pnet(im)
  File "/data/code/mtcnn/core/MtcnnDetector.py", line 279, in detect_pnet
    cls_map, reg = self.pnet_detector.predict(im_resized)
  File "/data/code/mtcnn/core/fcn_detector.py", line 25, in predict
    grad_req='null', aux_states=self.aux_params)
  File "/opt/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/symbol.py", line 852, in bind
    ctypes.byref(handle)))
  File "/opt/anaconda/lib/python2.7/site-packages/mxnet-0.7.0-py2.7.egg/mxnet/base.py", line 77, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [11:29:17] src/storage/./pooled_storage_manager.h:79: cudaMalloc failed: out of memory

Am I missing something? Thank you.

Seanlinx commented 7 years ago

Hi, the test set I used is exactly the training set of Wider Face.
As for the CUDA out-of-memory issue, I don't know its cause, but using NaiveStorageManager will help. You can modify mxnet/src/storage/storage.cc as follows:

diff --git a/src/storage/storage.cc b/src/storage/storage.cc
index d80c64b..87251c1 100644
--- a/src/storage/storage.cc
+++ b/src/storage/storage.cc
@@ -74,6 +74,7 @@ Storage::Handle StorageImpl::Alloc(size_t size, Context ctx) {
           case Context::kGPU: {
 #if MXNET_USE_CUDA
-             ptr = new storage::GPUPooledStorageManager();
+//             ptr = new storage::GPUPooledStorageManager();
+            ptr = new storage::NaiveStorageManager<storage::GPUDeviceStorage>();
 #else
             LOG(FATAL) << "Compile with USE_CUDA=1 to enable GPU usage";
 #endif  // MXNET_USE_CUDA
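
(For context: the pooled manager caches freed GPU blocks by size for reuse, so a long run over PNet's many differently sized pyramid inputs can plausibly exhaust memory through fragmentation; the naive manager returns memory to CUDA immediately, trading some speed for a smaller footprint.)
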
PierreHao commented 7 years ago

If you use the training set of Wider Face, each image will produce a lot of proposals, and most of them are negatives, so it needs a large amount of memory. How do you keep the ratio of pos:neg:part? @Seanlinx @mychina75, is your GPU memory 2GB? PNet is a fully convolutional net, so if some input images are too large it will run out of memory; you can try resizing the images that are too large, e.g. as in the sketch below.
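
A minimal sketch of that pre-resize, assuming OpenCV is available; the 1000-pixel cap and the helper's name are illustrative choices, not part of the repo:

import cv2

def resize_for_pnet(im, max_side=1000):
    # Downscale so the longer side is at most max_side (illustrative cap).
    # Returns the image and the scale factor, so detections can be mapped
    # back to the original image: box_original = box_resized / scale.
    h, w = im.shape[:2]
    scale = min(1.0, float(max_side) / max(h, w))
    if scale < 1.0:
        im = cv2.resize(im, (int(w * scale), int(h * scale)))
    return im, scale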

Seanlinx commented 7 years ago

@PierreHao the ratio of pos:neg:part is approximately 1:2:1

mychina75 commented 7 years ago

@Seanlinx, thank you so much. It works now.

Zouhj commented 7 years ago

Hi, I met the same cudaMalloc failure, but before that I was confused about imdb = IMDB("wider", image_set, root_path, dataset_path, "test"): what should ./data/wider/images and ./data/wider/imglists/test.txt be? Preparing the dataset for R-Net means running detection with the trained P-Net model and then comparing against ground truth, so I took several images and put their paths (without ground truth) into test.txt, then ran gen_hard_example.py. But I am still confused about this. Could you please explain it to me?

Seanlinx commented 7 years ago

@Zouhj Well, I just renamed the annotation file of wider_face_train to test.txt and moved it to ./data/wider/imglists/, and ./data/wider/images is linked to the images of WIDER_train.
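
A minimal sketch of that layout, assuming the WIDER_train images and the converted annotation file already exist locally (the two source paths below are placeholders):

import os
import shutil

# Hypothetical source paths -- adjust to wherever WIDER Face lives on disk.
wider_train_images = '/data/WIDER_train/images'
train_annotation = '/data/wider_face_train.txt'  # converted annotation file

if not os.path.isdir('data/wider/imglists'):
    os.makedirs('data/wider/imglists')

# gen_hard_example.py loads the IMDB in 'test' mode, so the training
# annotation file is simply dropped in under the name test.txt.
shutil.copy(train_annotation, 'data/wider/imglists/test.txt')

# Link the image directory instead of copying the whole dataset.
if not os.path.exists('data/wider/images'):
    os.symlink(wider_train_images, 'data/wider/images')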

Zouhj commented 7 years ago

I see, thank you so much, Lin. But sadly I've hit a new problem:
0 images done
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
I haven't been able to solve this after going through your code again. After modifying mxnet/src/storage/storage.cc I verified that the cudaMalloc problem is gone, but this new problem is strange. Do you have any advice?

Seanlinx commented 7 years ago

@Zouhj Sorry, but I cannot reproduce your problem. You could run it under gdb and set MXNET_ENGINE_TYPE to NaiveEngine to see if there's anything useful in the backtrace.
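
With the naive engine every operation executes synchronously on the calling thread, so a crash surfaces at the offending call rather than in a worker thread. A minimal sketch of selecting it from Python; MXNET_ENGINE_TYPE is MXNet's standard engine-selection variable, and setting it in the shell before launching works equally well:

import os

# Must be set before mxnet is imported; once the default threaded engine
# has started, this setting is ignored.
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'

import mxnet as mx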

Zouhj commented 7 years ago

@Seanlinx Thank you! Your advice was very useful; I've solved it now, and it was my own mistake, the code is correct. I'm generating samples now, and the number of negatives is much larger than part and pos (especially pos): I got only about 2K pos from the whole Wider Face dataset, but more than 5000K neg. So when you said the ratio pos:neg:part = 1:2:1, did you mean the training ratio, which I need to adjust myself? Or is that the ratio the code generates? Sorry to bother you with so many questions.

Seanlinx commented 7 years ago

@Zouhj Yes, only a sample of the negatives is needed (see the sketch below). By the way, there are 10k+ faces in the training set, so there might be some problem if you get only 2k positive samples.
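
A minimal sketch of that subsampling, assuming the generated annotation lines are already collected in Python lists; the helper and its names are illustrative, not part of the repo:

import numpy as np

def subsample_negatives(pos, neg, part, neg_per_pos=2):
    # Keep roughly neg_per_pos negatives per positive, matching the
    # pos:neg:part = 1:2:1 ratio mentioned above; pos and part stay whole.
    keep = min(len(neg), neg_per_pos * len(pos))
    idx = np.random.choice(len(neg), size=keep, replace=False)
    return pos, [neg[i] for i in idx], part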

Cv9527 commented 7 years ago

Hi Seanlinx, I used gen_hard_example.py to generate O-Net training data, but it generated only 12k negative samples, versus 140k positive samples and 85k part samples. Is that number of negative samples not enough?

Seanlinx commented 7 years ago

Yes. Maybe you should check the thresholds of the previous two nets and the setting of min_face. Generally, the number of positive samples can't be so large if min_face is set to 24. @Cv9527

Cv9527 commented 7 years ago

(screenshot attached)

I used the default values of the thresholds and min_face; I have no idea why it generates so few negative samples.

Seanlinx commented 7 years ago

You should set the thresholds according to the recall and false-positive counts when testing on FDDB. The thresholds I use correspond to around 600k false positives (97% recall) for PNet, and 80k false positives (95.5% recall) for RNet; see the sketch below. @Cv9527
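
A minimal sketch of reading such a threshold off a discrete ROC, assuming you have per-detection scores and true/false-positive flags from your own FDDB evaluation (both input arrays are placeholders):

import numpy as np

def threshold_at_false_positives(scores, is_true_positive, target_fp):
    # Sort detections by descending confidence, accumulate false positives,
    # and return the score at which the target count is first reached.
    scores = np.asarray(scores)
    tp = np.asarray(is_true_positive, dtype=bool)
    order = np.argsort(scores)[::-1]
    fp = np.cumsum(~tp[order])
    k = min(np.searchsorted(fp, target_fp), len(scores) - 1)
    return scores[order][k]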

Cv9527 commented 7 years ago

Could you share the thresholds you use to generate the training samples? Thanks.

Seanlinx commented 7 years ago

0.5, 0.1 @Cv9527

so-as commented 7 years ago

@mychina75 I met the same cudaMalloc failure. How did you solve it? Thanks.

so-as commented 7 years ago

@Seanlinx @Zouhj Hello, when I run "python prepare_data/gen_hard_example.py --dataset_path data/wider/train --image_set train --test_mode pnet --prefix model/pnet_model/pnet" to generate data for R-Net, an error appears at "from core.symbol import P_Net, R_Net, O_Net": ImportError: No module named core.symbol. The directory model/pnet_model/ is where I saved the trained P-Net model. Did I get the usage of gen_hard_example.py wrong? Thank you.

Seanlinx commented 7 years ago

@so-as python -m prepare_data.gen_hard_example --dataset_path ...
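
Running it with -m puts the repository root on sys.path, which is why core.symbol becomes importable. If you prefer invoking the script by path, an equivalent workaround (a sketch, not part of the repo) is to prepend the root manually at the top of gen_hard_example.py:

import os
import sys

# Prepend the repository root (the parent of prepare_data/) so that
# 'core.symbol' resolves when the script is run directly by path.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from core.symbol import P_Net, R_Net, O_Net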

so-as commented 7 years ago

@Seanlinx Thank you for your reply. But now core/loader.py fails in provide_label, at return [(k, v.shape) for k, v in zip(self.label_names, self.label)], with: TypeError: zip argument #2 must support iteration. Is there a solution?

so-as commented 7 years ago

@Seanlinx I have solved this problem. Thank you.

GarrickLin commented 7 years ago

@Zouhj How did you solve this problem?

I see, thank you so much, Lin. But sadly I've hit a new problem:
0 images done
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
I haven't been able to solve this after going through your code again. After modifying mxnet/src/storage/storage.cc I verified that the cudaMalloc problem is gone, but this new problem is strange. Do you have any advice?

I have run into the same problem as you did.

Zouhj commented 7 years ago

@GreenKing Hi, I didn't actually do anything; I never found the exact cause of the error. I just reran the code and it worked.

AlphaQi commented 7 years ago

@Seanlinx Hello, I'd like to know: do the part samples regress the face landmarks?

Seanlinx commented 7 years ago

@AlphaQi No, part samples come from Wider Face, which provides no landmark annotations.
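
For reference, the three sample types are separated purely by overlap with the ground-truth boxes; a minimal sketch of the usual IoU rule (the 0.65/0.4/0.3 cutoffs follow the MTCNN paper and may differ slightly from this repo's exact values):

def label_crop(best_iou):
    # Categorize a candidate crop by its highest IoU with any ground truth.
    if best_iou > 0.65:
        return 'pos'   # classification + bbox regression targets
    if best_iou >= 0.4:
        return 'part'  # bbox regression only, no landmark targets
    if best_iou < 0.3:
        return 'neg'   # classification only
    return None        # ambiguous overlap, discarded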

homedawn commented 6 years ago

@Seanlinx Hi, while training R-Net I hit the assertion "incorrect detections or ground truths" at assert len(det_boxes) == num_of_images in gen_hard_example.py. Could you tell me why? I also found that det_boxes was None, holding no values at all. Thanks!