CharlesShang / FastMaskRCNN

Mask RCNN in TensorFlow
Apache License 2.0

Testing data #26

Open akashmaity opened 7 years ago

akashmaity commented 7 years ago

Hi, can anyone please guide me on how to test the network on a test dataset? I have been trying for hours, still without any results.

kerawits commented 7 years ago

I have the same question.

roilandwu commented 7 years ago

I have the same question, too.

kevinkit commented 7 years ago

Did any of you manage to actually train the network completely, with a decent loss?

stevefoy commented 7 years ago

I have the same issue deploying a test. I've trained for 20 hours, after a few strange crashes at the start. It would be useful to know how to test or validate the model.

gopi77 commented 7 years ago

Yes, I also trained for a day and got output up to the coco_resnet50_model.ckpt-80000.meta/index/data-00000-of-00001 files. Now, how do I use these files to test the model? I am looking for test code that uses our trained model, takes an input image (provided by the user), and writes the segmented image as output.

jsjs0827 commented 7 years ago

@gopi77 Have you solved the problem? Can you show me how to do it?

gopi77 commented 7 years ago

Hi, I tested with the pre-trained resnet_v1_50.ckpt model; steps below.

  1. Follow the 6 steps under "How to" in https://github.com/CharlesShang/FastMaskRCNN
     a. The folder structure is here: https://github.com/CharlesShang/FastMaskRCNN/tree/master/data; extract all the zip files.
     b. You need to create the folder FastMaskRCNN/output/mask_rcnn.

  2. At line 30 of FastMaskRCNN/unit_test.py, change the code:
     old: coco.read('./data/coco/records/coco_trainval2014_00000-of-00048.tfrecord')
     new: coco.read('./data/coco/records/coco_train2014_00000-of-00033.tfrecord')

  3. Enable the code below at lines 34 to 38 of FastMaskRCNN/libs/layers/crop.py:

         if batch_inds is False:
             num_boxes = tf.shape(boxes)[0]
             batch_inds = tf.zeros([num_boxes], dtype=tf.int32, name='batch_inds')
             batch_inds = boxes[:, 0] * 0
             batch_inds = tf.cast(batch_inds, tf.int32)

  4. Change directory to FastMaskRCNN.

  5. Run: python ./unit_test/data_test.py

  6. It dumps many PNG files into the FastMaskRCNN folder; they have only bounding boxes drawn on the images, no segmentation output.


  7. TBD: modify the test code to create segmented output.
  8. TBD: test with a locally trained model (it is not clear how to test these trained files). The intermediate trained files are:
     a. coco_resnet50_model.ckpt-30000.meta
     b. coco_resnet50_model.ckpt-30000.index
     c. coco_resnet50_model.ckpt-30000.data-00000-of-00001
     d. checkpoint
Sharathnasa commented 7 years ago

@gopi77 Thank you for the guidance. We were able to follow your steps and generate the bounding boxes and masks for the images, but not the label for each bounding box.

Could you please let me know how to generate the annotations for each bounding box (e.g., if a person is inside a bounding box, it should display "person" on top of the box), as described in the paper?

Regards, Sharath

gopi77 commented 7 years ago

Hi Sharath

I am also looking for help with steps 7 & 8 mentioned in my previous comment (I think you asked about the same thing).

anandbhattad commented 7 years ago

Did anyone successfully get segmented results?

HuangBo-Terraloupe commented 7 years ago

I was also able to generate the bounding boxes, but no segmentation or classification results.

duylp commented 7 years ago

@gopi77 I was able to generate the boxes thanks to your instructions. As for the segmentation, I found that the testing code does not post-process and save the masks, so I wrote a few lines of code; you can put them in data_test.py, after "Line 71: im.save(str(img_id_np) + '.png')":

    mask = np.sum(gt_masks_np, axis=0, dtype='uint8')
    white_pos = np.where(mask > 0)
    mask[white_pos] = 255
    mask_img = Image.fromarray(mask)
    mask_img.save('mask_' + str(img_id_np) + '.png')

For simplicity, it only creates a binary mask for each image and saves it as a separate image. If you want everything drawn in one output image, you will have to do a bit more work.
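If you do want a quick single-image overlay, here is a minimal sketch. It assumes `im` and `img_id_np` from data_test.py and the `mask` array built above; the red tint and the output file name are my own choices:

    import numpy as np
    from PIL import Image

    # Blend the binary mask into the original image as a red tint and
    # save one combined PNG per input image.
    overlay = np.array(im).copy()              # im: the PIL image saved on line 71
    red = np.zeros_like(overlay)
    red[..., 0] = 255                          # a pure-red layer, same shape as the image
    blend = np.where(mask[..., None] > 0,      # broadcast the (H, W) mask over RGB
                     (0.5 * overlay + 0.5 * red).astype('uint8'),
                     overlay)
    Image.fromarray(blend).save('overlay_' + str(img_id_np) + '.png')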

anandbhattad commented 7 years ago

@ZweeLe How many iterations did you train your network for? Also, did you make any changes to hyperparameters?

chenzhuo1005 commented 7 years ago

@ZweeLe From my understanding, data_test.py simply loads the tfrecords generated from the COCO training set (with bbox and segmentation information encoded) and displays that information. It does not actually use the model at all (neither the pre-trained nor the manually trained one). So I guess we should find a way to load the model into memory and feed it the test images (not sure if there is an API to load an image directly or whether we have to convert it to a tfrecord first), so that we can actually test the model's performance.
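For what it's worth, a minimal sketch of that direction, assuming the checkpoint files from training exist under ./output/mask_rcnn/: restore the saved graph from the .meta file and list its operations to locate usable input/output tensors. Note the training graph reads from the tfrecord queue, so feeding an arbitrary image may still require rebuilding the graph with a placeholder:

    import tensorflow as tf

    ckpt = './output/mask_rcnn/coco_resnet50_model.ckpt-80000'  # adjust to your step count
    saver = tf.train.import_meta_graph(ckpt + '.meta')          # rebuilds the saved graph

    with tf.Session() as sess:
        saver.restore(sess, ckpt)  # loads the trained weights
        # Print operation names to find the tensors to fetch (boxes, masks, ...).
        for op in tf.get_default_graph().get_operations():
            print(op.name)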

duylp commented 6 years ago

@chen1005 Yes, I agree; that was my mistake. I have found another way to test the trained model: simply run train.py with your trained model and uncomment lines 284 to 291. However, there are only bounding boxes in those results.

chenzhuo1005 commented 6 years ago

@ZweeLe Actually, I just added a method similar to draw_bbox, called draw_mask, in pil_utils, and modified train.py a bit to call the new method. Now we can draw masks like in the attachments. I commented out the lines that print classifications, since Windows was throwing some exceptions there; image 3 is captured in my Ubuntu environment, so if you are using Linux, just uncomment the corresponding lines. Predicted image: [test_est_2]

GT (ground truth) image: [test_gt_2]

Captured from Linux with classes: [image 3]

To achieve this, just replace pil_utils.py with the attached pil_utils.zip (a rough sketch of such a helper also follows after the train.py snippet below).

And modify train.py:

    total_loss = outputs['total_loss']
    losses = outputs['losses']
    batch_info = outputs['batch_info']
    regular_loss = tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))

    input_image = end_points['input']
    final_box = outputs['final_boxes']['box']
    final_cls = outputs['final_boxes']['cls']
    final_prob = outputs['final_boxes']['prob']
    final_gt_cls = outputs['final_boxes']['gt_cls']
    gt = outputs['gt']

    # replace the draw_bbox calls
    draw_mask(step,
              np.uint8((np.array(input_imagenp[0]) / 2.0 + 0.5) * 255.0),
              name='est',
              bbox=final_boxnp,
              mask=final_masknp,
              label=final_clsnp,
              prob=final_probnp,
              gt_label=np.argmax(np.asarray(final_gt_clsnp), axis=1),
              )

    draw_mask(step,
              np.uint8((np.array(input_imagenp[0]) / 2.0 + 0.5) * 255.0),
              name='gt',
              bbox=gtnp[:, 0:4],
              mask=gt_masksnp,
              label=np.asarray(gtnp[:, 4], dtype=np.uint8),
              )

Reference: Borrows code from @souryuu
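Since the pil_utils.zip attachment above may not survive, here is a minimal sketch of what a draw_mask helper could look like, following the call signature used above. This is my own reconstruction, not the attached code, and it assumes the masks passed in are already full-image binary maps (true for gt_masksnp; the predicted 28×28 per-ROI masks would first need to be resized and pasted into their boxes):

    import numpy as np
    from PIL import Image, ImageDraw

    def draw_mask(step, image, name='', bbox=None, mask=None,
                  label=None, prob=None, gt_label=None):
        im = Image.fromarray(image)
        draw = ImageDraw.Draw(im)
        for i, box in enumerate(bbox):
            x1, y1, x2, y2 = box[:4]
            draw.rectangle([x1, y1, x2, y2], outline='red')
            if label is not None:
                draw.text((x1, y1), str(label[i]), fill='red')  # class id as text
        if mask is not None:
            # Collapse per-instance masks (N, H, W) into one binary map,
            # then tint the masked pixels green.
            m = np.sum(mask, axis=0) > 0 if mask.ndim == 3 else mask > 0
            arr = np.array(im)
            arr[m] = (0.5 * arr[m] + 0.5 * np.array([0, 255, 0])).astype(np.uint8)
            im = Image.fromarray(arr)
        im.save('%s_%d.png' % (name, step))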

anandbhattad commented 6 years ago

@chen1005 To achieve this, do I need to retrain my network with the above changes to train.py, or would my previously trained weights be adequate to get these results? Also, can you answer my question here: https://github.com/CharlesShang/FastMaskRCNN/issues/26#issuecomment-317009968

Thank you!

chenzhuo1005 commented 6 years ago

@bhattad2 No, you don't have to retrain the network. Just load your previously trained network and do the online prediction, and you will see the result. But be careful: you need to verify that both the resnet_v1_50 variables and the pyramid network variables are saved and restored. (Since the pretrained model contains only resnet_v1_50 variables, the pyramid network variables are initialized randomly!) To verify, you can look at the restore() method and print all vars when the model is loaded from your trained checkpoint instead of the pretrained one.
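A quick way to do that check (standard TF1 API, nothing repo-specific; I am assuming the heads live under a variable scope named something like pyramid/, which is what the error messages in this thread suggest):

    import tensorflow as tf

    reader = tf.train.NewCheckpointReader('./output/mask_rcnn/coco_resnet50_model.ckpt-80000')
    shapes = reader.get_variable_to_shape_map()
    for name in sorted(shapes):
        print(name, shapes[name])

    # If nothing here starts with 'pyramid', the box/mask heads were never
    # saved and will come back randomly initialized on restore.
    print(any(name.startswith('pyramid') for name in shapes))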

I guess you are asking me to answer the question: How many iterations did you train your network for? Also, did you make any changes to hyperparameters?

In fact, I don't have an answer for the first part. Training on Windows with only a CPU is too slow, and training on my Linux machine is better, but I hit OOM errors since I only have a 4 GB GPU. (Seems like a good time to buy a new computer, lol.) So the trick I used was to train on only the above 2 images, and the above results are based on only 2,500 iterations. So I don't know how many iterations are needed to fully train on the COCO dataset and get fair results on the evaluation set.

For the second part, hyperparameters: I just changed line 98 of coco.py from tfrecords_filename, num_epochs=100) to tfrecords_filename, num_epochs=100000). Since I only have 2 images in the tfrecord, I needed to increase the number of times the enqueue operation repeats them in order to train for 2,500 iterations. (Otherwise I got an exception at iteration 200, since 2 × 100 = 200.)
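For context, num_epochs is presumably the limit on the tfrecord filename queue; I have not verified the exact call on that line of coco.py, but the standard TF1 pattern looks like this:

    import tensorflow as tf

    tfrecords_filename = './data/coco/records/coco_train2014_00000-of-00033.tfrecord'
    # With only 2 images in the record, num_epochs bounds the total dequeues:
    # 2 images x 100 epochs = 200 iterations before an OutOfRangeError.
    filename_queue = tf.train.string_input_producer(
        [tfrecords_filename], num_epochs=100000)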

Hopefully it helps.

anandbhattad commented 6 years ago

@chen1005 Thanks for your reply. I have pretrained weights up to 200k iterations, which I trained with randomly initialized pyramid networks. People in other issues of this repo have also shared pretrained weights. How do I test the learned model, and what script did you run to generate those results? Thanks much! -Anand

chenzhuo1005 commented 6 years ago

@bhattad2 I don't have a clean, completely separate test method. In fact, I just modified train.py to remove some unnecessary steps (the solve() method and update_op, for example, and also the code that saves the model, since we don't want the model saved during the test stage).

You could modify that train.py file, generate tfrecords for the eval dataset, and point train.py at the eval tfrecords to test the performance.
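In practice, that could be a hypothetical one-line change wherever the input pipeline builds its record list; the exact variable and file pattern depend on how you generated the eval tfrecords:

    import glob

    # Point the reader at the eval split instead of the training split.
    tfrecords_filenames = glob.glob('./data/coco/records/coco_val2014_*.tfrecord')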

chenzhuo1005 commented 6 years ago

@bhattad2 I need to clean up my code so that I can share my solution. Also, can you share your 200k model with me? Please include the 4 files (checkpoint, index, meta, and the data-00000-of-00001 one). I tried to use the 190k and 500k models, but they are broken (they say there are no pyramid parameters stored inside the model, and restore fails).

anandbhattad commented 6 years ago

@chen1005 Oh, I used someone else's pretrained weights and trained from there. I am not sure whether they are broken as well. If they are, I will retrain from scratch and share new weights. Cheers and thanks!

duylp commented 6 years ago

@chen1005 Thanks! Your work is interesting. I will wait until someone releases a pretrained model; it seems impossible for me to train the network with a 6 GB GPU. Please let us know if you make any progress on this. Good luck!

lengly commented 6 years ago

@chen1005 I see the code in your comment; where did final_masknp and gt_masksnp come from?

chl916185 commented 6 years ago

Are you using the trained model, with the test results shown in the picture? @chen1005

w102060018w commented 6 years ago

So far I have followed @chen1005's suggestion and can finally output test results with boxes, classification, and instance segmentation. But somehow the results are not good, so I am still debugging to see where the problems might come from. The model I used is a locally trained one at 40,000 iterations (the training process produces 4 files: checkpoint, coco_resnet50_model.ckpt-40000.data-00000-of-00001, coco_resnet50_model.ckpt-40000.index, coco_resnet50_model.ckpt-40000.meta).

[screenshots: test results, 2017-08-29]

@lengly I think final_masknp and gt_masksnp simply come from the computed results of final_mask and gt_masks, respectively. You can add this line to get the final_mask variable (gt_masks already exists, so we don't have to worry about it):

    final_mask = outputs['mask']['mask']

Then modify the code around line 267 in train.py into something like the following to get final_masknp and gt_masksnp:

    s_, tot_loss, ..., final_masknp, ..., gt_masksnp, ..., tmp_4np = \
        sess.run([update_op, ...] + ... + [final_mask] + ... + [gt_masks] + ... + [tmp_4])

The best segmentation result I've gotten so far looks like this:

[screenshot: best segmentation result so far, 2017-08-29]

duylp commented 6 years ago

@w102060018w In the paper, they trained the model for 160k iterations. I think you need to train much longer (at least for now, the boxes and masks only partly cover the objects).

w102060018w commented 6 years ago

@duylp Thanks for your advice, but since I am running on GCP with a Tesla K80, it would still take a lot of time to run 160k iterations. I hope somebody can release their pretrained model. I also tried the 499,999-iteration pretrained model provided somewhere in the open issues, but it throws errors when loading the pretrained parameters.

y-dep commented 6 years ago

@w102060018w hi, can I have a look at your test code?

onlytailei commented 6 years ago

@w102060018w Would you mind explaining final_masknp and gt_masksnp more clearly? I added the final mask extracted from outputs:

    final_mask = outputs['mask']['mask']

And I revised the code like this

    s_, tot_loss, reg_lossnp, img_id_str, \
    rpn_box_loss, rpn_cls_loss, refined_box_loss, refined_cls_loss, mask_loss, \
    gt_boxesnp, \
    rpn_batch_pos, rpn_batch, refine_batch_pos, refine_batch, mask_batch_pos, mask_batch, \
    input_imagenp, final_masknp, final_boxnp, final_clsnp, final_probnp, final_gt_clsnp, gtnp, gt_masksnp, tmp_0np, tmp_1np, tmp_2np, tmp_3np, tmp_4np = \
        sess.run([update_op, total_loss, regular_loss, img_id] +
                 losses +
                 [gt_boxes] +
                 batch_info +
                 [input_image] + [final_mask] + [final_box] + [final_cls] + [final_prob] + [final_gt_cls] + [gt] + [gt_masks] + [tmp_0] + [tmp_1] + [tmp_2] + [tmp_3] + [tmp_4])

But the predicted mask is still missing. I trained the model for 80k iterations.

onlytailei commented 6 years ago

@w102060018w I just found that the correct bboxes with masks are shown on the image; the yellow bboxes are mismatched results.

kxhit commented 6 years ago

@MarkMoHR Hi! I followed your suggestion, but I got an error while running forward_test_single_image.py. Could you help me figure this out? Thanks a lot! The error is below:

    [] [] P4 P3 P2 P5
    anchor_scales = [8, 16, 32]
    anchor_scales = [4, 8, 16]
    anchor_scales = [2, 4, 8]
    anchor_scales = [1, 2, 4]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 455, in _apply_op_helper
        as_ref=input_arg.is_ref)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 991, in internal_convert_n_to_tensor
        ctx=ctx))
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
        value, dtype=dtype, shape=shape, verify_shape=verify_shape))
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 371, in make_tensor_proto
        raise ValueError("None values not supported.")
    ValueError: None values not supported.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "./forward_test/forward_test_single_image.py", line 239, in <module>
        forward_test_single_image()
      File "./forward_test/forward_test_single_image.py", line 157, in forward_test_single_image
        loss_weights=[0.2, 0.2, 1.0, 0.2, 1.0])
      File "./forward_test/../libs/nets/pyramid_network.py", line 580, in build
        is_training=is_training, gt_boxes=gt_boxes)
      File "./forward_test/../libs/nets/pyramid_network.py", line 253, in build_heads
        sample_rpn_outputs_with_gt(rois, rpn_probs[:, 1], gt_boxes, is_training=is_training)
      File "./forward_test/../libs/layers/wrapper.py", line 132, in sample_with_gt_wrapper
        [tf.float32, tf.float32, tf.int32, tf.float32, tf.float32, tf.int32])
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 212, in py_func
        input=inp, token=token, Tout=Tout, name=name)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 50, in _py_func
        "PyFunc", input=input, token=token, Tout=Tout, name=name)
      File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 485, in _apply_op_helper
        raise TypeError("%s that are invalid." % prefix)
    TypeError: Tensors in list passed to 'input' of 'PyFunc' Op have types [float32, float32, , bool] that are invalid.
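Not a confirmed fix, but the traceback is readable: at inference gt_boxes is None, yet build_heads still calls sample_rpn_outputs_with_gt, whose tf.py_func receives that None (the first exception) plus the raw Python bool is_training (the second one), and py_func inputs must all be tensors. A hypothetical guard in build_heads (libs/nets/pyramid_network.py) might look like the sketch below; sample_rpn_outputs as the non-gt sampler and the return arities are my assumptions, not verified against the repo:

    # Hypothetical sketch: skip the ground-truth sampling py_func entirely
    # when running pure inference with no gt_boxes available.
    if is_training and gt_boxes is not None:
        rois, scores, batch_inds, mask_rois, mask_scores, mask_batch_inds = \
            sample_rpn_outputs_with_gt(rois, rpn_probs[:, 1], gt_boxes,
                                       is_training=is_training)
    else:
        rois, scores, batch_inds = sample_rpn_outputs(rois, rpn_probs[:, 1])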

hakutyou commented 5 years ago

@MarkMoHR

Hi! Did you delete this repo, or move it somewhere else?