furushchev / jsk_semantics_201607


Some problems with raw data collected with rosbag #3

Closed by wkentaro 8 years ago

wkentaro commented 8 years ago

Problems

Very similar images

Many collected images are nearly identical, so after training the model can trivially segment test images that are near-duplicates of training images. This makes it difficult to evaluate the model precisely.

Possible solutions

  1. Check the world coordinates of the camera link in the rosbag and save data only when they have changed since the last saved frame (see the sketch below).
  2. Collect data online, only after the robot has moved.
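
A minimal sketch of option 1, assuming the bag contains a /tf transform whose child frame is the camera optical frame; the bag name, frame name, and threshold are assumptions, and in practice the world-to-camera chain may need to be composed with tf2 rather than read directly:

```python
#!/usr/bin/env python
# Rough sketch: keep a frame only when the camera link has moved since the
# last saved frame. Bag name, frame name, and threshold are assumptions.
import numpy as np
import rosbag

BAG_FILE = 'data.bag'                                  # hypothetical bag
CAMERA_FRAME = 'head_mount_kinect_rgb_optical_frame'   # assumed frame name
MIN_MOTION = 0.05                                      # [m] between saved frames

last_saved = None
saved_stamps = []

with rosbag.Bag(BAG_FILE) as bag:
    for _, msg, t in bag.read_messages(topics=['/tf']):
        for tf_msg in msg.transforms:
            if tf_msg.child_frame_id.lstrip('/') != CAMERA_FRAME:
                continue
            p = tf_msg.transform.translation
            pos = np.array([p.x, p.y, p.z])
            # save only when the camera link moved enough since the last save
            if last_saved is None or np.linalg.norm(pos - last_saved) > MIN_MOTION:
                last_saved = pos
                saved_stamps.append(t)

print('%d frames selected' % len(saved_stamps))
```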

Example

https://drive.google.com/drive/u/1/folders/0B9P1L--7Wd2vdDZYVlVzZF9RbDA

Wrongly labeled data because of PR2's motion

Possible solutions

  1. Check the world velocity of the camera link in the rosbag and save data only when the camera is (nearly) stationary (see the sketch below).
  2. Collect data online, only after the robot's motion has finished.
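
A matching sketch for the velocity variant: estimate the camera link speed from consecutive /tf samples and keep only the timestamps where it is below a threshold (same assumptions about the bag and frame names as in the sketch above):

```python
#!/usr/bin/env python
# Rough sketch: keep only timestamps where the camera link is (nearly)
# stationary, estimated from consecutive /tf samples.
import numpy as np
import rosbag

BAG_FILE = 'data.bag'                                  # hypothetical bag
CAMERA_FRAME = 'head_mount_kinect_rgb_optical_frame'   # assumed frame name
MAX_SPEED = 0.01   # [m/s] below this the camera counts as "not moving"

prev_pos, prev_time = None, None
still_stamps = []

with rosbag.Bag(BAG_FILE) as bag:
    for _, msg, t in bag.read_messages(topics=['/tf']):
        for tf_msg in msg.transforms:
            if tf_msg.child_frame_id.lstrip('/') != CAMERA_FRAME:
                continue
            p = tf_msg.transform.translation
            pos = np.array([p.x, p.y, p.z])
            if prev_pos is not None:
                dt = max((t - prev_time).to_sec(), 1e-6)
                if np.linalg.norm(pos - prev_pos) / dt < MAX_SPEED:
                    still_stamps.append(t)   # safe to label around this time
            prev_pos, prev_time = pos, t
```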

Wrongly labeled data because of occlusion

@furushchev Is it possible to remove the labels in occluded regions?

wkentaro commented 8 years ago

@furushchev Let's discuss whether these problems should be solved at your stage (chapter 3) or my stage (chapter 4).

furushchev commented 8 years ago

@wkentaro Thanks for analyzing image data.

Here is my idea.

wkentaro commented 8 years ago

(As a workaround) avoid using similar classes

What do you mean by class? The reason there are very similar images is that publish-mask-image.l publishes mask images without considering the movement of the camera link; it simply publishes as many as possible.

How about filtering by whether the velocity of the robot is under a threshold?

How can I do that? Which topic should I use, and the velocity of what? I suppose you mean the velocity of the camera_link.

Actually, does it matter? I think that in a local detection setting it does not matter even if another object appears in the masked images, because images are also expected to be occluded in the running (not learning) phase.

I think it does matter, because with the current approach the mask is generated even if the target object is completely occluded, isn't it? For example, if PR2 looks toward the refrigerator from my desk during data collection, it will label the Kiva Shelf as the refrigerator, won't it?

Publishing a mask image that excludes occluded regions needs more computation, because all objects between the robot and the target object need to be rendered, I think.

Maybe you can use the point cloud published by the Kinect sensor. There are two approaches:

  1. Publish the euslisp model and generate a mask of the region where the point cloud and the euslisp model overlap (maybe with the C++ robot self filter node).
  2. Subscribe to the point cloud and generate a mask of the region where the point cloud and the euslisp model overlap (in a euslisp node; a rough sketch of the idea follows below).
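
A rough sketch of the second approach, written in Python instead of euslisp just to show the idea: subscribe to the Kinect's organized cloud and mark the pixels whose points fall inside the model's region. The topic name and the axis-aligned box standing in for the euslisp model are assumptions:

```python
#!/usr/bin/env python
# Sketch: mask image of the pixels whose Kinect points fall inside an
# (assumed) axis-aligned box standing in for the euslisp model, camera frame.
import numpy as np
import rospy
from cv_bridge import CvBridge
from sensor_msgs import point_cloud2
from sensor_msgs.msg import Image, PointCloud2

BOX_MIN = np.array([-0.3, -0.3, 0.5])  # assumed model extent [m], camera frame
BOX_MAX = np.array([0.3, 0.3, 1.5])

rospy.init_node('model_overlap_mask')
bridge = CvBridge()
pub = rospy.Publisher('~output/mask', Image, queue_size=1)

def callback(cloud):
    # organized cloud: one point per pixel, row-major order
    mask = np.zeros((cloud.height, cloud.width), dtype=np.uint8)
    points = point_cloud2.read_points(cloud, field_names=('x', 'y', 'z'),
                                      skip_nans=False)
    for i, (x, y, z) in enumerate(points):
        if (not np.isnan(z) and np.all(BOX_MIN <= (x, y, z))
                and np.all((x, y, z) <= BOX_MAX)):
            mask[i // cloud.width, i % cloud.width] = 255
    out = bridge.cv2_to_imgmsg(mask, encoding='mono8')
    out.header = cloud.header
    pub.publish(out)

# assumed Kinect topic name
rospy.Subscriber('/head_mount_kinect/depth_registered/points', PointCloud2, callback)
rospy.spin()
```
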
furushchev commented 8 years ago

What do you mean by class? The reason there are very similar images is that publish-mask-image.l publishes mask images without considering the movement of the camera link; it simply publishes as many as possible.

Oh, sorry, I misunderstood. Well, what do you mean by evaluate?

How can I do that? Which topic should I use, and the velocity of what? I suppose you mean the velocity of the camera_link.

We can use the tf from /world to /head_mount_kinect_rgb_optical_frame.
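
For reference, a minimal sketch of looking that transform up at runtime with a tf listener (frame names as written above; an untested sketch, not verified against the actual robot setup):

```python
#!/usr/bin/env python
# Sketch: poll the camera pose in the world frame with a tf listener.
import rospy
import tf

rospy.init_node('camera_pose_watcher')
listener = tf.TransformListener()
rate = rospy.Rate(10)
while not rospy.is_shutdown():
    try:
        trans, rot = listener.lookupTransform(
            '/world', '/head_mount_kinect_rgb_optical_frame', rospy.Time(0))
        rospy.loginfo('camera position in world: %s', trans)
    except (tf.LookupException, tf.ConnectivityException,
            tf.ExtrapolationException):
        pass
    rate.sleep()
```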

I think it does matter, because with the current approach the mask is generated even if the target object is completely occluded, isn't it? For example, if PR2 looks toward the refrigerator from my desk during data collection, it will label the Kiva Shelf as the refrigerator, won't it?

You are correct.

wkentaro commented 8 years ago

Oh, sorry, I misunderstood. Well, what do you mean by evaluate?

I mean the train/test accuracy of the deep neural network. If there are many very similar images:

  1. We cannot randomly split the dataset into train and test sets to evaluate the segmentation network model, because if similar images are mixed between the train and test sets, the evaluation result is unreliable (see the sketch after this list).
  2. The mean accuracy becomes inaccurate.
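
One way to make point 1 concrete is to split by contiguous chunks (or by camera pose) instead of by individual frames, so that near-duplicate neighbors never land on both sides. A small sketch, assuming the frames are stored in chronological order:

```python
#!/usr/bin/env python
# Sketch: split a chronologically ordered frame list into train/test by
# contiguous chunks, so near-identical neighboring frames stay together.
import random

def chunked_split(frames, chunk_size=50, test_ratio=0.2, seed=0):
    chunks = [frames[i:i + chunk_size]
              for i in range(0, len(frames), chunk_size)]
    random.Random(seed).shuffle(chunks)
    n_test = int(len(chunks) * test_ratio)
    test = [f for c in chunks[:n_test] for f in c]
    train = [f for c in chunks[n_test:] for f in c]
    return train, test

# usage (hypothetical file layout):
# train, test = chunked_split(sorted(glob.glob('dataset/*.png')))
```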

We can use the tf from /world to /head_mount_kinect_rgb_optical_frame.

I see. That is possible.

You are correct.

With euslisp it is easy to compute how many points from the point cloud lie on the model surface, which I think we can use to judge whether the model is occluded or not (a rough sketch follows below). Another possible approach is to use object consistency, e.g. loosely compare the image and the model colors, or compare the regions that are expected to overlap between images taken from other directions.
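
A rough sketch of that visibility check, using an axis-aligned box in the camera frame as a stand-in for the model surface (the box, topic name, and threshold are assumptions; a real check would measure distance to the actual model surface):

```python
#!/usr/bin/env python
# Sketch: decide "occluded or not" by counting Kinect points that fall inside
# the model's (assumed) bounding box in the camera frame.
import numpy as np
import rospy
from sensor_msgs import point_cloud2
from sensor_msgs.msg import PointCloud2

BOX_MIN = np.array([-0.3, -0.3, 0.5])   # assumed model bounds [m]
BOX_MAX = np.array([0.3, 0.3, 1.5])
MIN_VISIBLE_POINTS = 500                # assumed visibility threshold

def callback(cloud):
    pts = np.array(list(point_cloud2.read_points(
        cloud, field_names=('x', 'y', 'z'), skip_nans=True)))
    if len(pts) == 0:
        inside = 0
    else:
        inside = int(np.all((pts >= BOX_MIN) & (pts <= BOX_MAX), axis=1).sum())
    occluded = inside < MIN_VISIBLE_POINTS
    rospy.loginfo('points on model: %d, occluded: %s', inside, occluded)

rospy.init_node('occlusion_check')
rospy.Subscriber('/head_mount_kinect/depth_registered/points', PointCloud2, callback)
rospy.spin()
```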

Why not RobotToMaskImage? Oops, RobotToMaskImage only converts the robot model to a mask image without considering the point cloud. But maybe you can use the pipeline below:

euslisp model -> robot_description -> robot_self_filter -> filtered point cloud -> OrganizedPointCloudToPointIndices -> PointIndicesToMaskImage.

furushchev commented 8 years ago

I mean the train/test accuracy of the deep neural network.

I see. How about filtering using time and the absolute position of the camera link in the world frame?

wkentaro commented 8 years ago

I see. How about filtering using time and the absolute position of the camera link in the world frame?

Yeah, I will do that.

furushchev commented 8 years ago

euslisp model -> robot_description -> robot_self_filter -> filtered point cloud -> OrganizedPointCloudToPointIndices -> PointIndicesToMaskImage.

I'm now working on this.

Maybe the actual route will be:

eus model -> bounding box -> attention clipper -> filtered cloud -> indices -> mask
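
A small sketch of the first arrow in that route: publishing the (assumed) bounding box of the euslisp model as a jsk_recognition_msgs/BoundingBox so an attention-clipper node can cut the cloud down to that region. The topic name, frame, pose, and dimensions here are all placeholders, and the remaining nodes would be wired up in a launch file:

```python
#!/usr/bin/env python
# Sketch: publish an (assumed) bounding box for the target object so that an
# attention-clipper style node can clip the point cloud to that region.
import rospy
from jsk_recognition_msgs.msg import BoundingBox

rospy.init_node('model_bbox_publisher')
pub = rospy.Publisher('/attention_clipper/input/box', BoundingBox, queue_size=1)

rate = rospy.Rate(1)
while not rospy.is_shutdown():
    box = BoundingBox()
    box.header.stamp = rospy.Time.now()
    box.header.frame_id = 'world'          # assumed fixed frame
    # assumed pose and size of the target object (e.g. the refrigerator)
    box.pose.position.x = 5.0
    box.pose.position.y = 1.0
    box.pose.position.z = 0.8
    box.pose.orientation.w = 1.0
    box.dimensions.x = 0.8
    box.dimensions.y = 0.7
    box.dimensions.z = 1.6
    pub.publish(box)
    rate.sleep()
```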

compiling latest self_filter and jsk_pcl_ros on hydro is really tough...

wkentaro commented 8 years ago

compiling latest self_filter

I think @mmurooka uses it and he can help you.

jsk_pcl_ros on hydro is really tough...

I'm wondering why, because the Travis tests have passed. https://travis-ci.org/jsk-ros-pkg/jsk_recognition/builds/145303717

wkentaro commented 8 years ago

Closed via #7