ethnhe / raster_triangle

A simple renderer with z-buffer for synthetic data generation.

Multiclass Training #2

Closed saadehmd closed 3 years ago

saadehmd commented 4 years ago

Can this network be trained for simultaneous multi-class segmentation/estimation? Are the fuse images for this purpose, or are they still just training for one of the many fused objects?

ethnhe commented 4 years ago

Yes, if all the target classes on each frame are labeled in the training set, as in the YCB-Video dataset. In the LineMOD dataset, however, the fuse images still train for only one of the many fused objects, because the provided training set only supplies the GT label of a single object, and the unlabeled objects may confuse the network.

saadehmd commented 4 years ago

Thanks. Does this multiclass training use a different training script or network config, since the output tensor dimensions might differ with multiclass labels? Also, what would the binary mask look like in the multi-class case? It can't just be black (0) for the background and white (255) for all other objects.

ethnhe commented 4 years ago

For the network config, you only need to change self.n_objects and self.n_classes in common.py to the number of classes in your own dataset. Masks in multi-class cases are not binary; each object has its own ID. You can look at datasets/ycb/ycb_dataset.py or the mask labels in the YCB-Video dataset for more details.
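For illustration, here is a minimal sketch of how such ID-valued masks might be consumed, assuming pixel value 0 is background and pixel value k marks the object with class ID k; the file name and class counts are hypothetical placeholders, and datasets/ycb/ycb_dataset.py remains the authoritative reference:

```python
import numpy as np
from PIL import Image

# Hypothetical values: set these to your own dataset, mirroring
# self.n_objects / self.n_classes in common.py.
n_objects = 5                # number of foreground object classes
n_classes = n_objects + 1    # +1 for the background class (ID 0)

# Multi-class mask: pixel value 0 = background, pixel value k = the
# object with class ID k (not a binary 0/255 mask).
mask = np.array(Image.open("0000_label.png"))  # hypothetical file name

# One-hot semantic labels, shape (H, W, n_classes), if a loss needs them.
one_hot = np.eye(n_classes, dtype=np.float32)[mask]

# Per-object binary masks can still be derived where needed.
obj_masks = {cls_id: (mask == cls_id) for cls_id in range(1, n_classes)}
```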

saadehmd commented 4 years ago

Thanks a lot for your help again. Just one last thing: I am trying to create my own dataset by following the LineMOD method. In gt.yml, what are cam_R_m2c and cam_t_m2c? There is also obj_bb, which I am guessing holds bounding box coordinates. Does your network actually use the obj_bb coordinates? From the paper, I understood that bounding box corners are not very good keypoints, so you detect your own keypoints on the object surface. If I make my own gt.yml, can I leave obj_bb out?

ethnhe commented 4 years ago

cam_R_m2c is the rotation matrix that transforms object coordinates to camera coordinates, and cam_t_m2c is the translation offset. We didn't use the obj_bb coordinates; you can leave them out.
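For concreteness, a small sketch of how these fields are typically applied, assuming gt.yml follows the SIXD/LineMOD convention in which cam_R_m2c is 9 row-major floats and cam_t_m2c is a 3-vector (in millimeters in the original LineMOD data; verify the units of your own files):

```python
import numpy as np
import yaml

# Load the ground-truth annotations (path is hypothetical).
with open("gt.yml") as f:
    gt = yaml.safe_load(f)

# Each frame ID maps to a list of per-object annotations; take the
# first annotation of frame 0 here.
ann = gt[0][0]
R = np.array(ann["cam_R_m2c"], dtype=np.float64).reshape(3, 3)  # row-major
t = np.array(ann["cam_t_m2c"], dtype=np.float64).reshape(3)     # offset

# Transform object-frame (mesh) points into the camera frame:
#   p_cam = R @ p_obj + t
pts_obj = np.random.rand(100, 3)   # placeholder model points, shape (N, 3)
pts_cam = pts_obj @ R.T + t        # corresponding points in the camera frame
```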

saadehmd commented 4 years ago

Great, thanks!! I'll leave this open for a while and close it after a few trials. :)

saadehmd commented 4 years ago

> We didn't use the obj_bb coordinates; you can leave them out.

Sorry, one more question about cam_R_m2c: are these camera-to-world coordinates?

ethnhe commented 4 years ago

No, I think m2c means mesh (object) to camera.

saadehmd commented 4 years ago

How did you decide the train/test split? If I sample uniformly on the upper hemisphere around the object, should the training samples be far apart from each other in order to reduce bias?

ethnhe commented 4 years ago

We split the train/test set of the real data following previous works; see our paper for the details. All synthetic data are used for the train set. For your own dataset, I think randomly sampling from the whole dataset or picking some videos out of all the videos are both good choices.
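As a minimal sketch of such a random split (the ratio and seed are hypothetical, and real frames are assumed to be indexed by integer IDs):

```python
import numpy as np

def split_train_test(frame_ids, test_ratio=0.2, seed=0):
    """Randomly split real-frame IDs into train/test sets.

    Synthetic frames would all be appended to the train set afterwards,
    as described above; this handles only the real frames.
    """
    rng = np.random.default_rng(seed)
    ids = np.array(list(frame_ids))
    rng.shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    return ids[n_test:].tolist(), ids[:n_test].tolist()

# Example: 1000 real frames, 80/20 train/test split.
train_ids, test_ids = split_train_test(range(1000))
```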