Closed · yongxf closed this issue 4 years ago
Thank you for introducing your excellent work and dataset. I noticed that you pre-trained your normal estimation module with the additional Matterport3D and ScanNet datasets to improve performance on non-transparent objects. Did you also train the segmentation masks on mixed non-transparent + transparent objects? I am asking because I found some false-positive masks on non-transparent objects when I tried your transparent object segmentation module. Looking forward to your reply! Thanks!
Thank you! I hope you find it useful. We mainly used Matterport3D and ScanNet to improve performance on surfaces like walls and tabletops. Unfortunately, our synthetic dataset contains only transparent objects, so we could not train on mixed non-transparent objects + transparent objects. The model is trained on synthetic data only. Our next dataset will fix this issue.
Yes, there is a high percentage of false positives in the masks. However, we focused only on reconstructing transparent surfaces, so it wasn't an issue for us. The model also reconstructs low-frequency non-transparent surfaces (walls, etc.) fairly well.
If this is a problem for you, I suggest further training the segmentation model on other datasets involving transparent objects, e.g., Segmenting Transparent Objects in the Wild, TOM_Net, the ProLIT Light-field Dataset, and Single-Shot Analysis of Refractive Shape Using CNNs. You could also try fine-tuning on the real val/test images if necessary; they include pairs of images with transparent objects and opaque, spray-painted replacements.
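If it helps, here is a minimal PyTorch fine-tuning sketch. `TransparentSegDataset` and `load_pretrained_seg_model` are placeholder names (they are not part of this repo), and the actual model class, output shape, and loss in our training code may differ, so treat this only as an outline.

```python
# Minimal fine-tuning sketch (PyTorch). TransparentSegDataset should yield
# (rgb_tensor, binary_mask) pairs; the names below are placeholders.
import torch
from torch.utils.data import DataLoader

def finetune(model, dataset, epochs=10, lr=1e-4, device="cuda"):
    model = model.to(device).train()
    loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # small LR to avoid forgetting
    criterion = torch.nn.CrossEntropyLoss()                  # 2 classes: transparent / not

    for epoch in range(epochs):
        for rgb, mask in loader:
            rgb, mask = rgb.to(device), mask.to(device).long()
            logits = model(rgb)              # expected shape: (N, 2, H, W)
            loss = criterion(logits, mask)   # mask shape: (N, H, W)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")
    return model

# Usage (placeholder names):
# model = load_pretrained_seg_model("checkpoints/seg.pth")
# model = finetune(model, TransparentSegDataset("data/real-val"))
```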
Thanks a lot for your recommendations. The datasets you recommended look very promising. I will also fine-tune the model on your validation and test datasets. I assume the fine-tuning process will keep using the 256 x 256 image size setting. An interesting thing I found is that when I use the 1280 x 720 resolution RGB image, the output of the object segmentation network can be very different from the 256 x 256 setting (e.g., the whole cup is segmented when resizing the input image to 256 x 256, while only the thick bottom is segmented and the rest of the cup body is marked as non-transparent when using 1280 x 720). Can you shed some light on this behavior? The object is about 0.7 m away from an Intel RealSense D415, and the cup is a normal transparent glass cup. Thanks again for your suggestions!
A few images would help.
> when I use the 1280 x 720 resolution RGB image
Our network is trained on 256x256 resolution input images that are resized from 512x288 (16:9 aspect ratio). The 1280x720 input is 16:9, so if you resize it to 256x256, it should work fine. Running inference on any other image size will not give good results, since the model was not trained on other resolutions. Models like this learn features that depend on the size of the input image.
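Not an official snippet, but roughly how I would feed a 1280x720 RealSense frame to the model: resize to 256x256 before inference and resize the predicted mask back afterwards. The normalization here (plain division by 255) is an assumption; check the actual transforms used in the repo's inference code.

```python
# Sketch of preparing a 1280x720 RGB frame for a model expecting 256x256 input.
import cv2
import numpy as np

def preprocess(bgr_1280x720):
    rgb = cv2.cvtColor(bgr_1280x720, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (256, 256), interpolation=cv2.INTER_AREA)  # 16:9 -> 1:1 squash
    img = rgb.astype(np.float32) / 255.0          # normalization is an assumption
    return np.transpose(img, (2, 0, 1))[None]     # (1, 3, 256, 256) for a PyTorch model

def postprocess(mask_256, out_size=(1280, 720)):
    # Resize the predicted 256x256 mask back to the original resolution for viewing.
    return cv2.resize(mask_256.astype(np.uint8), out_size, interpolation=cv2.INTER_NEAREST)
```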
> The object is about 0.7 m away from an Intel RealSense D415, and the cup is a normal transparent glass cup.
That sounds about right. What is the background like? The models don't perform well against backgrounds with noisy patterns, or when the transparent objects partially occlude other objects. A few images would help me identify what's going on.
> I will also fine-tune the model on your validation and test datasets.
Do remember that the metrics against those datasets will improve after you fine-tune on them, so they will no longer be a fair measure of generalization. Another idea is to try a form of data augmentation: overlay cut-out images of other (opaque) objects on top of the synthetic images and update the masks accordingly (see the sketch below). I don't know how well it'll work, but it should improve performance somewhat.
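A rough sketch of that cut-and-paste idea, assuming you have RGBA cut-outs of opaque objects; the function below is hypothetical, not something from this repo:

```python
# Paste an opaque object cut-out onto a synthetic image and mark those pixels
# as non-transparent in the segmentation mask. Assumes the cut-out fits fully
# inside the image at position (x, y).
import numpy as np

def overlay_object(rgb, mask, cutout_rgba, x, y):
    """rgb: HxWx3 uint8 synthetic image; mask: HxW mask (1 = transparent);
    cutout_rgba: hxwx4 uint8 cut-out of an opaque object; (x, y): top-left paste position."""
    h, w = cutout_rgba.shape[:2]
    alpha = cutout_rgba[..., 3:4].astype(np.float32) / 255.0
    roi = rgb[y:y+h, x:x+w].astype(np.float32)
    blended = alpha * cutout_rgba[..., :3].astype(np.float32) + (1.0 - alpha) * roi
    rgb[y:y+h, x:x+w] = blended.astype(np.uint8)
    # The pasted pixels are opaque, so clear them in the transparency mask.
    mask[y:y+h, x:x+w][alpha[..., 0] > 0.5] = 0
    return rgb, mask
```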
PS: I deleted my earlier comment because it had wrong info, in case you got an email with it.
@yongxf It's been a while since the issue was raised. Do you have any more questions?
Sorry, forgot to close it. Thanks a lot for your explanation!