NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Custom object detection fails even if using a training image as input #194

Open andrei91ro opened 2 years ago

andrei91ro commented 2 years ago

Hello,

First of all congratulations for your hard work on DOPE and NDDS!

I used NDDS to generate a dataset of 20K images by following the instructions from the wiki. The dataset looks good even when loaded with nvdu_viz.

Since I don't yet have the real object assembled (a Crazyflie drone with a QR code beneath), I pointed my webcam towards my monitor, where one of the training images was displayed. Unfortunately, nothing was detected.


Afterwards, I tried using the ROS image_publisher node to stream a single image as a webcam stream and thus avoid the occasional noise that was caused by the monitor refresh rate. I also published the camera_info topic along with the image_raw.

However, even when provided with a clean static image, the model does not detect the custom object.

At this point, I don't know the cause of my problems so any expert advice on this matter is highly appreciated :)

My possible theories are that:

  1. The object is symmetrical, and that is why it is not detected. I did read issue #37, but my object is not that round and the camera orientation is limited.
  2. There is some issue with camera intrinsics, which differ between the NDDS training camera, my Logitech C922 webcam, and the synthetic image published by the ROS image_publisher.
  3. The dataset is insufficiently varied.
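To make theory 2 concrete: the same 3D point projects to noticeably different pixels under different intrinsics, which can push the detected cuboid corners away from what the network learned. A minimal pinhole-projection sketch (the fx/fy/cx/cy values below are made-up placeholders, not the actual NDDS or C922 calibrations):

```python
# Pinhole projection: pixel = (fx * X/Z + cx, fy * Y/Z + cy).
# All intrinsics below are illustrative placeholders, NOT real calibrations.

def project(point_3d, fx, fy, cx, cy):
    """Project a 3D camera-frame point (meters) to pixel coordinates."""
    x, y, z = point_3d
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical NDDS virtual camera vs. a webcam with a different focal length.
ndds = dict(fx=768.0, fy=768.0, cx=320.0, cy=240.0)
webcam = dict(fx=620.0, fy=620.0, cx=325.0, cy=238.0)

corner = (0.05, 0.02, 0.40)  # a cuboid corner 40 cm in front of the camera

u1, v1 = project(corner, **ndds)    # pixel under training intrinsics
u2, v2 = project(corner, **webcam)  # pixel under webcam intrinsics
print((u1 - u2, v1 - v2))           # offset the PnP stage would have to absorb
```

A shift of even a dozen pixels per corner is enough to degrade the PnP pose solve, which is why feeding the real camera's calibration into the inference config matters.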

If anyone wants to try out the dataset I can provide it. Since this work is part of a research project, I intend to write a white paper on the exact training steps and hiccups along the way and publish it on Github along with all of the required training data.

sejmoonwei commented 2 years ago

I have the same problems; maybe we can discuss the following things.

  1. Do you use your camera's intrinsics instead of the intrinsics in camera_setting.json from the NDDS dataset? I got them from my D435i camera and used them for inference. It looks better, but still not good.

  2. What do you do to generate different poses of your object? I use random rotation/movement, and I also randomize the background and lighting in the UE4 scene.

  3. I found that training on a symmetrical object causes the loss to stall at a high value, but DOPE will still give a feasible result at inference time, which may look right.

  4. As the earlier issue said, nvisii works better than NDDS. Now I'm working with nvisii to see if it helps.
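On point 1: the fastest way to spot a mismatch is to pull the intrinsics out of the NDDS camera settings JSON and compare them with your real camera's calibration. A small sketch, assuming the field layout of the `_camera_settings.json` file that NDDS writes next to the dataset (verify the key names against your own file; the numbers here are illustrative):

```python
import json

# Parse intrinsics from an NDDS-style camera settings file and build a 3x3
# K matrix. Field names assume the _camera_settings.json layout; check them
# against your own dataset before relying on this.
sample = """{
  "camera_settings": [{
    "name": "Viewpoint",
    "intrinsic_settings": {
      "fx": 768.16, "fy": 768.16, "cx": 480.0, "cy": 270.0, "s": 0.0
    },
    "captured_image_size": {"width": 960, "height": 540}
  }]
}"""

def intrinsics_to_K(settings_json):
    cam = json.loads(settings_json)["camera_settings"][0]
    i = cam["intrinsic_settings"]
    return [[i["fx"], i.get("s", 0.0), i["cx"]],
            [0.0,     i["fy"],        i["cy"]],
            [0.0,     0.0,            1.0]]

K = intrinsics_to_K(sample)
print(K)  # compare row by row with the K from your webcam's calibration
```

If the two K matrices differ a lot (focal length, principal point, or even just the image resolution they assume), that alone can explain weak detections on real footage.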

andrei91ro commented 2 years ago

Hello @sejmoonwei , sorry for the late response.

1) Unfortunately, I did not use any camera intrinsics file, either for NDDS or for the live prediction. Of course, I will have to try that too.

2) Exactly the same.

3) I will try that out, possibly limiting the rotation angle of the camera around the object.

4) I will try it out, as I am indeed still wrapping my head around Unreal Engine. As an alternative, I am currently trying out BlenderProc for generating synthetic data using Python and Blender. The only downside is that you have to handle the format conversions yourself, as it does not output the native format used by DOPE.

sejmoonwei commented 2 years ago

Well, do you get correct results on your training images, or does detection fail only on the real scene? I recommend not using ROS yet; a plain inference script will simplify this. I can share one if you need it.

andrei91ro commented 2 years ago

No, not even on training images. I would be grateful if you could provide such a script, as it would allow me to iterate faster through the training/testing process. I will also try using a single AprilTag in the square underneath the drone instead of the current four. From other tests using YOLOv4, I noticed that the detector is quite undecided when presented with fiducial tags, which differ far less from each other than, say, human/dog classes do.

sejmoonwei commented 2 years ago

This is the script I currently use for inference. It shows the belief maps and the cuboid of the object, taking a single image as input. Please resize your image to 640x480 before input, or it may raise a runtime error. https://github.com/sejmoonwei/inference_on_img/blob/main/belief_maps.py
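For anyone reading along: after the network runs, the core step such a script performs is finding peaks in each belief map, which become the 2D keypoint candidates for the cuboid. A dependency-free sketch of that step (the real DOPE code additionally blurs the maps and refines peaks to sub-pixel accuracy; the toy map and threshold here are illustrative):

```python
# Belief-map peak extraction: turn a per-keypoint heatmap into (row, col,
# score) candidates by taking strict local maxima above a threshold.
# The toy values below are illustrative, not real network output.

def find_peaks(belief, thresh=0.1):
    """Return (row, col, score) for each strict local maximum above thresh."""
    h, w = len(belief), len(belief[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = belief[r][c]
            if v <= thresh:
                continue
            neighbors = [belief[rr][cc]
                         for rr in range(max(0, r - 1), min(h, r + 2))
                         for cc in range(max(0, c - 1), min(w, c + 2))
                         if (rr, cc) != (r, c)]
            if all(v > n for n in neighbors):
                peaks.append((r, c, v))
    return peaks

# Tiny toy belief map with one clear peak at (1, 2).
toy = [[0.0, 0.1, 0.2, 0.1],
       [0.0, 0.2, 0.9, 0.2],
       [0.0, 0.1, 0.2, 0.1]]
print(find_peaks(toy))  # → [(1, 2, 0.9)]
```

Visualizing the raw belief maps this way is also a good debugging signal: if the maps are flat noise even on a training image, the problem is the trained weights, not the ROS plumbing.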

andrei91ro commented 2 years ago

Thank you for the link! I will try it sometime during the winter holidays, as for now I am buried in bureaucratic work. To return the favor, I can recommend my personal backup solution, which is an integration of a ZED camera and a custom-trained YOLOv4 (yolo_integration) bounding-box detector (Custom_3D_detection_and_tracking).

The camera SDK is capable of estimating the 3D position of the detected object, and a Python API is available for further processing of real-time data.

Of course, DOPE is the better alternative, as it does not depend on RGB-D images, but the ZED camera is also an option. I think others have achieved the same thing using Intel RGB-D cameras.

sejmoonwei commented 2 years ago

I'll read them, thanks. Let's keep in touch on this (custom object detection, DOPE).