NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Learning custom data always 0% #321

Open aled96 opened 1 year ago

aled96 commented 1 year ago

I have defined a custom object, similar to the Ketchup one, and generated 1000 images with the following command:

python single_video_pybullet.py --nb_frames 1000 --scale 0.015 --path_single_obj ~/Deep_Object_Pose/scripts/nvisii_data_gen/models/iros_block/google_16k/textured_simple.obj --nb_distractors 0 --nb_object 5

This produced the 1000 images, each with n objects, like this one: 00962

Then I tried to train on this set of images with:

python -m torch.distributed.launch --nproc_per_node=1 train.py --network dope --epochs 25 --batchsize 10 --outf tmp/ --data ../nvisii_data_gen/output/output_example/

I tried different numbers of epochs and batch sizes, and regenerated the image set several times; however, I always get 0% for each epoch:

Screenshot from 2023-09-29 18-34-25

I am fairly new to deep learning, so I do not know the details in depth. What am I doing wrong? The csv files inside the output folder contain no data, only the header. In addition, when I add the --save flag, I get no results.

Thank you !

TontonTremblay commented 1 year ago

Let it train to epoch 100, and also check the output in TensorBoard.

tensorboard --logdir /path/to/experiment/

Then open Chrome/Firefox at the localhost address and check the image tab. Check some other issues here to see what sort of output you should get.

aled96 commented 1 year ago

I tried to do it, however, I still have 0% for each epoch.

I also tried to use a reduced dataset of 5 images.

From tensorboard I get the following info:

The second epoch is the following: image

After more than 50 epochs I have:

image

image

TontonTremblay commented 1 year ago

Lower the learning rate a tad. The 0% is about the data it loads, not the performance. Sorry, I should update this. Can you try on a single image? Normally I test this first.
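To illustrate why the percentage can stay at 0%: many PyTorch-style training loops print a progress figure computed from the batch index, independent of the loss or accuracy. The sketch below is an assumption about how such a log line is formed, not a copy of DOPE's `train.py`; `progress_percent` and `format_log_line` are hypothetical helper names.

```python
def progress_percent(batch_idx: int, num_batches: int) -> float:
    """Share of the epoch's batches already processed, in percent."""
    if num_batches <= 0:
        raise ValueError("num_batches must be positive")
    return 100.0 * batch_idx / num_batches


def format_log_line(epoch, batch_idx, batch_size, dataset_size, num_batches, loss):
    # Hypothetical log format; the number in parentheses is data progress only,
    # it says nothing about how well the network is performing.
    seen = batch_idx * batch_size
    pct = progress_percent(batch_idx, num_batches)
    return f"Train Epoch: {epoch} [{seen}/{dataset_size} ({pct:.0f}%)]\tLoss: {loss:.6f}"


# If a line is printed only for the first batch (batch_idx == 0), it always shows 0%.
print(format_log_line(epoch=1, batch_idx=0, batch_size=10,
                      dataset_size=1000, num_batches=100, loss=0.05))
```

Under this assumption, a constant 0% means only that the line is logged at the start of each epoch; the loss value and the TensorBoard images are the signals to watch.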

aled96 commented 1 year ago

I did a test with lr=0.00001, one image only, 100 epochs and a batch size of 2.

Results in the end: image image
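A single-image test like the one described above could be set up by copying one frame and its annotation into a separate folder. This is a minimal sketch, assuming each generated frame and its annotation share a file stem (for example `00002.png` and `00002.json`); `make_single_frame_dataset` is a hypothetical helper, not part of the DOPE scripts.

```python
import shutil
from pathlib import Path


def make_single_frame_dataset(src_dir, dst_dir, stem):
    """Copy every file named <stem>.* from src_dir into dst_dir.

    Assumption: the image and its annotation share the same stem.
    Returns the sorted list of copied file names.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    matches = sorted(p for p in src.glob(f"{stem}.*") if p.is_file())
    if not matches:
        raise FileNotFoundError(f"no files named {stem}.* in {src}")
    dst.mkdir(parents=True, exist_ok=True)
    for p in matches:
        # copy2 preserves file metadata such as modification time
        shutil.copy2(p, dst / p.name)
    return [p.name for p in matches]
```

The resulting folder can then be passed to `train.py` through `--data` in place of the full dataset; if the network cannot reproduce the ground-truth belief maps for one image, a larger run is unlikely to succeed either.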

TontonTremblay commented 1 year ago

The train belief guess should look like the gt above it. Can you run it for longer, like 1000 epochs?

aled96 commented 1 year ago

I have changed the background of the input image and added symmetry information and trained on the following image:

00002

I ran it for 1000 epochs, and in the end the result was the following:

image image

It seems much better! Do you think that now I can train on a bigger dataset with more instances of objects/distractors?

TontonTremblay commented 1 year ago

Are you aware of the symmetries in your object? Check the section on generating data with symmetries. But yeah, this looks good now. DOPE takes a while to train, so you will have to be patient; on a 60k-image dataset, for example, I train for ~30 epochs.
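To put the numbers from this thread side by side, the short calculation below compares how many image presentations each run involves. It is only arithmetic on the figures quoted above (60k images for ~30 epochs versus 1000 images for 25 epochs); `images_seen` is a hypothetical helper, and matching the count is a rough proxy for training effort, not a guarantee of equal quality.

```python
def images_seen(num_images: int, epochs: int) -> int:
    """Total number of image presentations over a training run."""
    return num_images * epochs


reference = images_seen(60_000, 30)  # rule of thumb quoted in this thread
small_run = images_seen(1_000, 25)   # the 1000-image, 25-epoch run from the first post

print(reference)               # 1800000
print(small_run)               # 25000
print(reference // small_run)  # 72
```

By this measure the initial 25-epoch run on 1000 images saw 72 times fewer images than the reference setup, which is consistent with the advice to train for much longer.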

aled96 commented 1 year ago

I will adjust everything and try to run with more images if the PC allows me. Thank you!