mbajlo opened this issue 4 years ago
Thank you for the detailed information. It sounds like things should work out of the box, but they are not. In the hammer example you shared, the hammer is too close to the camera and only two keypoints are visible. Does all of your data look like this? I have noticed that when not all keypoints are visible in the image, the architecture is hard to train to produce valid detections. Maybe try making the hammer appear smaller in the frame.
It also looks like the training data is empty, and I am not sure why. You could check in the dataloader whether the belief maps are correct at that level.
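(For the dataloader check, here is a minimal sketch of the kind of thing that helps; it is not the repository's exact API. It assumes the training DataLoader yields a dict with an 'img' tensor and a 'beliefs' tensor, so check the dataset's __getitem__ for the real key names in your copy.)

```python
# Minimal debugging sketch (not the repo's exact API): pull one batch from the
# training DataLoader used in train.py and write the ground-truth belief maps
# to disk so they can be inspected visually.
# Assumption: the batch is a dict with an 'img' tensor [B, 3, H, W] and a
# 'beliefs' tensor [B, K, h, w]; check the dataset's __getitem__ for the
# exact key names in your copy of the code.
import torchvision.utils as vutils

def dump_target_beliefs(trainingdata, out_prefix="debug"):
    batch = next(iter(trainingdata))            # one batch from the DataLoader
    vutils.save_image(batch['img'], f"{out_prefix}_images.png", normalize=True)
    beliefs = batch['beliefs']                  # target maps, roughly in [0, 1]
    for kp in range(beliefs.shape[1]):
        # one grayscale grid per keypoint channel
        vutils.save_image(beliefs[:, kp:kp + 1],
                          f"{out_prefix}_belief_{kp:02d}.png", normalize=True)
```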
Hello @TontonTremblay, thanks for the reply. The image I showed is just one of many in the sequence; I am changing the distance and the angle. For example, this is how image 000120 looks:
I have also tried the Auto Start Capturing option in the SceneCapturer_Simple; capturing then starts from the center of the object's coordinate system. I have also tried to create a dataset without the Auto Start Capturing option, in which case I have to start capturing myself, but the situation is the same.
I am using the OrbitalMovement to change the yaw angle, and I was thinking of taking the network trained on the yaw angle and retraining it later with the pitch angle, because I couldn't find a way to change both angles at the same time. If I try to change both values in NDDS, it looks like the yaw angle takes priority.
When you say 2 keypoints, what do you mean?
In the video data_overview.mp4 it says (around 20 sec) that if the visibility tag is 0 the object is fully visible, and if it is 1 it is fully invisible. In all my .json files the value is 1 (fully invisible); how can I change it to be 0? Could this cause the empty training data?
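(A quick way to see what the exporter actually wrote, before changing anything, is to scan the frame .json files and print the visibility value of every object. The path below is just an example; point it at your own train or test folder.)

```python
# Print the 'visibility' value of every object in every NDDS frame .json.
# The dataset path is an example -- adjust it to your own folders.
import glob
import json
import os

dataset_dir = r"C:\Users\m\Desktop\ML\ThorHammer_387mm_dataset\train\yaw\thorHammer_YawAngle"
for path in sorted(glob.glob(os.path.join(dataset_dir, "*.json"))):
    name = os.path.basename(path)
    if name.startswith("_"):   # skip _camera_settings.json and _object_settings.json
        continue
    with open(path) as f:
        frame = json.load(f)
    for obj in frame.get("objects", []):
        print(name, obj["class"], "visibility =", obj["visibility"])
```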
I tried to debug train.py to check the training data, and I can see that the path given in the --data argument is used and that it recognizes the images and the corresponding .json files. Do you see something wrong here? It seems to populate trainingdata just fine for me, and the same holds for testdata.
This is how the output looks when I start training:
start: 08:34:17.726571
load data
training data: 4 batches
testing data: 1 batches
So it printed the line training data: 4 batches, which means trainingdata is not None, at least.
Is there an explanation of how to show the belief maps at the dataloader level? Could you help me with that?
Thanks
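(One more low-effort check, assuming your copy of train.py still has the --save debug flag; it shows up as save=False in the header.txt Namespace pasted later in this thread. If I remember correctly, running with --save writes a visual sample batch to the --outf folder and quits, which shows exactly what the dataloader feeds the network. For example, with a placeholder data path:)

```
python train.py --data C:\path\to\train\data --batchsize 4 --outf debug_batch --save
```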
Something wrong is going on, but I am not sure what. How do you call the training script?
Hello @TontonTremblay,
Thanks for the reply. The training script name is the same, I kept it as train.py. Do you think the name could make a difference?
How do you call it? Like python train.py --data ...? (Sorry, phone answer.)
Hello @TontonTremblay
You mean how do I execute the training script? I do it like this, from the folder where train.py is:
python train.py --data C:\Users\m\Desktop\ML\ThorHammer_387mm_dataset\train\yaw\thorHammer_YawAngle\ --batchsize 50 --pretrained False --datatest C:\Users\m\Desktop\ML\ThorHammer_387mm_dataset\test\yaw\thorHammer_YawAngle\ --imagesize 400 --object thorHammer_387 --workers 8 --outf markoThorHammer_testBeliefExport
Or, if I debug in VS Code, I just execute the launch configuration with the same args given before, but it is basically the same thing.
Do you only have the hammer in your dataset? If so, remove the --object argument from the python command.
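(For what it's worth, here is a plausible reason why removing --object helps; this is a paraphrase and an assumption about the dataloader, not the exact repository code. If the filter tests the --object string as a substring of the lowercased class name, then a mixed-case value such as thorHammer_387 never matches, every annotation gets skipped, and the training targets end up empty. A lowercase string, or no --object at all when there is only one object, avoids the problem.)

```python
# Sketch of the kind of class-name filter the --object option may apply in the
# dataloader (paraphrased assumption, not the repository's exact code).
def keep_annotation(obj_class, objectsofinterest):
    if objectsofinterest is None:
        return True                                 # no filter: keep everything
    return objectsofinterest in obj_class.lower()   # substring test against lowercased class

print(keep_annotation("thorHammer_387", "thorHammer_387"))   # False: mixed case never matches
print(keep_annotation("thorHammer_387", "thorhammer_387"))   # True
print(keep_annotation("thorHammer_387", None))               # True: same effect as dropping --object
```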
Yes, that's right, I only have the hammer. I will remove the --object argument and try again.
I will let you know what happens.
Thanks
Hello @TontonTremblay
I have retrained the network, and the target belief maps now look like this:
[image: https://user-images.githubusercontent.com/15787947/100484113-6dae5c00-30fb-11eb-85c0-473a7561edcb.png]
So I would say it looks pretty good.
But (there is always a but :D) when I test the network on images from the train or test datasets, or on a new image of the hammer, the results list is always empty and the belief maps look something like this:
[image: https://user-images.githubusercontent.com/15787947/100484555-3476eb80-30fd-11eb-8290-0b0b318dee8b.png]
I tried training the network with more than one object in the image, and with changes to the yaw angle or the pitch angle. The target belief maps always come back looking like the first image, but I cannot detect the object on images from the train or test dataset, and the belief maps look strange, like above.
Do you know what could be wrong? What should be my next step?
On the phone again.
I think you need to train for longer. I normally train for 60 epochs on a dataset of 120k images, with a batch size of 128 on 4 P100s.
Hello @TontonTremblay
I thought you were using NVIDIA GPUs, but I have a Radeon GPU, so I can only train on my CPU. Just for testing, I made a training dataset of 100 images of the hammer (and a test dataset of 20 images), all taken from the same camera position (no translation, no movement, no angle change). When I train the network for 60 epochs and then test it, I get this response:
list of found objects: []
Do you think the problem could be the visibility value? Should it be 0 for full visibility?
The image size is 400x400; should I make it 800x800 or more? For reference, this is one of the frame .json files:
```json
{
  "camera_data":
  {
    "location_worldframe": [ -990, -80, 21.502099990844727 ],
    "quaternion_xyzw_worldframe": [ 0, 0, 0, 1 ]
  },
  "objects": [
    {
      "class": "thorHammer_387",
      "instance_id": 16777200,
      "visibility": 1,
      "location": [ 0, -4.4168000221252441, 49.195098876953125 ],
      "quaternion_xyzw": [ 0, -0.70709997415542603, 0, 0.70709997415542603 ],
      "pose_transform": [
        [ -1, 0, 0, 0 ],
        [ 0, 0, 1, 0 ],
        [ 0, -1, 0, 0 ],
        [ 0, -4.4168000221252441, 49.195098876953125, 1 ]
      ],
      "cuboid_centroid": [ -0.087600000202655792, 4.3204998970031738, 48.991600036621094 ],
      "projected_cuboid_centroid": [ 199.64230346679688, 217.63749694824219 ],
      "bounding_box":
      {
        "top_left": [ 134.22529602050781, 169.74690246582031 ],
        "bottom_right": [ 296.0574951171875, 229.5238037109375 ]
      },
      "cuboid": [
        [ -6.8632998466491699, -15.043600082397461, 53.087600708007813 ],
        [ -6.8632001876831055, -15.043600082397461, 44.895599365234375 ],
        [ -6.8632001876831055, 23.684499740600586, 44.895599365234375 ],
        [ -6.8632998466491699, 23.684499740600586, 53.087600708007813 ],
        [ 6.6880002021789551, -15.043600082397461, 53.087699890136719 ],
        [ 6.6880002021789551, -15.043600082397461, 44.895599365234375 ],
        [ 6.6880002021789551, 23.684499740600586, 44.895599365234375 ],
        [ 6.6880002021789551, 23.684499740600586, 53.087699890136719 ]
      ],
      "projected_cuboid": [
        [ 174.14370727539063, 143.32530212402344 ],
        [ 169.42579650878906, 132.98399353027344 ],
        [ 169.42579650878906, 305.50930786132813 ],
        [ 174.14370727539063, 289.22799682617188 ],
        [ 225.19599914550781, 143.32539367675781 ],
        [ 229.79359436035156, 132.98399353027344 ],
        [ 229.79359436035156, 305.50930786132813 ],
        [ 225.19599914550781, 289.2279052734375 ]
      ]
    }
  ]
}
```
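(A side note tying this back to the earlier point about all keypoints being visible: with the .json layout above and the 400x400 image size it is easy to check, per frame, whether every projected cuboid corner and the centroid actually lands inside the image. A small sketch; the file name is just an example.)

```python
# Check whether all projected cuboid keypoints of each object in an NDDS frame
# .json fall inside the image. Uses the layout shown above; 400x400 matches
# the --imagesize used in this thread. The file name is an example.
import json

def all_keypoints_in_frame(json_path, width=400, height=400):
    with open(json_path) as f:
        frame = json.load(f)
    for obj in frame["objects"]:
        points = obj["projected_cuboid"] + [obj["projected_cuboid_centroid"]]
        for x, y in points:
            if not (0 <= x < width and 0 <= y < height):
                return False
    return True

print(all_keypoints_in_frame("000019.json"))
```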
Or should I just make a dataset with more than 1000 images and then train the network for 60 epochs?
Could it be that the losses at epoch 60 are still too high?
test loss 60, 0,0.007053715176880
train loss 60, 0,0.007159050088376
How low should the loss be?
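(If it helps to look at the trend rather than single values: lines of the form train loss <epoch>, <batch>,<value>, like the two above, are easy to parse and plot. A small sketch; the log file name is just an example, point it at wherever your losses end up.)

```python
# Plot a loss curve from a log whose lines look like:
#   train loss 60, 0,0.007159050088376
# The file name is an example -- adjust it to your own output.
import re
import matplotlib.pyplot as plt

pattern = re.compile(r"train loss\s+(\d+),\s*(\d+),\s*([0-9.eE+-]+)")
epochs, losses = [], []
with open("loss_train.log") as f:
    for line in f:
        match = pattern.match(line.strip())
        if match:
            epochs.append(int(match.group(1)))
            losses.append(float(match.group(3)))

plt.plot(epochs, losses)
plt.xlabel("epoch")
plt.ylabel("train loss")
plt.yscale("log")
plt.show()
```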
Well, I am not sure what to say here. You need a fairly large dataset (5k to 60k images) and you need to train for 2-3 days. I am sorry, but I am not sure CPU training will work...
Hello all, I have read a lot of topics here and they got me pretty far, but now I am stuck. I am trying to train the network on my custom object, which was modeled in Blender and exported and imported into NDDS as an fbx file. In NDDS I made a training dataset of 200 images and a test dataset of 50 images.
The _camera_settings.json looks like this:
The _object_settings.json looks like this:
The 000019.json (the other ones are quite similar) looks like this:
The 000019.png looks like this:
I have run the nvdu_viz tool on the dataset and it looks just fine; for example, this is the image 000019.png:
My header.txt file looks like this:
Namespace(batchsize=50, data='C:\\Users\\m\\Desktop\\ML\\ThorHammer_387mm_dataset\\train\\yaw\\thorHammer_YawAngle\\', datasize=None, datatest='C:\\Users\\m\\Desktop\\ML\\ThorHammer_387mm_dataset\\test\\yaw\\thorHammer_YawAngle\\', epochs=60, gpuids=[0], imagesize=400, loginterval=100, lr=0.0001, manualseed=3543, namefile='epoch', nbupdates=None, net='', noise=2.0, object='thorHammer_387', option='default', outf='train_ThorHammer_testBeliefExport', pretrained=False, save=False, sigma=4, workers=8)
seed: 3543
I trained for 21 epochs. At epoch 1 the train loss was 0.032777167856693 and at epoch 21 it is 0.000019518713088; the test loss is similar, going from 0.009614795446396 at epoch 1 to 0.000018934417312 at epoch 21.
After the training I decided to test my network on the images I used for training and testing. But this is where things began to look strange: I am not getting any detections at all.
Then I added this code to the ObjectDetector class and used it to see the belief maps:
I used it on the jpg image 000019.jpg in a separate script like this:
And the result I am getting back doesn't look good at all:
So I tried to figure out what went wrong, because the error looks quite low, yet the network cannot detect the object on the training/test images. I followed this topic, which says I need to generate belief maps during training.
So I tried to train the network again and save the Variable tensors like this:
This way I have the tensors of the variables and I can try to show them later, which I did like this (I am not sure if it is correct):
But then I saw that the target belief maps are all the same no matter which epoch's tensor I read, and I would say they are not looking good.
I know I am missing something big, and I am not sure whether I am showing the target belief maps from training correctly. Can you help or point me in the right direction?
Thanks, any help is appreciated.
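(On the last question, whether the saved tensors are being displayed correctly: one common pitfall is the value range. The ground-truth maps are roughly in [0, 1], while raw network outputs can sit on a different scale or go slightly negative, so each channel usually needs to be normalized before it is shown, otherwise it can look flat or like noise. A minimal sketch, assuming a belief tensor of shape [num_keypoints, H, W]; for a batched tensor, index out one sample first.)

```python
# Save each channel of a belief-map tensor as its own image, normalizing per
# channel so both ground-truth targets and raw network outputs are visible.
# Assumes 'beliefs' has shape [K, H, W]; for a [B, K, H, W] tensor pass
# beliefs[0] (or whichever sample you want to look at).
import torchvision.utils as vutils

def save_belief_channels(beliefs, prefix="belief"):
    for k in range(beliefs.shape[0]):
        channel = beliefs[k].detach().cpu().float()
        lo, hi = channel.min(), channel.max()
        if (hi - lo) > 1e-8:
            channel = (channel - lo) / (hi - lo)   # stretch this channel to [0, 1]
        vutils.save_image(channel.unsqueeze(0), f"{prefix}_{k:02d}.png")
```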