ethz-asl / vgn

Real-time 6 DOF grasp detection in clutter.
BSD 3-Clause "New" or "Revised" License

Do the pile and packed scenes need to be trained separately? #21

Closed (Daiqy closed 1 year ago)

Daiqy commented 2 years ago

Hi there, I noticed that the paper mentions two types of scenes, "pile" and "packed". Does the data from the pile and packed scenes need to be trained separately to obtain two different VGN models? Or is data from the two scene types mixed (the "2 million grasps" mentioned in the paper) to train a single VGN model? Looking forward to your reply. Thank you!

aniketghodake10 commented 2 years ago

I guess the pre-trained model was trained on the combined pile + packed dataset.

mbreyer commented 2 years ago

Yes, I split the training data roughly half-half between the two types of scenes.
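
For readers who want to build a similar mix from their own data, here is a minimal sketch of combining two sample sets so that each scene type contributes half. It is not the code used for the released model: the function name `mix_half_half`, the tuple-based samples, and the exact 50/50 subsampling are all assumptions made for illustration (the reply above only says "roughly half-half").

```python
import random


def mix_half_half(pile, packed, seed=0):
    """Combine two sample lists so each scene type contributes equally.

    Hypothetical helper for illustration; not part of the vgn repository.
    """
    rng = random.Random(seed)
    # Subsample the larger set down to the size of the smaller one,
    # so neither scene type dominates the training data.
    n = min(len(pile), len(packed))
    mixed = rng.sample(pile, n) + rng.sample(packed, n)
    # Shuffle so batches drawn in order contain both scene types.
    rng.shuffle(mixed)
    return mixed


# Placeholder samples; real entries would be grasp records from each scene type.
pile = [("pile", i) for i in range(1000)]
packed = [("packed", i) for i in range(1200)]
train_set = mix_half_half(pile, packed)
```

A single model is then trained on `train_set`. Dropping the subsampling step and simply concatenating the two lists would also give a roughly even mix as long as the two sets are of similar size.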