sebastian-ruiz opened 3 years ago
Hello, sorry your model does not work well. There was a similar discussion about chair detection in https://github.com/NVlabs/Deep_Object_Pose/issues/137. If you could visualize the belief maps, that might help us understand what the network sees. The loss looks very low; what is the loss at epochs 1, 2, 3? Could you also share a JSON file from your training data?
I tried to get the belief maps using the following code snippet:
import cv2
import numpy as np
import torch
from torch.autograd import Variable

for j in range(vertex2.size()[0]):
    belief = vertex2[j].clone()
    # normalize each map to [0, 1]
    belief -= float(torch.min(belief).data.cpu().numpy())
    belief /= float(torch.max(belief).data.cpu().numpy())
    belief = torch.clamp(belief, 0, 1)
    # replicate the single channel to 3 channels for display
    belief = torch.cat([belief.unsqueeze(0), belief.unsqueeze(0), belief.unsqueeze(0)]).unsqueeze(0)
    temp = Variable(belief.clone())
    array_belief = (temp.data.squeeze().cpu().numpy().transpose(1, 2, 0) * 255).astype(np.uint8)
    cv2.imshow('belief_' + str(j), array_belief)
cv2.waitKey(0)  # needed so the imshow windows actually render
where vertex2 is the variable from the detect_object_in_image(...) function in detector.py (see my inference.py script for more info).
When I try to get the belief maps of the soup cans I get this:
but when I try the same with my object I get this (the image below is from the training set; using the webcam I also get the same belief maps and no box):
My _object_settings.json file looks like this:
{
    "exported_object_classes": [
        "KALO_K1_5_edited"
    ],
    "exported_objects": [
        {
            "class": "KALO_K1_5_edited",
            "segmentation_class_id": 144,
            "segmentation_instance_id": 16014600,
            "fixed_model_transform": [
                [ 0, 0, 1, 0 ],
                [ -1, 0, 0, 0 ],
                [ 0, -1, 0, 0 ],
                [ -0.55800002813339233, -0.25029999017715454, 0.22859999537467957, 1 ]
            ],
            "cuboid_dimensions": [ 119.63559722900391, 29.607500076293945, 38.399700164794922 ]
        }
    ]
}
I noticed that my cuboid_dimensions seem to be in the wrong units: taken literally, my object would be 119 m long even though it should be 11.9 cm long. I have therefore set config_pose.yaml to the following, so that it corresponds to the size of the object in the training data:
dimensions: {
    "KALO_K1_5_edited": [ 119.63559722900391, 29.607500076293945, 38.399700164794922 ],
    ...
}

mesh_scales: {
    "KALO_K1_5_edited": 1.0,
    ...
}
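As a quick arithmetic check of the factor-of-10 mismatch between the exported values and the real 11.9 cm length (this is just my own sanity check, not something DOPE does):

# Values copied from my _object_settings.json; dividing by 10 recovers
# the dimensions I expect in centimeters.
cuboid = [119.63559722900391, 29.607500076293945, 38.399700164794922]
print([round(d * 0.1, 2) for d in cuboid])  # [11.96, 2.96, 3.84]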
The losses at the end of epochs 1-5 from loss_train.csv are:
1, 624, 0.000003241889772
2, 624, 0.000002674767757
3, 624, 0.000000930202077
4, 624, 0.000000722245716
5, 624, 0.000001037328047
...
15, 624, 0.000000092100692
...
30, 624, 0.000000032321594
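For reference, a minimal sketch of how I pull these values out of the CSV (assuming the three columns are epoch, batch index, loss, as the rows above suggest, with no header row):

import csv

with open('loss_train.csv') as f:
    for epoch, batch, loss in csv.reader(f):
        print(int(epoch), float(loss))  # loss at the last logged batch of each epoch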
Your code to display the belief maps is not correct. DOPE has two outputs; you are using the vertex map, the one that points toward the center of the object.
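A quick way to tell the two outputs apart is by channel count. This is only a hedged sketch; it assumes the standard DOPE head with 9 belief channels (8 cuboid corners plus the centroid) and 16 affinity channels:

out, seg = net_model(image_torch)
beliefs = out[-1][0]      # belief maps, expected shape (9, H/8, W/8)
affinities = seg[-1][0]   # affinity/vertex fields, expected shape (16, H/8, W/8)
print(beliefs.shape, affinities.shape)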
Here is the code I am currently using (sorry, I need to update this repo):
@staticmethod
def detect_object_in_image(net_model, pnp_solver, in_img, config,
        grid_belief_debug=False, norm_belief=True, run_sampling=False, network='dope'):
    '''Detect objects in an image using a specific trained network model.
    Returns the poses of the objects and the belief maps.
    '''
    if in_img is None:
        return []

    if network == 'full':
        scale_factor = 1
        OFFSET_DUE_TO_UPSAMPLING = 0
    else:  # 'dope' and 'mobile'
        scale_factor = 8
        # OFFSET_DUE_TO_UPSAMPLING = 0.4395
        OFFSET_DUE_TO_UPSAMPLING = 0

    # Run network inference
    image_tensor = transform(in_img)
    image_torch = Variable(image_tensor).cuda().unsqueeze(0)
    out, seg = net_model(image_torch)  # run inference using the network (calls 'forward' method)
    vertex2 = out[-1][0]
    aff = seg[-1][0]

    # Find objects from network output
    detected_objects = ObjectDetector.find_object_poses(
        vertex2, aff, pnp_solver, config,
        run_sampling=run_sampling,
        scale_factor=scale_factor,
        OFFSET_DUE_TO_UPSAMPLING=OFFSET_DUE_TO_UPSAMPLING)

    if not grid_belief_debug:
        return detected_objects, None
    else:
        # Run the belief maps debug display on the belief maps
        upsampling = nn.UpsamplingNearest2d(scale_factor=scale_factor)
        tensor = vertex2
        belief_imgs = []
        in_img = (torch.tensor(in_img).float() / 255.0)
        in_img *= 0.7

        for j in range(tensor.size()[0]):
            belief = tensor[j].clone()
            if norm_belief:
                belief -= float(torch.min(belief).item())
                belief /= float(torch.max(belief).item())

            belief = upsampling(belief.unsqueeze(0).unsqueeze(0)).squeeze().squeeze().data
            belief = torch.clamp(belief, 0, 1).cpu()
            # replicate the single channel to 3 channels so the grid renders as RGB
            belief = torch.cat([
                belief.unsqueeze(0),
                belief.unsqueeze(0),
                belief.unsqueeze(0)
            ]).unsqueeze(0)
            belief = torch.clamp(belief, 0, 1)
            belief_imgs.append(belief.data.squeeze().numpy())

        # Create the image grid
        belief_imgs = torch.tensor(np.array(belief_imgs))
        im_belief = ObjectDetector.get_image_grid(belief_imgs, None, mean=0, std=1)

        return detected_objects, im_belief
@staticmethod
def make_grid(tensor, nrow=8, padding=2,
              normalize=False, range_=None, scale_each=False, pad_value=0):
    """Make a grid of images.

    Args:
        tensor (Tensor or list): 4D mini-batch Tensor of shape (B x C x H x W)
            or a list of images all of the same size.
        nrow (int, optional): Number of images displayed in each row of the grid.
            The final grid size is (B / nrow, nrow). Default is 8.
        padding (int, optional): amount of padding. Default is 2.
        normalize (bool, optional): If True, shift the image to the range (0, 1),
            by subtracting the minimum and dividing by the maximum pixel value.
        range_ (tuple, optional): tuple (min, max) where min and max are numbers,
            then these numbers are used to normalize the image. By default, min and max
            are computed from the tensor.
        scale_each (bool, optional): If True, scale each image in the batch of
            images separately rather than the (min, max) over all images.
        pad_value (float, optional): Value for the padded pixels.

    Example:
        See this notebook `here <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>`_
    """
    import math

    if not (torch.is_tensor(tensor) or
            (isinstance(tensor, list) and all(torch.is_tensor(t) for t in tensor))):
        raise TypeError('tensor or list of tensors expected, got {}'.format(type(tensor)))

    # if list of tensors, convert to a 4D mini-batch Tensor
    if isinstance(tensor, list):
        tensor = torch.stack(tensor, dim=0)

    if tensor.dim() == 2:  # single image H x W
        tensor = tensor.view(1, tensor.size(0), tensor.size(1))
    if tensor.dim() == 3:  # single image
        if tensor.size(0) == 1:  # if single-channel, convert to 3-channel
            tensor = torch.cat((tensor, tensor, tensor), 0)
        tensor = tensor.view(1, tensor.size(0), tensor.size(1), tensor.size(2))
    if tensor.dim() == 4 and tensor.size(1) == 1:  # single-channel images
        tensor = torch.cat((tensor, tensor, tensor), 1)

    if normalize is True:
        tensor = tensor.clone()  # avoid modifying tensor in-place
        if range_ is not None:
            assert isinstance(range_, tuple), \
                "range_ has to be a tuple (min, max) if specified. min and max are numbers"

        def norm_ip(img, min, max):
            img.clamp_(min=min, max=max)
            img.add_(-min).div_(max - min + 1e-5)

        def norm_range(t, range_):
            if range_ is not None:
                norm_ip(t, range_[0], range_[1])
            else:
                norm_ip(t, float(t.min()), float(t.max()))

        if scale_each is True:
            for t in tensor:  # loop over mini-batch dimension
                norm_range(t, range_)
        else:
            norm_range(tensor, range_)

    if tensor.size(0) == 1:
        return tensor.squeeze()

    # make the mini-batch of images into a grid
    nmaps = tensor.size(0)
    xmaps = min(nrow, nmaps)
    ymaps = int(math.ceil(float(nmaps) / xmaps))
    height, width = int(tensor.size(2) + padding), int(tensor.size(3) + padding)
    grid = tensor.new(3, height * ymaps + padding, width * xmaps + padding).fill_(pad_value)
    k = 0
    for y in range(ymaps):
        for x in range(xmaps):
            if k >= nmaps:
                break
            grid.narrow(1, y * height + padding, height - padding)\
                .narrow(2, x * width + padding, width - padding)\
                .copy_(tensor[k])
            k = k + 1
    return grid
@staticmethod
def get_image_grid(tensor, filename, nrow=3, padding=2, mean=None, std=None):
    """
    Arranges the given Tensor (or mini-batch of tensors) into a grid of images
    and returns it as a PIL Image (saving to file is currently commented out).
    """
    from PIL import Image

    grid = ObjectDetector.make_grid(tensor, nrow=nrow, padding=10, pad_value=1)
    if mean is not None:
        ndarr = grid.mul(std).add(mean).mul(255).byte().transpose(0, 2).transpose(0, 1).numpy()
    else:
        ndarr = grid.mul(0.5).add(0.5).mul(255).byte().transpose(0, 2).transpose(0, 1).numpy()
    im = Image.fromarray(ndarr)
    # im.save(filename)
    return im
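For what it's worth, a minimal sketch of how this debug path can be called (it assumes net_model, pnp_solver and config are set up as elsewhere in detector.py; the output path is illustrative):

detected_objects, im_belief = ObjectDetector.detect_object_in_image(
    net_model, pnp_solver, img, config, grid_belief_debug=True)
if im_belief is not None:
    im_belief.save('belief_grid.png')  # get_image_grid returns a PIL Image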
Please try running this and report what you get.
Thank you for the code. Here are the results:
Verifying that it works correctly with the soup can:
Trying on an image from the test set:
Trying on the webcam:
From this I would say that something has gone wrong with the training.
My header.txt looks like this:
Namespace(batchsize=32, data='/home/sruiz/datasets/kalo1.5_1_object_100000_400x400', datasize=None, datatest='/home/sruiz/datasets/kalo1.5_1_object_2000_400x400', epochs=60, gpuids=[0, 1, 2, 3], imagesize=400, loginterval=100, lr=0.0001, manualseed=57, namefile='epoch', nbupdates=None, net='', noise=2.0, object='KALO_K1_5_edited', option='default', outf='train_tmp', pretrained=True, save=False, sigma=4, workers=8)
seed: 57
Could you visualize the ground-truth belief maps you generate during training? One of these: https://github.com/NVlabs/Deep_Object_Pose/blob/master/scripts/train.py#L1338. I think there is something going on with your training data.
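For anyone following along, a ground-truth belief map is essentially a 2D Gaussian splatted at each projected cuboid keypoint. A minimal sketch, assuming output maps at 1/8 of a 400x400 input and a sigma of 2 (illustrative, not the exact train.py code):

import numpy as np

def gt_belief_maps(keypoints_px, out_hw=(50, 50), in_hw=(400, 400), sigma=2.0):
    # keypoints_px: list of (u, v) pixel coordinates in the input image,
    # e.g. the projected_cuboid points from a frame's JSON file.
    h, w = out_hw
    sx, sy = w / float(in_hw[1]), h / float(in_hw[0])
    ys, xs = np.mgrid[0:h, 0:w]
    maps = np.zeros((len(keypoints_px), h, w), dtype=np.float32)
    for i, (u, v) in enumerate(keypoints_px):
        cx, cy = u * sx, v * sy  # keypoint in output-map coordinates
        maps[i] = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return maps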
"exported_objects": [
{
"class": "KALO_K1_5_edited",
"segmentation_class_id": 144,
"segmentation_instance_id": 16014600,
"fixed_model_transform": [
[ 0, 0, 1, 0 ],
[ -1, 0, 0, 0 ],
[ 0, -1, 0, 0 ],
[ -0.55800002813339233, -0.25029999017715454, 0.22859999537467957, 1 ]
],
"cuboid_dimensions": [ 119.63559722900391, 29.607500076293945, 38.399700164794922 ]
}
Where are the projected cuboid points?
I am pretty sure your export is missing projected_cuboid; I don't remember if you have to add it to NDDS to export it. I have not used it in a while. @thangt, can you refresh my memory?
@TontonTremblay That is the object_settings.json file, which describes the objects to capture. The projected_cuboid is in each captured frame's data: 000001.json ...
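A quick way to check that on disk (field names follow the NDDS per-frame format; the path is illustrative):

import json

with open('000001.json') as f:
    frame = json.load(f)
for obj in frame.get('objects', []):
    print(obj['class'], 'projected_cuboid' in obj)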
Could it be that there is something wrong with my training data? Here is the data for 000001. In Unreal Engine, the class segmentation id assignment type and the instance segmentation id assignment type are both set to spread evenly.
000001.cs.png:
000001.depth.png:
000001.is.png:
000001.png:
And the 000001.json file. These four images and one JSON file make up the training data for this frame.
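(A trivial completeness check over these per-frame files; the frame id is illustrative:)

import os

frame = '000001'
for suffix in ['.cs.png', '.depth.png', '.is.png', '.png', '.json']:
    print(frame + suffix, os.path.exists(frame + suffix))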
I think I am making a mistake in config_pose.yaml. Namely, in class_ids I do the following:
class_ids: {
    "cracker": 1,
    "gelatin": 2,
    ....... [omitted] ......
    "GreenBeans": 35,
    "PeasAndCarrots": 36,
    "KALO_K1_5_edited": 37
}
I think that the id of "KALO_K1_5_edited" should be the one picked by NDDS in *****.cs.png.
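One way to see which class id NDDS actually assigned is to read it straight from the class segmentation image (this assumes the ids are stored as pixel values in the .cs.png; the path is illustrative):

import numpy as np
from PIL import Image

cs = np.array(Image.open('000001.cs.png'))
print(np.unique(cs))  # the background id plus one id per class present in the frame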
The class id should not impact the algorithm. Can you try to render the belief map from a ground truth you generated, please? See the line in the training script referenced above.
Sorry for the delay. I am retraining the model and storing the target_belief (https://github.com/NVlabs/Deep_Object_Pose/blob/master/scripts/train.py#L1338) for every epoch. I will report back.
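Roughly like this, a sketch assuming target_belief has shape (B, 9, H, W) as at that line; the output path and the epoch variable are illustrative:

beliefs = target_belief[0].unsqueeze(1).cpu()  # (9, 1, H, W): one map per keypoint
im = ObjectDetector.get_image_grid(beliefs, None, mean=0, std=1)
im.save('debug/gt_belief_epoch_{:03d}.png'.format(epoch))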
Hi, I am also trying to use NDDS to generate a new dataset, but how can I import my FBX-format 3D model and replace the original object in NDDS? Thank you.
Steps that I took:
- In the loss_train.csv file I see that the loss in epoch 1 is 0.28 and the loss in epoch 60 is 2.9e-09.
- The training data generated with NDDS looks like this:
- I adapted the script from this issue to apply inference to the training images (the core loop is sketched below). This is my script for inference. When using the script on the soup cans and the trained model from the README, it works perfectly. When applied to my custom object, DOPE detects neither the object class nor the pose.
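The core of that inference loop, sketched; the paths, the loaded net_model/pnp_solver objects, and the BGR-to-RGB handling are my assumptions:

import glob
import cv2
import numpy as np

for path in sorted(glob.glob('train_data/*.png')):
    img = np.ascontiguousarray(cv2.imread(path)[:, :, ::-1])  # BGR -> RGB
    detections, _ = ObjectDetector.detect_object_in_image(net_model, pnp_solver, img, config)
    print(path, [d['name'] for d in detections] if detections else 'no detection')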