Closed yuxguo closed 4 years ago
I also tried splitting the method MultiLayerFastLocalGraphModelV2.predict() into two methods, extract_features() and predict(), as follows:
def extract_features(self,
    t_initial_vertex_features,
    t_vertex_coord_list,
    t_keypoint_indices_list,
    t_edges_list,
    is_training,
    ):
    """
    Predict the objects with initial vertex features and a list of graphs.
    The model applies layers sequentially while each layer choose the graph
    that they operates. For example, a layer can choose the i-th graph,
    which is composed of t_vertex_coord_list[i], t_edges_list[i], and
    optionally t_keypoint_indices_list[i]. It operates on the graph and
    output the updated vertex_features. Then the next layer takes the
    vertex_features and also choose a graph to further update the features.
    Args:
        t_initial_vertex_features: a [N, M] tensor, the initial features of
        N vertices. For example, the intensity value of lidar reflection.
        t_vertex_coord_list: a list of [Ni, 3] tensors, the coordinates of
        a list of graph vertices.
        t_keypoint_indices_list: a list of [Nj, 1] tensors or None. For a
        pooling layer, it outputs a reduced number of vertices, aka. the
        keypoints. t_keypoint_indices_list[i] is the indices of those
        keypoints. For a gnn layer, it does not reduce the vertex number,
        thus t_keypoint_indices_list[i] should be set to 'None'.
        t_edges_list: a list of [Ki, 2] tensors. t_edges_list[i] are edges
        for the i-th graph. it contains Ki pair of (source, destination)
        vertex indices.
        is_training: boolean, whether in training mode or not.
    returns: [N_output, num_classes] logits tensor for classification,
    [N_output, num_classes, box_encoding_len] box_encodings tensor for
    localization.
    """
    with slim.arg_scope([slim.batch_norm], is_training=is_training), \
         slim.arg_scope([slim.fully_connected], weights_regularizer=self._regularizer):
        tfeatures_list = []
        tfeatures = t_initial_vertex_features
        tfeatures_list.append(tfeatures)
        for idx in range(len(self._layer_configs)-1):
            layer_config = self._layer_configs[idx]
            layer_scope = layer_config['scope']
            layer_type = layer_config['type']
            layer_kwargs = layer_config['kwargs']
            graph_level = layer_config['graph_level']
            t_vertex_coordinates = t_vertex_coord_list[graph_level]
            t_keypoint_indices = t_keypoint_indices_list[graph_level]
            t_edges = t_edges_list[graph_level]
            with tf.variable_scope(layer_scope, reuse=tf.AUTO_REUSE):
                flgn = self._default_layers_type[layer_type]
                print('@ level %d Graph, Add layer: %s, type: %s'%
                    (graph_level, layer_scope, layer_type))
                if 'device' in layer_config:
                    with tf.device(layer_config['device']):
                        tfeatures = flgn.apply_regular(
                            tfeatures,
                            t_vertex_coordinates,
                            t_keypoint_indices,
                            t_edges,
                            **layer_kwargs)
                else:
                    tfeatures = flgn.apply_regular(
                        tfeatures,
                        t_vertex_coordinates,
                        t_keypoint_indices,
                        t_edges,
                        **layer_kwargs)
                tfeatures_list.append(tfeatures)
                print('Feature Dim:' + str(tfeatures.shape[-1]))
        print('tfeature_shape:', tfeatures.shape)
        return tfeatures_list[-1]
def predict(self, tfeatures, is_training):
    with slim.arg_scope([slim.batch_norm], is_training=is_training), \
         slim.arg_scope([slim.fully_connected], weights_regularizer=self._regularizer):
        predictor_config = self._layer_configs[-1]
        assert (predictor_config['type'] == 'classaware_predictor' or
            predictor_config['type'] == 'classaware_predictor_128' or
            predictor_config['type'] == 'classaware_separated_predictor')
        predictor = self._default_layers_type[predictor_config['type']]
        with tf.variable_scope(predictor_config['scope'], reuse=tf.AUTO_REUSE):
            logits, box_encodings = predictor.apply_regular(
                tfeatures,
                num_classes=self.num_classes,
                box_encoding_len=self.box_encoding_len,
                **predictor_config['kwargs'])
            print("Prediction %d classes" % self.num_classes)
    return logits, box_encodings
In the graph construction code, I replaced the original call with:
t_features = model.extract_features(
    t_initial_vertex_features, t_vertex_coord_list,
    t_keypoint_indices_list, t_edges_list, t_is_training)
t_logits, t_pred_box = model.predict(t_features, t_is_training)
but this also causes NaN values in t_features and reg_loss.
Hi SkilfulBugsMaker, I tried your code with inference and it did not produce NaN. I think you get the NaN during training; is that the case? The current training uses a small batch size and basic SGD. If you decrease the batch size further or increase the learning rate, it is easy to get NaN. I find that gradient clipping and a cyclical learning rate help a lot. For gradient clipping, you can try adding the following code after Line 404 in train.py:
if train_config.get('clip_gradient', False):
    grads, vars = zip(*grads_cross_gpu)
    clipped_grads, raw_global_norm = tf.clip_by_global_norm(grads, train_config['clip_norm'])
    grads_cross_gpu = list(zip(clipped_grads, vars))
Setting train_config['clip_norm'] to 1.0 might be a fine starting point. Thanks,
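For readers who want to see what the two suggestions above amount to numerically, here is a minimal pure-Python sketch. It is not the code from train.py: the function names clip_by_global_norm_sketch and triangular_lr are made up for illustration, gradients are represented as flat lists of floats, and the triangular schedule is only one common form of cyclical learning rate, assumed here because the thread does not say which variant was used. The clipping formula (scale every gradient by clip_norm / max(global_norm, clip_norm)) follows the documented behavior of tf.clip_by_global_norm.

```python
import math


def clip_by_global_norm_sketch(grads, clip_norm):
    """Illustrative global-norm clipping on plain Python lists.

    grads: list of gradients, each a flat list of floats (an assumption
    made for this sketch; real gradients are tensors).
    Returns (clipped_grads, global_norm).
    """
    # Global norm: L2 norm over all gradient entries taken together.
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The scale is exactly 1.0 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


def triangular_lr(step, base_lr, max_lr, half_cycle):
    """Hypothetical triangular cyclical schedule.

    The rate ramps linearly from base_lr to max_lr over half_cycle steps,
    then back down to base_lr over the next half_cycle steps, and repeats.
    """
    cycle_pos = step % (2 * half_cycle)
    if cycle_pos <= half_cycle:
        frac = cycle_pos / half_cycle
    else:
        frac = 2.0 - cycle_pos / half_cycle
    return base_lr + (max_lr - base_lr) * frac


# Two gradients whose combined L2 norm is sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [12.0]]
clipped, norm = clip_by_global_norm_sketch(grads, clip_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm, clipped_norm)
```

Because all gradients are scaled by the same factor, clipping by the global norm preserves the update direction and only limits its length, which is why it tends to tame the occasional exploding step without otherwise changing training.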
Thank you, it solves my problem!
By the way, there are some inconsistencies in train.py when using RGB as the initial vertex features:
elif config['input_features'] == '0rgb':
    input_v = np.hstack([np.zeros((cam_rgb_points.attr.shape[0], 1)),
                         cam_rgb_points.attr[:, 1:]])
elif config['input_features'] == 'rgb':
    t_initial_vertex_features = tf.placeholder(
        dtype=tf.float32, shape=[None, 3])
The dimensions of the point cloud data and the placeholder do not match.
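The mismatch can be illustrated with a small pure-Python sketch. It assumes that cam_rgb_points.attr has four columns per point (intensity, r, g, b), which the thread does not state explicitly; the attr values and the widths_match helper are hypothetical. List concatenation stands in for np.hstack.

```python
# Hypothetical point attributes: one row per point,
# assumed column layout (intensity, r, g, b).
attr = [
    [0.9, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
]

# Mirrors the '0rgb' branch: a zero column followed by attr[:, 1:].
input_v = [[0.0] + row[1:] for row in attr]

# The 'rgb' placeholder is declared with shape [None, 3].
placeholder_width = 3
feature_width = len(input_v[0])


def widths_match(features, expected_width):
    """Return True if every feature row has the expected number of columns."""
    return all(len(row) == expected_width for row in features)


print(feature_width, placeholder_width, widths_match(input_v, placeholder_width))
```

Under that assumption, the '0rgb' features are four columns wide, so feeding them into a [None, 3] placeholder would fail; only the r, g, b columns alone (attr[:, 1:]) would fit.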
Hi, I am trying to get the feature map of PointGNN, and I modified the method MultiLayerFastLocalGraphModelV2.predict() in models/models.py. I only changed the return value, to return logits, box_encodings, tfeatures, but it causes NaN in tfeatures and reg_loss. Can you help me?