Get the feature map of PointGNN

yuxguo commented 4 years ago

Hi, I am trying to get the feature map of PointGNN, so I modified the method MultiLayerFastLocalGraphModelV2.predict() in models/models.py. The only change is the return statement, which is now return logits, box_encodings, tfeatures, but this causes NaN values in tfeatures and reg_loss. Can you help me?

yuxguo commented 4 years ago

I also tried splitting the method MultiLayerFastLocalGraphModelV2.predict() into two methods, extract_features() and predict(), as follows:

    def extract_features(self,
        t_initial_vertex_features,
        t_vertex_coord_list,
        t_keypoint_indices_list,
        t_edges_list,
        is_training,
        ):
        """
        Predict the objects with initial vertex features and a list of graphs.
        The model applies layers sequentially while each layer choose the graph
        that they operates. For example, a layer can choose the i-th graph,
        which is composed of t_vertex_coord_list[i], t_edges_list[i], and
        optionally t_keypoint_indices_list[i]. It operates on the graph and
        output the updated vertex_features. Then the next layer takes the
        vertex_features and also choose a graph to further update the features.

        Args:
            t_initial_vertex_features: a [N, M] tensor, the initial features of
            N vertices. For example, the intensity value of lidar reflection.
            t_vertex_coord_list: a list of [Ni, 3] tensors, the coordinates of
            a list of graph vertices.
            t_keypoint_indices_list: a list of [Nj, 1] tensors or None. For a
            pooling layer, it outputs a reduced number of vertices, aka. the
            keypoints. t_keypoint_indices_list[i] is the indices of those
            keypoints. For a gnn layer, it does not reduce the vertex number,
            thus t_keypoint_indices_list[i] should be set to 'None'.
            t_edges_list: a list of [Ki, 2] tensors. t_edges_list[i] are edges
            for the i-th graph. it contains Ki pair of (source, destination)
            vertex indices.
            is_training: boolean, whether in training mode or not.
        returns: the vertex features tensor produced by the last layer
        before the predictor.
        """
        with slim.arg_scope([slim.batch_norm], is_training=is_training), \
             slim.arg_scope([slim.fully_connected], weights_regularizer=self._regularizer):
                tfeatures_list = []
                tfeatures = t_initial_vertex_features
                tfeatures_list.append(tfeatures)
                for idx in range(len(self._layer_configs)-1):
                    layer_config = self._layer_configs[idx]
                    layer_scope = layer_config['scope']
                    layer_type = layer_config['type']
                    layer_kwargs = layer_config['kwargs']
                    graph_level = layer_config['graph_level']
                    t_vertex_coordinates = t_vertex_coord_list[graph_level]
                    t_keypoint_indices = t_keypoint_indices_list[graph_level]
                    t_edges = t_edges_list[graph_level]
                    with tf.variable_scope(layer_scope, reuse=tf.AUTO_REUSE):
                        flgn = self._default_layers_type[layer_type]
                        print('@ level %d Graph, Add layer: %s, type: %s'%
                            (graph_level, layer_scope, layer_type))
                        if 'device' in layer_config:
                            with tf.device(layer_config['device']):
                                tfeatures = flgn.apply_regular(
                                    tfeatures,
                                    t_vertex_coordinates,
                                    t_keypoint_indices,
                                    t_edges,
                                    **layer_kwargs)
                        else:
                            tfeatures = flgn.apply_regular(
                                tfeatures,
                                t_vertex_coordinates,
                                t_keypoint_indices,
                                t_edges,
                                **layer_kwargs)

                        tfeatures_list.append(tfeatures)
                        print('Feature Dim:' + str(tfeatures.shape[-1]))
                        print('tfeature_shape:', tfeatures.shape)
        return tfeatures_list[-1]

    def predict(self, tfeatures, is_training):
        with slim.arg_scope([slim.batch_norm], is_training=is_training), slim.arg_scope([slim.fully_connected], weights_regularizer=self._regularizer):
            predictor_config = self._layer_configs[-1]
            assert (predictor_config['type'] == 'classaware_predictor' or
                predictor_config['type'] == 'classaware_predictor_128' or
                predictor_config['type'] == 'classaware_separated_predictor')
            predictor = self._default_layers_type[predictor_config['type']]
            with tf.variable_scope(predictor_config['scope'], reuse=tf.AUTO_REUSE):
                logits, box_encodings = predictor.apply_regular(tfeatures, num_classes=self.num_classes, box_encoding_len=self.box_encoding_len, **predictor_config['kwargs'])
                print("Prediction %d classes" % self.num_classes)
        return logits, box_encodings

In the graph construction, I replaced the original call with:

t_features = model.extract_features(
    t_initial_vertex_features, t_vertex_coord_list,
    t_keypoint_indices_list, t_edges_list, t_is_training)
t_logits, t_pred_box = model.predict(t_features, t_is_training)

but this also causes NaN values in t_features and reg_loss.

WeijingShi commented 4 years ago

Hi SkilfulBugsMaker, I tried your code in inference and it did not produce NaN. I think you get the NaN during training, is that the case? The current training uses a small batch size and basic SGD. If you decrease the batch size further or increase the learning rate, it is easy to get NaN. I find that gradient clipping and a cyclical learning rate help a lot. For gradient clipping, you can try adding the following code after Line 404 in train.py:

if train_config.get('clip_gradient', False):
    grads, vars = zip(*grads_cross_gpu)
    clipped_grads, raw_global_norm = tf.clip_by_global_norm(grads, train_config['clip_norm'])
    grads_cross_gpu = list(zip(clipped_grads, vars))
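
To illustrate what global-norm clipping does to the gradients, here is a minimal pure-Python sketch of the same rule that tf.clip_by_global_norm applies: every gradient is multiplied by clip_norm / max(global_norm, clip_norm). The helper name and the toy gradient values are illustrative assumptions, not code from the repository.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Illustrative helper (not from train.py).

    grads: list of flat lists of floats, one list per variable.
    Returns (clipped_grads, global_norm).
    """
    # Joint L2 norm over all gradients of all variables.
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # One shared factor; it equals 1.0 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


# Toy gradients for two variables; their joint norm is sqrt(3^2 + 4^2) = 5.
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)
print(norm)     # joint norm before clipping
print(clipped)  # every entry scaled by roughly 1/5
```

Because all gradients share one scale factor, the update direction is preserved and only its length is limited, which is why this tends to tame occasional exploding steps without changing the optimizer.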

Setting train_config['clip_norm'] to 1.0 might be a fine starting point. Thanks,
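
The thread does not say which cyclical schedule was used. As one possible reading, the sketch below implements the common triangular policy, where the learning rate ramps linearly from a base value to a peak and back. The function name, the parameter names, and the example values are all assumptions for illustration.

```python
def triangular_lr(step, base_lr, max_lr, step_size):
    """Hypothetical triangular cyclical schedule.

    The rate rises from base_lr to max_lr over step_size steps,
    then falls back to base_lr over the next step_size steps.
    """
    cycle_pos = step % (2 * step_size)
    # Distance from the peak of the cycle, normalised to [0, 1].
    x = abs(cycle_pos / step_size - 1.0)
    return base_lr + (max_lr - base_lr) * (1.0 - x)


# Example values are arbitrary and only show the shape of the schedule.
for step in (0, 5, 10, 15, 20):
    print(step, triangular_lr(step, base_lr=0.1, max_lr=0.5, step_size=10))
```

The value returned for the current global step would be fed to the optimizer as its learning rate. How that is wired into train.py depends on how the learning rate is defined there.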

yuxguo commented 4 years ago

Thank you, it solved my problem! By the way, there are some inconsistencies in train.py when using rgb as the initial vertex features:

elif config['input_features'] == '0rgb':
    input_v = np.hstack([np.zeros((cam_rgb_points.attr.shape[0], 1)),
        cam_rgb_points.attr[:, 1:]])

elif config['input_features'] == 'rgb':
    t_initial_vertex_features = tf.placeholder(
        dtype=tf.float32, shape=[None, 3])

The dimensions of the point cloud data and the placeholder do not match.
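
The mismatch can be reproduced without TensorFlow. The sketch below assumes each point carries four attributes (intensity, r, g, b), which is an assumption about cam_rgb_points.attr and not something stated in the thread. It mimics the '0rgb' branch with plain lists and checks the resulting width against a placeholder width. Both helper names are hypothetical.

```python
def build_0rgb(attr_rows):
    """Mimic the '0rgb' branch: a zero column followed by columns 1.. of attr."""
    return [[0.0] + list(row[1:]) for row in attr_rows]


def check_width(rows, placeholder_width):
    """Raise ValueError if any row's length differs from the placeholder width."""
    widths = {len(row) for row in rows}
    if widths != {placeholder_width}:
        raise ValueError(
            "feature width %s does not match placeholder width %d"
            % (sorted(widths), placeholder_width))


# Assumed layout: (intensity, r, g, b) per point.
attr = [[0.5, 0.1, 0.2, 0.3],
        [0.7, 0.4, 0.5, 0.6]]
features = build_0rgb(attr)

check_width(features, 4)      # passes: the '0rgb' data has 4 columns
try:
    check_width(features, 3)  # a [None, 3] placeholder would reject it
except ValueError as err:
    print(err)
```

Under that assumption, the '0rgb' data is four columns wide, so it cannot be fed to a [None, 3] placeholder.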

WeijingShi / Point-GNN

Get the feature map of PointGNN #15