calmiLovesAI / Basic_CNNs_TensorFlow2

A TensorFlow 2 implementation of some basic CNNs (MobileNetV1/V2/V3, EfficientNet, ResNeXt, InceptionV4, InceptionResNetV1/V2, SENet, SqueezeNet, DenseNet, ShuffleNetV2, ResNet).
MIT License

Error in training with densenet #16

Open ghost opened 4 years ago

ghost commented 4 years ago

Running train.py reports the following error:

Traceback (most recent call last):
  File "E:/work/Basic_CNNs_TensorFlow2-master/train_test.py", line 100, in <module>
    model.save_weights(filepath=save_model_dir+"epoch-{}".format(epoch), save_format='tf')
  File "E:\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1123, in save_weights
    self._trackable_saver.save(filepath, session=session)
  File "E:\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\training\tracking\util.py", line 1168, in save
    file_prefix=file_prefix_tensor, object_graph_tensor=object_graph_tensor)
  File "E:\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\training\tracking\util.py", line 1108, in _save_cached_when_graph_building
    object_graph_tensor=object_graph_tensor)
  File "E:\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\training\tracking\util.py", line 1076, in _gather_saveables
    feed_additions) = self._graph_view.serialize_object_graph()
  File "E:\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\training\tracking\graph_view.py", line 379, in serialize_object_graph
    trackable_objects, path_to_root = self._breadth_first_traversal()
  File "E:\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\training\tracking\graph_view.py", line 199, in _breadth_first_traversal
    for name, dependency in self.list_dependencies(current_trackable):
  File "E:\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\training\tracking\graph_view.py", line 159, in list_dependencies
    return obj._checkpoint_dependencies
  File "E:\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\training\tracking\data_structures.py", line 509, in _checkpoint_dependencies
    "automatically un-wrapped and subsequently ignored." % (self,)))
ValueError: Unable to save the object ListWrapper([]) (a list wrapper constructed to track trackable TensorFlow objects). A list element was replaced (__setitem__, __setslice__), deleted (__delitem__, __delslice__), or moved (sort).
In order to support restoration on object creation, tracking is exclusively for append-only data structures.

If you don't need this list checkpointed, wrap it in a tf.contrib.checkpoint.NoDependency object; it will be automatically un-wrapped and subsequently ignored.

How can I solve this?

Davidxswang commented 3 years ago

I am having the same issue. I guess it happens because, when we build the DenseNet model, the DenseBlock __init__ method uses a list to store the layers, and this is not allowed if we want to save the model as a checkpoint. Saving the checkpoint in .h5 format works, but saving in the default format does not.

I tried writing DenseBlock as a function rather than a subclass, but the issue still exists. I am still thinking about how to work around it.

The issue is also reported in the TensorFlow issue tracker: https://github.com/tensorflow/tensorflow/issues/36916

class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_layers, growth_rate, drop_rate):
        super(DenseBlock, self).__init__()
        self.num_layers = num_layers
        self.growth_rate = growth_rate
        self.drop_rate = drop_rate
        self.features_list = []
        self.bottle_necks = []
        for i in range(self.num_layers):
            self.bottle_necks.append(BottleNeck(growth_rate=self.growth_rate, drop_rate=self.drop_rate))

    def call(self, inputs, training=None, **kwargs):
        self.features_list.append(inputs)
        x = inputs
        for i in range(self.num_layers):
            y = self.bottle_necks[i](x, training=training)
            self.features_list.append(y)
            x = tf.concat(self.features_list, axis=-1)
        self.features_list.clear()
        return x
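
One possible reading of the traceback, offered as a hedged sketch rather than a confirmed diagnosis: the error message says tracking only supports append-only lists, and `call()` above runs `self.features_list.clear()` on a list stored as a layer attribute. The toy class below is a hypothetical stand-in written in plain Python (it is not TensorFlow's `ListWrapper`, and names such as `AppendOnlyTrackedList` and `run_forward_pass` are invented for illustration). It only models the rule stated in the error message: appends are fine, but any removal marks the list as unsaveable.

```python
class AppendOnlyTrackedList:
    """Toy model of a checkpoint-tracked list (hypothetical, not TensorFlow code)."""

    def __init__(self):
        self._items = []
        # Becomes True once an element is removed or replaced.
        self._non_append_mutation = False

    def append(self, item):
        # Appending keeps the structure checkpointable.
        self._items.append(item)

    def clear(self):
        # Removing elements is a non-append mutation.
        if self._items:
            self._non_append_mutation = True
        self._items.clear()

    def __len__(self):
        return len(self._items)

    def save(self):
        # Refuse to save after any non-append mutation, as the error message describes.
        if self._non_append_mutation:
            raise ValueError("list was mutated by something other than append")
        return list(self._items)


def run_forward_pass(features, inputs, outputs):
    """Mimics the pattern in call(): fill a scratch list, then clear it."""
    features.append(inputs)
    for y in outputs:
        features.append(y)
    features.clear()


tracked = AppendOnlyTrackedList()
before = tracked.save()  # saving works before any forward pass

run_forward_pass(tracked, "x", ["y1", "y2"])

try:
    tracked.save()
    save_failed = False
except ValueError:
    save_failed = True

print("save failed after forward pass:", save_failed)
```

Under this reading, the list of BottleNeck layers is not the problem, because it is only ever appended to; the scratch list that is cleared on every forward pass is.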

Do you have any idea how to solve this problem yet?

ghost commented 3 years ago

I suggest taking a look at this code: https://github.com/Keyird/DeepLearning-TensorFlow2.0/blob/master/DenseNet/model.py

Davidxswang commented 3 years ago

Thank you very much! I just figured this out. I changed the code in DenseBlock and the problem seems to be solved.

Original implementation:

class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_layers, growth_rate, drop_rate):
        super(DenseBlock, self).__init__()
        self.num_layers = num_layers
        self.growth_rate = growth_rate
        self.drop_rate = drop_rate
        self.features_list = []
        self.bottle_necks = []
        for i in range(self.num_layers):
            self.bottle_necks.append(BottleNeck(growth_rate=self.growth_rate, drop_rate=self.drop_rate))

    def call(self, inputs, training=None, **kwargs):
        self.features_list.append(inputs)
        x = inputs
        for i in range(self.num_layers):
            y = self.bottle_necks[i](x, training=training)
            self.features_list.append(y)
            x = tf.concat(self.features_list, axis=-1)
        self.features_list.clear()
        return x

After my change:

class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_layers, growth_rate, drop_rate):
        super(DenseBlock, self).__init__()
        self.num_layers = num_layers
        self.growth_rate = growth_rate
        self.drop_rate = drop_rate
        self.bottle_necks = []
        # self.features_list is no longer used
        for i in range(self.num_layers):
            self.bottle_necks.append(BottleNeck(growth_rate=self.growth_rate, drop_rate=self.drop_rate))

    def call(self, inputs, training=None, **kwargs):
        # No list changes length inside call() anymore, which seems to solve the problem.
        x = inputs
        for i in range(self.num_layers):
            y = self.bottle_necks[i](x, training=training)
            x = tf.concat([x, y], axis=-1)
        return x
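
To check that the rewritten `call()` computes the same thing as the original, here is a minimal sketch in plain Python. It makes several assumptions: tensors are replaced by flat lists of channel values, concatenation along the last axis is replaced by list concatenation, and `bottleneck_stub` is a hypothetical stand-in for BottleNeck that simply returns `growth_rate` new channels. It only illustrates the concatenation logic, not the real layers.

```python
def bottleneck_stub(x, growth_rate):
    """Hypothetical stand-in for BottleNeck: returns `growth_rate` new channels."""
    return [float(len(x) + i) for i in range(growth_rate)]


def dense_block_with_list(inputs, num_layers, growth_rate):
    """Original pattern: keep every feature map in a list and concatenate them all."""
    features = [inputs]
    x = inputs
    for _ in range(num_layers):
        y = bottleneck_stub(x, growth_rate)
        features.append(y)
        # Stand-in for tf.concat(features, axis=-1)
        x = [channel for feature in features for channel in feature]
    return x


def dense_block_running_concat(inputs, num_layers, growth_rate):
    """Rewritten pattern: concatenate the running tensor with the new output."""
    x = inputs
    for _ in range(num_layers):
        y = bottleneck_stub(x, growth_rate)
        # Stand-in for tf.concat([x, y], axis=-1)
        x = x + y
    return x


inputs = [1.0, 2.0, 3.0]
out_list = dense_block_with_list(inputs, num_layers=4, growth_rate=2)
out_running = dense_block_running_concat(inputs, num_layers=4, growth_rate=2)

print(out_list == out_running)
print(len(out_running))
```

Because `x` already holds the concatenation of the input and all earlier outputs, appending only the newest output gives the same result. In this sketch the channel count grows by `growth_rate` per layer, so the output has `len(inputs) + num_layers * growth_rate` channels.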