google-research / deeplab2

DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks.
Apache License 2.0

ValueError: The initial value's shape is not compatible with the explicitly supplied `shape` argument #126

Open rcruzgar opened 2 years ago

rcruzgar commented 2 years ago

Hi,

I am trying to do semantic segmentation using the Panoptic-Deeplab example (https://github.com/google-research/deeplab2/blob/main/g3doc/projects/panoptic_deeplab.md) and setting this to false in the config file:

    instance {
      enable: false
    }

See the proto file (as txt to upload it here): resnet50_os16_semantic.txt, which is basically this.

I also downloaded the checkpoint resnet50_os16_panoptic_deeplab_coco_train.tar.gz and, after extracting it, added its path to the proto file.

I would also like to attach my training annotations. The file is renamed to .txt so that I can upload it here, but it is actually .json.

I am running everything in a Jupyter Notebook environment on AWS SageMaker, with a GPU.

I obtain the following error:

I0826 12:24:02.561120 140623071049536 api.py:446] Eval scale 1.0; setting pooling size to [7, 7]
Traceback (most recent call last):
  File "deeplab2/trainer/train.py", line 76, in <module>
    app.run(main)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "deeplab2/trainer/train.py", line 72, in main
    FLAGS.num_gpus)
  File "/home/ec2-user/SageMaker/dish_segmentation/deeplab2/trainer/train_lib.py", line 201, in run_experiment
    build_deeplab_model(deeplab_model, crop_size)
  File "/home/ec2-user/SageMaker/dish_segmentation/deeplab2/trainer/train_lib.py", line 80, in build_deeplab_model
    tf.keras.Input(input_shape, batch_size=batch_size), training=False)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 977, in __call__
    input_list)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1115, in _functional_construction_call
    inputs, input_masks, args, kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 848, in _keras_tensor_symbolic_call
    return self._infer_output_signature(inputs, args, kwargs, input_masks)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py", line 888, in _infer_output_signature
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 695, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/ec2-user/SageMaker/dish_segmentation/deeplab2/model/deeplab.py:155 call  *
        pred_dict = self._decoder(
    /home/ec2-user/SageMaker/dish_segmentation/deeplab2/model/encoder/axial_resnet.py:764 call  *
        current_output, activated_output, memory_feature, endpoints = (
    /home/ec2-user/SageMaker/dish_segmentation/deeplab2/model/encoder/axial_resnet.py:551 call_encoder_before_stacked_decoder  *
        current_output = self._stem(inputs)
    /home/ec2-user/SageMaker/dish_segmentation/deeplab2/model/layers/convolutions.py:287 call  *
        x = self._conv(x)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py:1030 __call__  **
        self._maybe_build(inputs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py:2659 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/layers/convolutional.py:204 build
        dtype=self.dtype)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer.py:663 add_weight
        caching_device=caching_device)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:818 _add_variable_with_custom_getter
        **kwargs_for_getter)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/keras/engine/base_layer_utils.py:129 make_variable
        shape=variable_shape if variable_shape else None)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:266 __call__
        return cls._variable_v1_call(*args, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:227 _variable_v1_call
        shape=shape)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:2127 creator_with_resource_vars
        created = self._create_variable(next_creator, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/distribute/one_device_strategy.py:278 _create_variable
        return next_creator(**kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:205 <lambda>
        previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py:2626 default_variable_creator
        shape=shape)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/variables.py:270 __call__
        return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1613 __init__
        distribute_strategy=distribute_strategy)
    /home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1753 _init_from_args
        (initial_value.shape, shape))

    ValueError: The initial value's shape ((7, 7, 3, 64)) is not compatible with the explicitly supplied `shape` argument ((7, 7, 100, 64)).

Note that I had set the crop size to 100. After changing the crop size to 3, I get:

ValueError: The initial value's shape ((64,)) is not compatible with the explicitly supplied `shape` argument ((2,)).

I guess this is because it's iterating over a new image.
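
For readers puzzling over the two shapes in the first message: in a channels-last 2D convolution, the kernel has shape (kernel_height, kernel_width, input_channels, filters), so a 100 in the third position suggests that the value 100 reached the channel dimension of the layer's input. The sketch below is plain Python, not deeplab2 code, and `conv2d_kernel_shape` is a hypothetical helper introduced only for illustration. It does not explain how the config produced that input shape.

```python
def conv2d_kernel_shape(input_shape, kernel_size, filters):
    """Return the kernel shape a channels-last 2D convolution would create.

    input_shape is (height, width, channels), without the batch dimension.
    This is a hypothetical helper for illustration, not a deeplab2 function.
    """
    if len(input_shape) != 3:
        raise ValueError(f"expected (height, width, channels), got {input_shape}")
    in_channels = input_shape[-1]
    return (kernel_size[0], kernel_size[1], in_channels, filters)


# An RGB input yields the first shape quoted in the error message.
print(conv2d_kernel_shape((100, 100, 3), (7, 7), 64))    # (7, 7, 3, 64)

# If 100 lands in the channel position, the third entry becomes 100.
print(conv2d_kernel_shape((100, 100, 100), (7, 7), 64))  # (7, 7, 100, 64)
```

Under this reading, the third kernel entry tracks the number of input channels rather than the image height or width, so it is worth checking how the crop size is written in the config.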

I run the training this way:

    python deeplab2/trainer/train.py \
        --config_file=/home/ec2-user/SageMaker/dish_segmentation/config_files/resnet50_os16_semantic.textproto \
        --mode=train \
        --model_dir=/home/rcruz/PycharmProjects/dish_segmentation/model \
        --num_gpus=1

Could you please give me any clue?

I am a beginner with segmentation models, so I might be making incorrect assumptions.

Thanks a lot!

aquariusjay commented 2 years ago

Hi @rcruzgar,

Thanks for the issue. However, helping you debug this is beyond our scope. We suggest running our provided tutorials (e.g., Cityscapes).

Cheers,

VeSt-hub commented 1 year ago

Hi! I got the same issue with semantic segmentation. @rcruzgar, maybe you've already solved it?

Thanks a lot!

aquariusjay commented 1 year ago

Hello,

Thanks for reporting the issue. Unfortunately, if you want to train a semantic-only model, you cannot use the trained panoptic checkpoints for initialization (as the error log shows, the job fails to load the trained checkpoint). You need to train a new one yourself.
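
One way to see which variables would fail to load is to compare shapes by name before training. The following is a minimal sketch in plain Python, not part of deeplab2: `find_shape_mismatches` is a hypothetical helper, and the variable names in the example are made up. Only the shapes (64,) and (2,) come from the error reported above.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Compare variable shapes by name and return the incompatible ones.

    Both arguments map a variable name to a shape (a sequence of ints).
    Variables missing from the checkpoint are skipped, not reported.
    """
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Hypothetical variable names and shapes, for illustration only.
checkpoint_shapes = {"stem/conv/kernel": [7, 7, 3, 64], "head/bias": [64]}
model_shapes = {"stem/conv/kernel": (7, 7, 3, 64), "head/bias": (2,)}

for name, (ckpt_shape, model_shape) in find_shape_mismatches(
        checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With TensorFlow installed, `tf.train.list_variables(checkpoint_path)` returns (name, shape) pairs that can be passed through `dict(...)` to build `checkpoint_shapes`. Names in object-based checkpoints may not match the model's variable names directly, so some mapping between the two is likely needed.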

Cheers,

VeSt-hub commented 1 year ago

Does that mean there are no pretrained semantic-only models in the deeplab2 repo?

shogoinadomi commented 5 months ago

Hi, I set restore_semantic_last_layer_from_initial_checkpoint: false in the textproto file, like this:

    model_options {
      initial_checkpoint: path-to-pretrained-model
      restore_semantic_last_layer_from_initial_checkpoint: false
      ...
    }

For me, the pretrained model was max_deeplab_l_backbone_os16_axial_deeplab_cityscapes_trainfine/ckpt-60000. With that setting, it worked for my own semantic-only dataset.