Open fjsaezm opened 2 years ago
Do you know which line of code leads to this error? And have you set the image size, model size, etc. the same for the train run and for the eval run?
Thank you for your answer!
I used the same image size and ResNet architecture for both the train_then_eval run and the finetune run. I did not change anything in the code itself; I only changed the flags passed on the command line.
For reference, I trained the model using the following command:
python run.py --mode=train_then_eval --train_epochs=100 --learning_rate=1.0 --dataset=cifar10 --image_size=32 --eval_split=test --use_blur=True --use_tpu=False --train_batch_size=256 --temperature=0.25 --weight_decay=0.0001 --color_jitter_strength=0.65 --resnet_depth=50 --model_dir=models/256-0.25-0.0001-0.65-50-blur
To finetune on a different dataset, I used the command:
python run.py --mode=train_then_eval --train_mode=finetune --fine_tune_after_block=4 --zero_init_logits_layer=True --global_bn=False --optimizer=momentum --learning_rate=0.1 --weight_decay=0.0 --train_epochs=100 --train_batch_size=256 --warmup_epochs=0 --dataset=imagenette/160px-v2 --image_size=32 --eval_split=test --resnet_depth=50 --checkpoint=models/256-0.25-0.0001-0.65-50-blur/saved_model/19695 --model_dir=models/imagenette/ --use_tpu=False --eval_split=validation
The error I get is:
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "run.py", line 667, in main
perform_evaluation(model, builder, eval_steps,
File "run.py", line 401, in perform_evaluation
run_single_step(iterator)
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
result = self._call(*args, **kwds)
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 956, in _call
return self._concrete_stateful_fn._call_flat(
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
outputs = execute.execute(
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 90720 values, but the requested shape has 3072
[[node Reshape (defined at run.py:395) ]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNextAsOptional]]
[[OptionalHasValue_1/_4]]
(1) Invalid argument: Input to reshape is a tensor with 90720 values, but the requested shape has 3072
[[node Reshape (defined at run.py:395) ]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNextAsOptional]]
0 successful operations.
0 derived errors ignored. [Op:__inference_run_single_step_146536]
Function call stack:
run_single_step -> run_single_step
Maybe I should just change the mode to pretrain (removing the train_then_eval) and then use the weights from the pretraining run? I just thought the weights saved from train_then_eval were the same as the ones saved from pretrain.
Thanks!
I think you hit an edge case/bug :) Since you use image_size=32, center_crop at eval time is disabled, so the code directly reshapes a raw image larger than 32x32x3 into 32x32x3, which leads to the error. To resolve it, you could simply resize the image to the target size before the reshape. This was not a problem on CIFAR10, as the train/test images all have the same 32x32 size.
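The numbers in the traceback are consistent with this explanation. Here is a quick arithmetic check in plain Python. It assumes the failing example is a 160x189 RGB image, which is only a guess: the log gives just the element count (90720), and the helper name num_values is made up for illustration.

```python
# Element counts behind the reshape error. The 160x189 source size is an
# assumption, chosen because it matches the 90720 values reported in the log.
def num_values(height, width, channels=3):
    """Number of scalar values in a height x width x channels image."""
    return height * width * channels

raw = num_values(160, 189)     # assumed raw Imagenette image
target = num_values(32, 32)    # shape requested via --image_size=32

print(raw, target)             # 90720 3072
print(raw == target)           # False: a reshape cannot change the element count
```

A reshape only rearranges existing values, so it can succeed only when both counts are equal. That is why a real resize has to happen first.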
Thank you for your answer again!
As you suggested, I have forced the preprocessing to resize the image to 32x32. I printed the shape of the tensor and everything seems to be fine. I have added:
image = tf.image.resize(image,[32,32],preserve_aspect_ratio=True)
image = tf.reshape(image, [height, width, 3])
image = tf.clip_by_value(image, 0., 1.)
return image
However, although the numbers in the error have changed and are now closer to each other, the error persists.
Traceback (most recent call last):
File "run.py", line 676, in <module>
app.run(main)
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "run.py", line 667, in main
perform_evaluation(model, builder, eval_steps,
File "run.py", line 401, in perform_evaluation
run_single_step(iterator)
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
result = self._call(*args, **kwds)
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
return self._stateless_fn(*args, **kwds)
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
return graph_function._call_flat(
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
outputs = execute.execute(
File "/home/fjaviersaezm/miniconda3/envs/simclr/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 2592 values, but the requested shape has 3072
[[node Reshape (defined at run.py:395) ]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNextAsOptional]]
[[OptionalHasValue_1/_4]]
(1) Invalid argument: Input to reshape is a tensor with 2592 values, but the requested shape has 3072
[[node Reshape (defined at run.py:395) ]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNextAsOptional]]
0 successful operations.
0 derived errors ignored. [Op:__inference_run_single_step_5886]
Function call stack:
run_single_step -> run_single_step
2592 seems to be 27x32x3, so 5 rows are missing. Is there any other place where this could happen?
Thank you in advance!
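One possible explanation for the 27x32x3 shape, offered as a guess rather than a confirmed diagnosis: with preserve_aspect_ratio=True, tf.image.resize fits the image inside the requested box instead of stretching it to fill it, so a non-square input does not come out as 32x32. The pure-Python sketch below only mimics that size computation. The helper name fitted_size and the 160x189 input size are assumptions for illustration, not taken from run.py or from the dataset.

```python
def fitted_size(height, width, target_h, target_w):
    """Approximate output size when the aspect ratio is preserved.

    The image is scaled by the smaller of the two ratios so that it fits
    inside the target box; this is a sketch of the idea, not TensorFlow code.
    """
    scale = min(target_h / height, target_w / width)
    return round(height * scale), round(width * scale)

# Assumed 160x189 input, resized towards a 32x32 target.
h, w = fitted_size(160, 189, 32, 32)
print(h, w, h * w * 3)   # 27 32 2592
```

If that is what happens here, calling tf.image.resize(image, [32, 32]) without preserve_aspect_ratio (it defaults to False) should return exactly 32x32 at the cost of some distortion. Cropping the image to a square before resizing would be another option.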
Hi there! I have successfully trained a few models using the TensorFlow version of SimCLR. Now I would like to transfer the learned encoder to a different dataset (concretely, from CIFAR10 to Imagenette) to see how the model performs.
Using train_then_eval and train_mode=finetune, I get to train the linear head (or, at least, some training logs are shown), but when it comes to evaluating, I get the following errors:
What am I doing wrong?
Thank you, guys!