google-research / simclr

SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
https://arxiv.org/abs/2006.10029
Apache License 2.0

Cannot finetune the full network from the TF2 SavedModel #116

Open wangkua1 opened 3 years ago

wangkua1 commented 3 years ago

Dear Ting,

Thanks for adding the TF2 SavedModels. It looks like the `trainable_variables_list` for these SavedModels is empty, so I cannot fine-tune the full network from them. Is there a workaround, or will trainable checkpoints be released soon?
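
For reference, this is roughly how I am checking it (a minimal sketch; the path is just wherever I unpacked one of the TF2 SavedModels locally, and I am reading the variable list directly off the loaded object):

```python
import tensorflow as tf

# Hypothetical local path to one of the downloaded TF2 SavedModels, e.g. r50_1x_sk0.
saved_model = tf.saved_model.load("r50_1x_sk0")

# The list of trainable variables comes back empty, so there is nothing
# for an optimizer to update when fine-tuning the full network.
print(len(saved_model.trainable_variables))
```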

Thanks, Jackson

chentingpc commented 3 years ago

@saxenasaurabh, who created the TF2 SavedModels converted from the TF1 checkpoints, probably knows more about this.

free-bit commented 3 years ago

[EDIT]: I found a closely related problem and a solution among the closed issues: #108. However, my problem is not solved yet, so I am moving the updated version of my question to a comment under that issue.

Hello,

I have a different problem related to fine-tuning models saved in the TF2 SavedModel format. I want to use the self-supervised model as the backbone for another task. For that, I tried to use r50_1x_sk0 (downloaded from gs://simclr-checkpoints-tf2/simclrv2/pretrained). When the model is loaded with tf.saved_model.load (as illustrated in the notebooks), I get warnings like the following:

WARNING:absl:Importing a function (__inference_sync_batch_normalization_42_layer_call_and_return_conditional_losses_34851) with ops with custom gradients. Will likely fail if a gradient is requested.
...

The model loads despite these warnings, but when I call it with the flag trainable=True, I get the following error, which is exactly the failure the warnings above anticipate:

LookupError: No gradient defined for operation 'resnet/block_group4/bottleneck_block_15/batch_norm_relu_52/sync_batch_normalization_52/moments/IdentityN_1' (op type: IdentityN)
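
For completeness, this is roughly the call that triggers it (a sketch; the input batch is a stand-in, and the output key follows the keys I see on the restored model):

```python
import tensorflow as tf

saved_model = tf.saved_model.load("r50_1x_sk0")  # hypothetical local path
images = tf.random.uniform([2, 224, 224, 3])     # stand-in input batch

with tf.GradientTape() as tape:
    tape.watch(images)
    # Calling the restored model in training mode and then asking for any
    # gradient is what surfaces the LookupError on the IdentityN ops.
    outputs = saved_model(images, trainable=True)
    loss = tf.reduce_mean(outputs['final_avg_pool'])

grads = tape.gradient(loss, images)
```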

Do you know a solution to this problem?

I also tried re-instantiating the model and loading the weights only, using tf.train.Checkpoint with the checkpoint stored under the variables folder inside the saved_model directory. Then I get an error about a mismatch between the checkpointed variables and the variables of the instantiated model:

AssertionError: Nothing except the root object matched a checkpointed value. Typically this means that the checkpoint does not match the Python program. The following objects have no matching checkpointed value: ...
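
For reference, the restore attempt looked roughly like this (a sketch; `build_resnet` is a placeholder for however the architecture is re-instantiated from this repo's model code):

```python
import tensorflow as tf

# Placeholder for re-instantiating the ResNet-50 architecture from this repo's code.
model = build_resnet(depth=50, width_multiplier=1, sk_ratio=0.0)  # hypothetical helper

ckpt = tf.train.Checkpoint(model=model)
# The variables/ folder inside the SavedModel directory holds an ordinary
# checkpoint with the prefix "variables".
status = ckpt.restore("r50_1x_sk0/variables/variables")
status.assert_consumed()  # raises the AssertionError quoted above
```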

How can I use checkpoints or the SavedModel to insert your models as a backbone to a network for a different task?
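
Concretely, what I would like to end up with is roughly the following (a sketch of the intended usage only, not working code; the dense head, class count, and output key are just illustrative):

```python
import tensorflow as tf

class DownstreamModel(tf.keras.Model):
    """Illustrative wrapper: SimCLR backbone plus a task-specific head."""

    def __init__(self, backbone, num_classes):
        super().__init__()
        self.backbone = backbone
        self.head = tf.keras.layers.Dense(num_classes)

    def call(self, images, training=False):
        # Ideally the backbone runs in training mode here so that the whole
        # network, not just the head, can be fine-tuned end to end.
        features = self.backbone(images, trainable=training)['final_avg_pool']
        return self.head(features)

backbone = tf.saved_model.load("r50_1x_sk0")       # hypothetical local path
model = DownstreamModel(backbone, num_classes=10)  # 10 classes as an example
```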

Thank you for your time.

rishabhm12 commented 1 year ago

Can someone please answer this? I am facing the same issue. Please share a code snippet for fine-tuning the entire network.