Unable to use pretrained model file

davidsandberg / facenet

Face recognition using Tensorflow

MIT License


Unable to use pretrained model file #817

Open sanjayanayak opened 6 years ago

sanjayanayak commented 6 years ago

I want to train on 51 classes with 5 samples each, starting from the pretrained model provided with this project. I ran the following command: "python3 src/train_tripletloss.py --data_dir training-faces/FEI-FACE-PART-1-ALIGN/ --image_size 160 --model_def models.inception_resnet_v1 --optimizer RMSPROP --learning_rate 0.01 --weight_decay 1e-4 --max_nrof_epochs 100 --images_per_person=5 --epoch_size=100 --pretrained_model=pretrained-model/20180408-102900/model-20180408-102900.ckpt-90.index". It failed with the error below, and I have no idea how to resolve it. Please help.

NotFoundError (see above for traceback): Tensor name "InceptionResnetV1/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta" not found in checkpoint files pretrained-model/20180408-102900/model-20180408-102900.ckpt-90.index [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
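The command above passes the `.index` file itself as `--pretrained_model`, while a later reply in this thread suggests passing the path without that suffix. A minimal sketch of normalizing such a path follows; the helper name `checkpoint_prefix` is hypothetical and not part of facenet, and the suffix list assumes the usual TensorFlow 1.x checkpoint file layout (`.index`, `.meta`, `.data-NNNNN-of-NNNNN`).

```python
import re

# Suffixes that appear on the files making up one checkpoint (assumed layout).
_CKPT_FILE_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Strip a checkpoint file suffix so only the shared prefix remains.

    Hypothetical helper for illustration; paths without a known suffix
    are returned unchanged.
    """
    return _CKPT_FILE_SUFFIX.sub("", path)


print(checkpoint_prefix(
    "pretrained-model/20180408-102900/model-20180408-102900.ckpt-90.index"))
# -> pretrained-model/20180408-102900/model-20180408-102900.ckpt-90
```

The returned prefix is the value one would then hand to `--pretrained_model`.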

rain2008204 commented 6 years ago

@sanjayanayak you can try this code:

    all_vars = tf.trainable_variables()
    var_to_restore = [v for v in all_vars if not v.name.startswith('Logits')]
    saver = tf.train.Saver(var_to_restore, max_to_keep=5, keep_checkpoint_every_n_hours=1.0)
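The filtering idea in this reply can be tried out on plain strings before touching a graph. The sketch below is an assumption-laden illustration, not facenet code: `select_restorable` is a hypothetical helper, and the `Logits/weights:0` name is made up for the example (the other two names come from the error messages in this thread). It matches a scope anywhere in the variable's path rather than only at the start, because the names in the errors carry an `InceptionResnetV1/` prefix, so a `startswith` check on an inner scope might never match. Whether a given scope sits at the top level depends on how the graph was built.

```python
def select_restorable(var_names, skip_scopes):
    """Return the names whose scope path contains none of skip_scopes.

    Hypothetical helper: a name such as 'A/B/weights:0' is split on '/'
    and dropped if any path component equals a scope to skip.
    """
    kept = []
    for name in var_names:
        components = name.split("/")
        if not any(scope in skip_scopes for scope in components):
            kept.append(name)
    return kept


names = [
    "InceptionResnetV1/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta:0",
    "InceptionResnetV1/Bottleneck/weights:0",
    "Logits/weights:0",  # made-up name for illustration
]

for kept_name in select_restorable(names, {"Logits", "Bottleneck"}):
    print(kept_name)
```

In a TensorFlow 1.x script the same test would be applied to `v.name` for each variable, and the surviving variables passed to `tf.train.Saver` as in the reply above.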

maoxiantuanzi commented 6 years ago

I'm running into the same problem. Have you solved it?

Victoria2333 commented 6 years ago

@sanjayanayak change the checkpoint path to pretrained-model/20180408-102900/model-20180408-102900.ckpt-90 (without the .index suffix). However, I still got this error:

Assign requires shapes of both tensors to match. lhs shape= [1792,128] rhs shape= [1792,512] [[Node: save/Assign_21 = Assign[T=DT_FLOAT, _class=["loc:@InceptionResnetV1/Bottleneck/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](InceptionResnetV1/Bottleneck/weights, save/RestoreV2/_841)]] [[Node: save/RestoreV2/_1554 = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_730_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
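In the `Assign` error above, the left-hand shape `[1792,128]` belongs to the variable in the freshly built graph and the right-hand shape `[1792,512]` to the value stored in the checkpoint, which suggests the graph was built with a 128-dimensional bottleneck while the checkpoint holds a 512-dimensional one. One way to see every such conflict at once is to compare the two sets of shapes before restoring. The sketch below does this on plain dictionaries; `split_by_shape` is a hypothetical helper, and the `beta` shape is made up for the example. In TensorFlow 1.x the checkpoint side could plausibly be filled from `tf.train.list_variables(prefix)`, which returns `(name, shape)` pairs, and the graph side from each variable's `shape.as_list()`.

```python
def split_by_shape(graph_shapes, ckpt_shapes):
    """Sort graph variables into restorable, mismatched, and missing.

    Both arguments map variable names to shapes (lists of ints).
    Hypothetical helper for illustration only.
    """
    restorable, mismatched, missing = [], [], []
    for name, shape in graph_shapes.items():
        if name not in ckpt_shapes:
            missing.append(name)
        elif list(ckpt_shapes[name]) != list(shape):
            mismatched.append((name, list(shape), list(ckpt_shapes[name])))
        else:
            restorable.append(name)
    return restorable, mismatched, missing


graph = {
    "InceptionResnetV1/Bottleneck/weights": [1792, 128],  # lhs in the error
    "InceptionResnetV1/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta": [192],  # made-up shape
}
checkpoint = {
    "InceptionResnetV1/Bottleneck/weights": [1792, 512],  # rhs in the error
    "InceptionResnetV1/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta": [192],
}

ok, bad, absent = split_by_shape(graph, checkpoint)
for name, graph_shape, ckpt_shape in bad:
    print(name, "graph:", graph_shape, "checkpoint:", ckpt_shape)
```

Variables reported as mismatched either have to be left out of the restore list or the graph has to be rebuilt to match. If the training script exposes an embedding-size option (an assumption worth checking with `--help`), setting it to 512 would be the second route.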

Victoria2333 commented 6 years ago

@rain2008204 I tried this and it worked. But when I try to fine-tune a different layer, for example:

    all_vars = tf.trainable_variables()
    var_to_restore = [v for v in all_vars if not v.name.startswith('Mixed_8b')]
    saver = tf.train.Saver(var_to_restore, max_to_keep=5, keep_checkpoint_every_n_hours=1.0)

(P.S. Mixed_8b is the last-but-one layer in inception-resnet-v1.)

and then train with my_classifier.py to get a .pkl file, I get the error: invalid argument 2: size '[16x8x8x8]' is invalid for input with 2048 elements at /pytorch/aten/src/TH/THStorage.c:41. Do you know the reason? I also found that 'Mixed_8b' corresponds to a tensor of size 128, but 'Logits' corresponds to a tensor of size 512. I don't know why.

hungnv21292 commented 6 years ago

Hi @Victoria2333, I am also fine-tuning with the triplet-loss method, using model-20180402-114759.ckpt-275 as the pretrained model, and I get the same error as you: "InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [128] rhs shape= [512] [[Node: save/Assign_20 = Assign[T=DT_FLOAT, _class=["loc:@InceptionResnetV1/Bottleneck/BatchNorm/moving_variance"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](InceptionResnetV1/Bottleneck/BatchNorm/moving_variance, save/RestoreV2/_1239)]]"

How did you solve this problem? Could you please share your solution? Thank you so much.