BichenWuUCB / squeezeDet

A TensorFlow implementation of SqueezeDet, a convolutional neural network for object detection.
BSD 2-Clause "Simplified" License
739 stars, 306 forks

Final model checkpoint is not saved even after running for many hours. #93

Open · muthiyanbhushan opened this issue 6 years ago

muthiyanbhushan commented 6 years ago

Hello,

I am training the model for 100,000 steps.

The saved model checkpoint consists of three files:

1) model.ckpt-99999.data-00000-of-00001
2) model.ckpt-99999.index
3) model.ckpt-99999.meta

However, the final checkpoint is not saved as a single file like the "model.ckpt-87000" that you have.

Could you please let me know how I can get this checkpoint file?
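For context: `tf.train.Saver` does not write one standalone "final" file. Each save produces a group of files that share a prefix (here `model.ckpt-99999`), and by default the Saver also updates a small text file named `checkpoint` in the same directory that records the most recent prefix; TensorFlow's own `tf.train.latest_checkpoint(dir)` reads that file. The stdlib-only sketch below mimics that lookup, assuming the state file contains the usual `model_checkpoint_path: "..."` line. The function name `latest_checkpoint_prefix` is hypothetical and this is not the library's implementation.

```python
import os
import re

# Matches the line the Saver writes, e.g.  model_checkpoint_path: "model.ckpt-99999"
_STATE_LINE = re.compile(r'^model_checkpoint_path:\s*"(.*)"\s*$')


def latest_checkpoint_prefix(train_dir, state_file="checkpoint"):
    """Return the most recent checkpoint prefix recorded in train_dir, or None.

    Hypothetical helper: a sketch of what tf.train.latest_checkpoint does,
    assuming the plain-text state file written by tf.train.Saver.
    """
    path = os.path.join(train_dir, state_file)
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        for line in f:
            match = _STATE_LINE.match(line.strip())
            if match:
                prefix = match.group(1)
                # Relative prefixes are resolved against the training directory.
                if not os.path.isabs(prefix):
                    prefix = os.path.join(train_dir, prefix)
                return prefix
    # No model_checkpoint_path line found.
    return None
```

With a placeholder directory, `latest_checkpoint_prefix("path/to/train_dir")` would return something like `path/to/train_dir/model.ckpt-99999`. That prefix, not any one of the three files, is what identifies the final checkpoint.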

Thanks.

muthiyanbhushan commented 6 years ago

Hello Bichen,

Could you please let me know what might be causing the final weights not to be saved?

Thanks.

eleboss commented 6 years ago

@muthiyanbhushan I am also confused about this. Have you solved it? I simply stripped the .index from model.ckpt-99999.index, but it does not seem to work properly.
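No file needs to be renamed for this: TensorFlow addresses a checkpoint by the prefix shared by all of its files, so the value to restore from can be derived by stripping the known suffixes. The minimal stdlib sketch below assumes the three suffix shapes listed in this thread (`.index`, `.meta`, `.data-NNNNN-of-NNNNN`); `checkpoint_prefix` is a hypothetical helper name.

```python
import re

# Suffixes that follow a checkpoint prefix, e.g.
#   model.ckpt-99999.index
#   model.ckpt-99999.meta
#   model.ckpt-99999.data-00000-of-00001
_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(filename):
    """Map any checkpoint file name (or an existing prefix) to the bare prefix."""
    return _SUFFIX.sub("", filename)
```

The result is what you would pass as `save_path` to `tf.train.Saver.restore(sess, save_path)`. Note that a correct prefix only helps if the checkpoint actually contains the variables the current graph expects.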

eleboss commented 6 years ago

And this is my error:

tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "fire7/squeeze1x1/biases" not found in checkpoint files ./data/model_checkpoints/squeezeDet/model.ckpt-999 [[Node: save/RestoreV2_50 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_50/tensor_names, save/RestoreV2_50/shape_and_slices)]] [[Node: save/RestoreV2_51/_63 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:1", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_257_save/RestoreV2_51", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:1"]()]]

I really don't understand why it says a layer is missing.

eleboss commented 6 years ago

(Update) While learning TensorFlow, I found that TensorFlow saves each checkpoint as several separate files, and we don't need to care how many files there are; we just read the checkpoint. In my case, I simply copied all of these files:

model.ckpt-99999.data-00000-of-00001
model.ckpt-99999.index
model.ckpt-99999.meta

and used model.ckpt-99999 as the path to restore from. That solved it.
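Since restoring needs the files that share the prefix, a quick check after copying can catch a missing piece. The stdlib sketch below assumes the `.index` plus `.data-XXXXX-of-NNNNN` layout shown in this thread and lists which files for a given prefix are absent; `missing_checkpoint_files` is a hypothetical name. The `.meta` file holds the graph definition and is only needed if you rebuild the graph from it (for example with `tf.train.import_meta_graph`), so it is checked only on request.

```python
import glob
import os
import re

# Data shard names encode their index and the total count, e.g. ".data-00000-of-00002".
_DATA = re.compile(r"\.data-(\d{5})-of-(\d{5})$")


def missing_checkpoint_files(prefix, require_meta=False):
    """Return a sorted list of files that checkpoint `prefix` still lacks.

    An empty list means the index file and every data shard are present.
    """
    missing = []
    if not os.path.isfile(prefix + ".index"):
        missing.append(prefix + ".index")
    if require_meta and not os.path.isfile(prefix + ".meta"):
        missing.append(prefix + ".meta")

    shards = glob.glob(glob.escape(prefix) + ".data-*-of-*")
    totals = {int(m.group(2)) for m in map(_DATA.search, shards) if m}
    if not totals:
        # No shard found, so the total is unknown; assume the common single shard.
        missing.append(prefix + ".data-00000-of-00001")
    else:
        total = max(totals)
        for i in range(total):
            name = "%s.data-%05d-of-%05d" % (prefix, i, total)
            if not os.path.isfile(name):
                missing.append(name)
    return sorted(missing)
```

For example, `missing_checkpoint_files("path/to/model.ckpt-99999")` (placeholder path) returning `[]` suggests the copied files are complete enough to restore weights into a graph built in code.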

muthiyanbhushan commented 6 years ago

@eleboss,

I am also following a similar procedure for now. Thanks for your response.