Closed alirezadizaji closed 3 years ago
To continue training, you don't need to specify the `--checkpoint` flag. Just set `model_dir` to the previous training folder; training will automatically restore from the latest checkpoint in that folder and continue until `global_step` reaches the targeted total number of training steps.
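The resume behaviour described above can be illustrated with a toy, standard-library-only sketch. It is not the SimCLR or TensorFlow implementation: the `ckpt-<step>` file naming, the `latest_step` and `train` helpers, and the `save_every` parameter are all hypothetical. It only shows why pointing a second run at the same directory picks up from the last saved step and stops at the target.

```python
import os
import re
import tempfile


def latest_step(model_dir):
    """Return the highest step among files named 'ckpt-<step>', or 0 if none exist."""
    steps = [
        int(match.group(1))
        for name in os.listdir(model_dir)
        for match in [re.fullmatch(r"ckpt-(\d+)", name)]
        if match
    ]
    return max(steps, default=0)


def train(model_dir, total_steps, save_every=100):
    """Resume from the latest checkpoint in model_dir and run until total_steps."""
    global_step = latest_step(model_dir)  # 0 means a fresh start
    steps_run = 0
    while global_step < total_steps:
        global_step += 1  # stand-in for one optimizer update
        steps_run += 1
        if global_step % save_every == 0 or global_step == total_steps:
            # Write an empty marker file in place of a real checkpoint.
            open(os.path.join(model_dir, f"ckpt-{global_step}"), "w").close()
    return global_step, steps_run


with tempfile.TemporaryDirectory() as model_dir:
    # First run stops early (a stand-in for the killed process).
    first = train(model_dir, total_steps=250)
    # Second run reuses the same directory and performs only the remaining steps.
    second = train(model_dir, total_steps=1000)
    print(first, second)
```

The second call does not need to be told which checkpoint to load; the directory alone determines where it resumes.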
Yes, that worked. Thanks so much.
Hi, I was pretraining SimCLRv2 and, before it finished, the process was killed by the Linux kernel. I wanted to resume pretraining from the checkpoint file by passing its directory to the `--checkpoint` flag. However, I got the error below.
"""
Traceback (most recent call last):
  File "run.py", line 440, in <module>
    app.run(main)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run.py", line 428, in main
    data_lib.build_input_fn(builder, True), max_steps=train_steps)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3089, in train
    rendezvous.raise_errors()
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3084, in train
    saving_listeners=saving_listeners)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 349, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1175, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1204, in _train_model_default
    self.config)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2921, in _call_model_fn
    config)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1163, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3179, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1700, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2043, in _call_model_fn
    return estimator_spec.as_estimator_spec()
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 393, in as_estimator_spec
    scaffold = self.scaffold_fn() if self.scaffold_fn else None
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/model.py", line 164, in scaffold_fn
    for v in tf.global_variables(FLAGS.variable_schema)})
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 3128, in global_variables
    return ops.get_collection(ops.GraphKeys.GLOBAL_VARIABLES, scope)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6377, in get_collection
    return get_default_graph().get_collection(key, scope)
  File "/home/alireza/Desktop/sharif_uni/RA/simclr/myenv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4027, in get_collection
    regex = re.compile(scope)
  File "/usr/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 616, in _parse
    source.tell() - here + len(this))
sre_constants.error: nothing to repeat at position 0
"""
How can I resolve this issue? Thanks in advance.
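For readers hitting the same message: the last frames of the traceback show `re.compile(scope)` failing on the value passed from `FLAGS.variable_schema`. The sketch below uses only Python's standard `re` module to show one way that message can arise, namely a pattern that begins with a bare quantifier such as `?`. The two pattern strings and the `try_compile` helper are illustrative assumptions, not values taken from this issue.

```python
import re


def try_compile(pattern):
    """Return None if the pattern compiles, otherwise the regex error message."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


# A negative lookahead written without its parentheses starts with a bare '?',
# so the parser has nothing for the quantifier to repeat.
bad = try_compile("?!global_step")

# The same lookahead wrapped in parentheses is a valid pattern.
good = try_compile("(?!global_step)")

print(bad)
print(good)
```

Checking the flag value with `re.compile` in isolation, as here, is a quick way to tell whether the pattern itself is malformed before starting a training run.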