purvang3 opened this issue 3 years ago
With a bit of digging around, I managed to get past the error above.
In experiment.py:
if __name__ == '__main__':
  FLAGS(sys.argv)  # <- add this line
  flags.mark_flag_as_required('config')
  platform.main(Experiment, sys.argv[1:])
Add the following lines to the definition of get_config() in experiment.py:
  config.save_checkpoint_interval = 60
  config.eval_specific_checkpoint_dir = ''
  config.checkpoint_dir = '/path/'  # <- add this (modify /path/ appropriately)
  config.train_checkpoint_all_hosts = True  # <- and this
  return config
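If you want to double-check the config before launching, a small pre-flight check can catch a missing field early. This is a hypothetical sketch, not part of jaxline or nfnets: missing_checkpoint_fields is an invented helper, the field list covers only the four checkpoint fields from the get_config() snippet above (this jaxline version may read others), and a SimpleNamespace stands in for the ml_collections ConfigDict that get_config() presumably returns.

```python
from types import SimpleNamespace

# Assumption: only the four checkpoint fields from the get_config() snippet,
# not every field jaxline may read from the config.
REQUIRED_CHECKPOINT_FIELDS = (
    "save_checkpoint_interval",
    "eval_specific_checkpoint_dir",
    "checkpoint_dir",
    "train_checkpoint_all_hosts",
)


def missing_checkpoint_fields(config):
    """Return required field names that `config` lacks (hypothetical helper)."""
    return [name for name in REQUIRED_CHECKPOINT_FIELDS if not hasattr(config, name)]


# SimpleNamespace is a stand-in for ml_collections.ConfigDict in this sketch.
old_config = SimpleNamespace(save_checkpoint_interval=60, eval_specific_checkpoint_dir="")
new_config = SimpleNamespace(
    save_checkpoint_interval=60,
    eval_specific_checkpoint_dir="",
    checkpoint_dir="/path/",
    train_checkpoint_all_hosts=True,
)

print(missing_checkpoint_fields(old_config))  # ['checkpoint_dir', 'train_checkpoint_all_hosts']
print(missing_checkpoint_fields(new_config))  # []
```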
Then run experiment.py with the --config argument, as follows:
python nfnets/experiment.py --config nfnets/experiment.py
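Passing experiment.py as its own config appears to work because jaxline's --config flag is, as far as I know, defined with ml_collections' config_flags.DEFINE_config_file, which loads the given Python file and calls its get_config(). Loading the file that way does not trigger its if __name__ == '__main__': block, so training is not started a second time. The sketch below imitates only that loading step with the standard library; load_config_from_file is a hypothetical name, and this is a simplified stand-in, not ml_collections' actual implementation.

```python
import importlib.util
import os
import tempfile


def load_config_from_file(path):
    """Import the Python file at `path` and return its get_config() result.

    Hypothetical helper: a simplified stand-in for a config-file flag,
    without any of ml_collections' extra features.
    """
    spec = importlib.util.spec_from_file_location("loaded_config", path)
    module = importlib.util.module_from_spec(spec)
    # Module __name__ is "loaded_config", so the __main__ guard is skipped.
    spec.loader.exec_module(module)
    return module.get_config()


# Demo: a throwaway file shaped like experiment.py, with get_config() and a
# __main__ block that must NOT run when the file is loaded as a config.
source = '''
def get_config():
    return {"checkpoint_dir": "/path/", "train_checkpoint_all_hosts": True}

if __name__ == "__main__":
    raise SystemExit("training entry point should not run on import")
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "experiment.py")
    with open(path, "w") as f:
        f.write(source)
    config = load_config_from_file(path)

print(config["checkpoint_dir"])  # /path/
```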
The published version of deepmind/jaxline is outdated, perhaps?
PS: Even with this workaround, training halts with a TypeError, but that's yet another issue...
First of all, thank you for publishing nfnets. I have started digging deep into the implementation, and I have some questions.
Unfortunately, I am not able to run experiment.py; I get the following error. I am running on just one GPU for testing.
When I run test.py with fake data, it works without any error.
Thank you