deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

[BUG] FileNotFoundError when training with "tensorboard": true in input.json #615

Closed tuoping closed 3 years ago

tuoping commented 3 years ago

Summary

Running training with "tensorboard" set to true in DeePMD-kit 2.0 (2.0.0a1) fails with "FileNotFoundError: [Errno 2] No such file or directory: 'log'". The directory named by "tensorboard_log_dir" does not exist when training starts.

Complete error log:

Traceback (most recent call last):
  File "/home/tuopin/soft/deepmd-kit-2.0.0a1/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/home/tuopin/soft/deepmd-kit-2.0.0a1/lib/python3.9/site-packages/deepmd/entrypoints/main.py", line 342, in main
    train(**dict_args)
  File "/home/tuopin/soft/deepmd-kit-2.0.0a1/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 210, in train
    _do_work(jdata, run_opt)
  File "/home/tuopin/soft/deepmd-kit-2.0.0a1/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 264, in _do_work
    model.train(train_data, valid_data)
  File "/home/tuopin/soft/deepmd-kit-2.0.0a1/lib/python3.9/site-packages/deepmd/train/trainer.py", line 469, in train
    shutil.rmtree(self.tensorboard_log_dir)
  File "/home/tuopin/soft/deepmd-kit-2.0.0a1/lib/python3.9/shutil.py", line 709, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/home/tuopin/soft/deepmd-kit-2.0.0a1/lib/python3.9/shutil.py", line 707, in rmtree
    orig_st = os.lstat(path)
FileNotFoundError: [Errno 2] No such file or directory: 'log'
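
The last frames of the traceback show `shutil.rmtree` failing in `os.lstat` because the path `log` does not exist. The behavior can be reproduced with the standard library alone. The sketch below is independent of DeePMD-kit; the helper name `rmtree_missing_dir_error` and the temporary working directory are assumptions made for illustration, and only the directory name `log` comes from the report.

```python
import os
import shutil
import tempfile


def rmtree_missing_dir_error():
    """Call shutil.rmtree on a path that was never created.

    Returns the exception class name and errno, or None if no error occurred.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical stand-in for "tensorboard_log_dir": "log"; never created.
        missing = os.path.join(workdir, "log")
        try:
            shutil.rmtree(missing)
        except FileNotFoundError as exc:
            return type(exc).__name__, exc.errno
    return None


print(rmtree_missing_dir_error())
```

By default `shutil.rmtree` does not tolerate a missing top-level path, so any caller that clears a log directory on a fresh run has to handle that case itself.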

My input.json:

    "_comment": " traing controls",
    "training" : {
        "training_data": {
            "systems":          ["../data/data_0/", "../data/data_1/", "../data/data_2/"],
            "batch_size":       "auto",
            "_comment":         "that's all"
        },
        "validation_data":{
            "systems":          ["../data/data_3"],
            "batch_size":       1,
            "numb_btch":        3,
            "_comment":         "that's all"
        },
        "numb_steps":   1000000,
        "seed":         1,

        "_comment": " display and restart",
        "_comment": " frequencies counted in batch",
        "disp_file":    "lcurve.out",
        "disp_freq":    10,
        "numb_test":    4,
        "save_freq":    1000,
        "save_ckpt":    "model.ckpt",
        "disp_training":true,
        "time_training":true,
        "tensorboard":  true,
        "tensorboard_log_dir":"log",
        "profiling":    false,
        "profiling_file":"timeline.json",
        "_comment":     "that's all"
    },

Steps to Reproduce

Further Information, Files, and Links

njzjz commented 3 years ago

Fixed in #617.
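
For readers who hit the same error on an affected version, one common way to avoid it is to remove the log directory only when it already exists. This is a minimal sketch under that assumption, not necessarily the change made in #617; the helper names `reset_log_dir` and `demo` are hypothetical and not part of DeePMD-kit.

```python
import os
import shutil
import tempfile


def reset_log_dir(log_dir):
    """Return an empty log_dir, removing old contents only if the directory exists."""
    if os.path.isdir(log_dir):
        shutil.rmtree(log_dir)
    os.makedirs(log_dir)
    return log_dir


def demo():
    """Exercise reset_log_dir on a fresh run and on a rerun with stale files."""
    with tempfile.TemporaryDirectory() as workdir:
        log_dir = os.path.join(workdir, "log")
        reset_log_dir(log_dir)  # first run: directory is missing, no error
        open(os.path.join(log_dir, "stale.txt"), "w").close()
        reset_log_dir(log_dir)  # rerun: stale contents are removed
        return os.listdir(log_dir)


print(demo())
```

Checking `os.path.isdir` first keeps other failures (for example, permission errors) visible, whereas `shutil.rmtree(log_dir, ignore_errors=True)` would silence them as well.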