google-research / timesfm

TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
https://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/
Apache License 2.0

Error loading from checkpoint #77

Open break0123 opened 1 week ago

break0123 commented 1 week ago

The timesfm-1.0-200m checkpoint downloaded with huggingface-cli is 777M. It is stored in the Docker folder /home/suily/timesfm/model/checkpoints/checkpoint_1100000/state/checkpoint. I tried setting the path as

_MODEL_PATH = flags.DEFINE_string(  
    "model_path", "model/checkpoints", "Path to model"
)

and

_MODEL_PATH = flags.DEFINE_string(  
    "model_path", "/home/suily/timesfm/model/checkpoints", "Path to model"
)

The following error has occurred:

Traceback (most recent call last):
  File "/home/suily/timesfm/experiments/extended_benchmarks/run_timesfm.py", line 150, in <module>
    main()
  File "/home/suily/timesfm/experiments/extended_benchmarks/run_timesfm.py", line 108, in main
    tfm.load_from_checkpoint(   # checkpoint: load the model
  File "/home/suily/timesfm/src/timesfm.py", line 269, in load_from_checkpoint
    self._train_state = checkpoints.restore_checkpoint(
  File "/usr/local/lib/python3.10/site-packages/paxml/checkpoints.py", line 225, in restore_checkpoint
    checkpoint_manager = checkpoint_managers.OrbaxCheckpointManager(
  File "/usr/local/lib/python3.10/site-packages/paxml/checkpoint_managers.py", line 379, in __init__
    self._manager = _CheckpointManagerImpl(
  File "/usr/local/lib/python3.10/site-packages/paxml/checkpoint_managers.py", line 216, in __init__
    super().__init__(directory, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/orbax/checkpoint/checkpoint_manager.py", line 617, in __init__
    self._checkpoints = self._load_checkpoint_infos()
  File "/usr/local/lib/python3.10/site-packages/orbax/checkpoint/checkpoint_manager.py", line 1289, in _load_checkpoint_infos
    checkpoint_infos = [futures[step].result() for step in steps]
  File "/usr/local/lib/python3.10/site-packages/orbax/checkpoint/checkpoint_manager.py", line 1289, in <listcomp>
    checkpoint_infos = [futures[step].result() for step in steps]
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/orbax/checkpoint/checkpoint_manager.py", line 1278, in build_checkpoint_info
    step_metadata = self._step_name_format.find_step(self.directory, step)
  File "/usr/local/lib/python3.10/site-packages/orbax/checkpoint/path/step.py", line 341, in find_step
    raise ValueError(
ValueError: No step path found with name=1100000, NameFormat=_StandardNameFormat(step_prefix=None, step_format_fixed_length=None) for step=1100000 under /home/suily/timesfm/model/checkpoints.

Does anyone know the reason for this? My current guess is that the model was not fully downloaded or was not stored correctly. However, the file is still 777M after trying several download methods. I am running on Linux.
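The ValueError says that a step folder named 1100000 was searched for under /home/suily/timesfm/model/checkpoints, while the downloaded folder is named checkpoint_1100000. A minimal sketch for checking which naming style is actually on disk follows. It uses only the Python standard library, and `describe_step_dirs` is a hypothetical helper, not part of timesfm, paxml, or orbax.

```python
import tempfile
from pathlib import Path


def describe_step_dirs(checkpoint_dir, step):
    """Hypothetical helper: report how step folders under checkpoint_dir are named."""
    root = Path(checkpoint_dir)
    # Collect only sub-directory names; a missing root yields an empty list.
    names = sorted(p.name for p in root.iterdir() if p.is_dir()) if root.is_dir() else []
    return {
        "entries": names,
        # The name the error message says was searched for, e.g. "1100000".
        "bare_step": str(step) in names,
        # The name used by the downloaded folder, e.g. "checkpoint_1100000".
        "prefixed_step": f"checkpoint_{step}" in names,
    }


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics the layout from the post.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "checkpoint_1100000" / "state").mkdir(parents=True)
        print(describe_step_dirs(tmp, 1100000))
```

On the real setup, calling `describe_step_dirs("/home/suily/timesfm/model/checkpoints", 1100000)` would show whether only the prefixed folder exists, which would match the name mismatch reported in the traceback.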

siriuz42 commented 15 hours ago

Try /home/suily/timesfm/model/

break0123 commented 2 hours ago

Try /home/suily/timesfm/model/

The path is not the problem. The cause is a conflict between the dependency versions I installed. Due to special requirements, I had installed the following versions: Paxml==1.2.0, Orbax==0.5.3, Pandas==2.2.2. With these versions, Paxml checks whether the folder model/checkpoints/1100000 exists, while Orbax checks whether there is a model under model/checkpoints/checkpoint_1100000. The problem was solved when I changed to the following versions: Orbax==0.4.1, Pandas==2.0.0.
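For anyone hitting the same error, here is a rough sketch for comparing the installed versions against the ones that worked in this thread. It relies only on the standard-library `importlib.metadata`. The helper names are hypothetical, and the distribution names `orbax-checkpoint` and `pandas` are assumptions, since the comment only says "Orbax" and "Pandas"; adjust them to whatever `pip list` shows in your environment.

```python
from importlib import metadata


def installed_versions(names):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(expected, installed):
    """Return {name: (expected, installed)} for every pin that is not met."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


# Pins taken from the comment above. The distribution names are assumptions:
# the comment only says "Orbax" and "Pandas".
EXPECTED = {"orbax-checkpoint": "0.4.1", "pandas": "2.0.0"}

if __name__ == "__main__":
    current = installed_versions(EXPECTED)
    for name, (want, have) in version_mismatches(EXPECTED, current).items():
        print(f"{name}: expected {want}, found {have}")
```

If the script prints nothing, the installed versions match the pins. Any printed line names a package whose version differs or which is not installed under that distribution name.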