google-research / timesfm

TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
https://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/
Apache License 2.0
3.03k stars 227 forks

So many errors I'm unsure where to begin #23

Closed: sdmorrey closed this issue 1 month ago

sdmorrey commented 1 month ago

I'm using lightning.ai's hosting service. I installed the CPU version on a CPU instance, but it's throwing errors for all kinds of things. Do you guys have a complete, functioning example somewhere?

Perhaps a docker image?

Processing dataframe with multiple processes.
2024-05-13 19:05:50.022496: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-05-13 19:05:50.022513: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-05-13 19:05:50.030911: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-05-13 19:05:50.031158: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
Multiprocessing context has already been set.
Constructing model weights.
Multiprocessing context has already been set.
Constructing model weights.
Multiprocessing context has already been set.
Constructing model weights.
Multiprocessing context has already been set.
Constructing model weights.
Constructed model weights in 5.59 seconds.
Restoring checkpoint from ./checkpoints.
WARNING:absl:No registered CheckpointArgs found for handler type: <class 'paxml.checkpoints.FlaxCheckpointHandler'>
WARNING:absl:Configured `CheckpointManager` using deprecated legacy API. Please follow the instructions at https://orbax.readthedocs.io/en/latest/api_refactor.html to migrate by May 1st, 2024.
WARNING:absl:train_state_unpadded_shape_dtype_struct is not provided. We assume `train_state` is unpadded.
Constructed model weights in 5.61 seconds.
Restoring checkpoint from ./checkpoints.
WARNING:absl:No registered CheckpointArgs found for handler type: <class 'paxml.checkpoints.FlaxCheckpointHandler'>
WARNING:absl:Configured `CheckpointManager` using deprecated legacy API. Please follow the instructions at https://orbax.readthedocs.io/en/latest/api_refactor.html to migrate by May 1st, 2024.
WARNING:absl:train_state_unpadded_shape_dtype_struct is not provided. We assume `train_state` is unpadded.
Constructed model weights in 5.77 seconds.
Restoring checkpoint from ./checkpoints.
WARNING:absl:No registered CheckpointArgs found for handler type: <class 'paxml.checkpoints.FlaxCheckpointHandler'>
WARNING:absl:Configured `CheckpointManager` using deprecated legacy API. Please follow the instructions at https://orbax.readthedocs.io/en/latest/api_refactor.html to migrate by May 1st, 2024.
WARNING:absl:train_state_unpadded_shape_dtype_struct is not provided. We assume `train_state` is unpadded.
Constructed model weights in 5.75 seconds.
Restoring checkpoint from ./checkpoints.
WARNING:absl:No registered CheckpointArgs found for handler type: <class 'paxml.checkpoints.FlaxCheckpointHandler'>
WARNING:absl:Configured `CheckpointManager` using deprecated legacy API. Please follow the instructions at https://orbax.readthedocs.io/en/latest/api_refactor.html to migrate by May 1st, 2024.
WARNING:absl:train_state_unpadded_shape_dtype_struct is not provided. We assume `train_state` is unpadded.
/commands/python: line 37: 230787 Killed                  "$@"
ERROR:absl:For checkpoint version > 1.0, we require users to provide                                                      
          `train_state_unpadded_shape_dtype_struct` during checkpoint
          saving/restoring, to avoid potential silent bugs when loading
          checkpoints to incompatible unpadded shapes of TrainState.
Restored checkpoint in 3.21 seconds.
Jitting decoding.
ERROR:absl:For checkpoint version > 1.0, we require users to provide
          `train_state_unpadded_shape_dtype_struct` during checkpoint
          saving/restoring, to avoid potential silent bugs when loading
          checkpoints to incompatible unpadded shapes of TrainState.
Restored checkpoint in 3.29 seconds.
Jitting decoding.
⚡ main ~/timesfm ERROR:absl:For checkpoint version > 1.0, we require users to provide
          `train_state_unpadded_shape_dtype_struct` during checkpoint
          saving/restoring, to avoid potential silent bugs when loading
          checkpoints to incompatible unpadded shapes of TrainState.
Restored checkpoint in 3.53 seconds.
Jitting decoding.
ERROR:absl:For checkpoint version > 1.0, we require users to provide
          `train_state_unpadded_shape_dtype_struct` during checkpoint
          saving/restoring, to avoid potential silent bugs when loading
          checkpoints to incompatible unpadded shapes of TrainState.
Restored checkpoint in 3.72 seconds.
Jitting decoding.
Jitted decoding in 35.71 seconds.
.....
[503 rows x 6 columns]
            ds      Open      High       Low     Close    Volume  unique_id
1   2014-09-28 -0.143312 -0.056853 -0.026526 -0.054260  0.190367          1
2   2014-10-05 -0.055555 -0.113639 -0.227167 -0.150249  0.481255          1
3   2014-10-12 -0.149999 -0.022109  0.045849  0.181083  0.233122          1
4   2014-10-19  0.179569  0.075699  0.219252  0.029050 -0.540083          1
5   2014-10-26  0.029927 -0.046277 -0.070535 -0.089443 -0.275396          1
..         ...       ...       ...       ...       ...       ...        ...
498 2024-04-07  0.060972 -0.005376 -0.027930 -0.027632  0.060569         11
499 2024-04-14 -0.027630  0.019249 -0.056394 -0.052245  0.256362         11
500 2024-04-21 -0.052231 -0.080268 -0.020810 -0.012353 -0.105307         11
501 2024-04-28 -0.012230  0.005313  0.046492 -0.027930 -0.317417         11
502 2024-05-05 -0.028171 -0.037639 -0.094024 -0.086910 -0.076448         11

[502 rows x 7 columns]
Processing dataframe with multiple processes.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/teamspace/studios/this_studio/timesfm/btc_predict.py", line 63, in <module>
    forecast_df = tfm.forecast_on_df(
  File "/teamspace/studios/this_studio/timesfm/src/timesfm.py", line 568, in forecast_on_df
    with multiprocessing.Pool(processes=num_jobs) as pool:
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/pool.py", line 215, in __init__
    self._repopulate_pool()
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/pool.py", line 329, in _repopulate_pool_static
    w.start()
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
sdmorrey commented 1 month ago

I did eventually get this to run. It's very particular. I also discovered that lightning.ai provides and manages the conda environment. You can't change it once the instance has launched, so you need to upload your conda environment as a template and start a new environment.

They need better instructions, but the errors above are user error.
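
For anyone who lands here with the same RuntimeError: the traceback shows `forecast_on_df` being called at the top level of `btc_predict.py`, and that call opens a `multiprocessing.Pool`. With the spawn start method, each worker re-imports the main script, so a top-level call probably tries to start new workers during import. The usual fix is the `if __name__ == "__main__":` guard named in the error message. Below is a minimal sketch of that idiom using only the standard library. `forecast_one`, `run_parallel`, and the sample data are hypothetical stand-ins, not the timesfm API.

```python
import multiprocessing


def forecast_one(series):
    """Hypothetical stand-in for per-series work; returns the mean."""
    return sum(series) / len(series)


def run_parallel(all_series, num_jobs=2):
    # "spawn" starts each worker in a fresh interpreter that re-imports
    # this file, so nothing at module level may start processes.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=num_jobs) as pool:
        return pool.map(forecast_one, all_series)


def main():
    # Made-up sample data, for illustration only.
    all_series = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
    print(run_parallel(all_series))


# Workers import this module under a different name, so they skip main().
if __name__ == "__main__":
    main()
```

In a script like `btc_predict.py`, the same pattern would mean moving the model loading and the `forecast_on_df` call into a `main()` function that is only invoked under the guard.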