google-research / timesfm

TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
https://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/
Apache License 2.0

checkpoint loading issue #7

Open ylq1996 opened 1 month ago

ylq1996 commented 1 month ago

I have downloaded the checkpoint from the provided repo on Hugging Face. However, when I ran the code, loading the checkpoint with `tfm.load_from_checkpoint('checkpoint')` raised an error:

ValueError: Dimension to ungroup is not divisible by its index sizes. Group "(np)" expects size 228, but its indices "p" have combined specified size 32.
ERROR conda.cli.main_run:execute(124): `conda run python /opt/project/test.py` failed. (See above for error)

siriuz42 commented 1 month ago

This looks like an issue that happened while jitting the model, and it might be because `context_len` was not a multiple of 32. What parameters did you use to initialize the model instance?

I have also updated the README to make this requirement explicit.
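The error reports a size of 32 for the "p" index, which suggests the context window is reshaped into patches of 32 time points, so the context length has to divide evenly by 32. A minimal sketch of that constraint in plain Python follows; `PATCH_LEN`, `round_up_context_len`, and `to_patches` are hypothetical helpers for illustration, not part of the TimesFM API, and the patch length of 32 is an assumption read off the error message.

```python
from typing import List

PATCH_LEN = 32  # assumption: taken from the "p" size of 32 in the error message


def round_up_context_len(context_len: int, patch_len: int = PATCH_LEN) -> int:
    """Hypothetical helper: smallest multiple of patch_len that is >= context_len."""
    return -(-context_len // patch_len) * patch_len


def to_patches(series: List[float], patch_len: int = PATCH_LEN) -> List[List[float]]:
    """Hypothetical helper: split one context window into non-overlapping patches."""
    if len(series) % patch_len != 0:
        raise ValueError(
            f"context length {len(series)} is not a multiple of patch length {patch_len}"
        )
    return [series[i : i + patch_len] for i in range(0, len(series), patch_len)]


print(round_up_context_len(100))  # 128
print(len(to_patches([0.0] * 128)))  # 4 patches of 32 points each
```

Rounding 100 up to the next multiple of 32 gives 128, the value that resolved the error in this thread; a window of 100 points cannot be split evenly and is rejected.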

ylq1996 commented 1 month ago

I had set `context_len=100`, which caused this error; after changing it to `context_len=128`, it was solved. However, I have now run into a new issue. I stored the checkpoint weights in `/usr/src/app/checkpoint`, and calling `tfm.load_from_checkpoint('/usr/src/app/')` results in:

ValueError: No checkpoints were found in directory checkpoint_dir=PosixGPath('/usr/src/app')

Looking at the code, no step is found here:

```python
if step is None:
    step = checkpoint_manager.latest_step()
    if step is None:
        raise ValueError(
            f'No checkpoints were found in directory {checkpoint_dir=!r}'
        )
```
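The quoted logic raises when `latest_step()` finds no step directories under the path it is given. A simplified, stdlib-only model of that lookup is sketched below. It assumes step folders are named like `checkpoint_1100000` (the folder name that appears in the reply below), and `latest_step` and `resolve_step` are hypothetical stand-ins, not the actual checkpoint manager implementation.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: step folders look like "checkpoint_<step>", as in this thread.
STEP_DIR = re.compile(r"^checkpoint_(\d+)$")


def latest_step(checkpoint_dir: str) -> Optional[int]:
    """Hypothetical: highest step among direct subfolders, or None if there are none."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return None
    steps = []
    for child in root.iterdir():
        match = STEP_DIR.match(child.name)
        if child.is_dir() and match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


def resolve_step(checkpoint_dir: str, step: Optional[int] = None) -> int:
    """Hypothetical mirror of the quoted code: fall back to the latest step, else raise."""
    if step is None:
        step = latest_step(checkpoint_dir)
        if step is None:
            raise ValueError(
                f"No checkpoints were found in directory {checkpoint_dir=!r}"
            )
    return step
```

In this sketch only the direct children of `checkpoint_dir` are inspected, so passing a step folder itself, a path inside it, or a directory above the one holding the step folders all yield `None` and the same error.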

zhaokui001 commented 1 month ago

I ran into the same problem earlier. My checkpoint path was `/home/dedong/huggingface/timesfm/checkpoints/checkpoint_1100000/state`; changing it to `/home/dedong/huggingface/timesfm/checkpoints` loaded the model successfully.
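One reading of this fix is that the loader expects the directory that contains the step folder, not a path inside it. A small sketch of normalizing such a path follows, assuming the `checkpoint_<digits>` naming seen above; `checkpoint_root` is a hypothetical helper, not a TimesFM function.

```python
import re
from pathlib import PurePosixPath

# Assumption: step folders are named "checkpoint_<digits>", as in the path above.
STEP_DIR = re.compile(r"^checkpoint_\d+$")


def checkpoint_root(path: str) -> str:
    """Hypothetical: walk upward and return the parent of the first step folder found."""
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):
        if STEP_DIR.match(candidate.name):
            return str(candidate.parent)
    return str(p)  # no step folder in the path: assume it is already the root


print(checkpoint_root("/home/dedong/huggingface/timesfm/checkpoints/checkpoint_1100000/state"))
# /home/dedong/huggingface/timesfm/checkpoints
```

`PurePosixPath` is used so the result is the same on any OS; the helper only inspects path strings and does not touch the filesystem.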