SalesforceAIResearch / uni2ts

[ICML2024] Unified Training of Universal Time Series Forecasting Transformers
Apache License 2.0

Error when using finetuning command #91

Open linyuan13 opened 1 month ago

linyuan13 commented 1 month ago

Describe the bug

Error executing job with overrides: ['run_name=first_run', 'model=moirai_1.0_R_small', 'data=etth1', 'val_data=etth1']
Error in call to target 'huggingface_hub.hub_mixin.ModelHubMixin.from_pretrained':
TypeError("MoiraiModule.__init__() missing 7 required positional arguments: 'distr_output', 'd_model', 'num_layers', 'patch_sizes', 'max_seq_len', 'attn_dropout_p', and 'dropout_p'")
full_key: model.module

I followed the process exactly, but the error occurred at the last step, when I ran the fine-tuning command.

gorold commented 1 month ago

Hi, could you edit the bug report according to the template? It's quite hard to understand what the error is from just the above.

linyuan13 commented 1 month ago

Thank you very much for your reply. I have solved this problem, but there is a new one: fine-tuning stops early, and the MSE and other metrics are poor during validation. The following is the early-stopping log:

Loading weights from local directory
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[2024-08-02 08:25:59,964][datasets][INFO] - PyTorch version 2.3.1 available.
[2024-08-02 08:25:59,964][datasets][INFO] - JAX version 0.4.30 available.
Seed set to 1
[rank: 0] Seed set to 1
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Loading weights from local directory
[2024-08-02 08:26:04,015][datasets][INFO] - PyTorch version 2.3.1 available.
[2024-08-02 08:26:04,015][datasets][INFO] - JAX version 0.4.30 available.
[rank: 1] Seed set to 1
[rank: 1] Seed set to 1
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
------------------------------------------------------------------------------------------------
/anaconda3/envs/uni/lib/python3.11/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:652: Checkpoint directory uni2ts-main/outputs/finetune/moirai_1.0_R_small/etth1/finetune1/checkpoints exists and is not empty.
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name   | Type         | Params | Mode
--------------------------------------------------
0 | module | MoiraiModule | 13.8 M | train
------------------------------------------------
13.8 M    Trainable params
0         Non-trainable params
13.8 M    Total params
55.310    Total estimated model params size (MB)
Epoch 0: | val/PackedMSELoss=11.40, val/Pack
[rank: 0] Metric val/PackedNLLLoss improved. New best score: 2.069
[rank: 1] Metric val/PackedNLLLoss improved. New best score: 2.158
Epoch 3: | val/PackedNLLLoss=3.900, val/PackedMSELoss=11.90, val/Pack
[rank: 0] Monitored metric val/PackedNLLLoss did not improve in the last 3 records. Best score: 2.069. Signaling Trainer to stop.
[rank: 1] Monitored metric val/PackedNLLLoss did not improve in the last 3 records. Best score: 2.158. Signaling Trainer to stop.
Epoch 3: |
.py:254: UserWarning: resource_tracker: There appear to be 22 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
anaconda3/envs/uni/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 22 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

The parameters are consistent with what you provided, and the GPU model is A100.
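For anyone reading the log above: the "did not improve in the last 3 records" message is what a patience-based early-stopping rule produces when the best validation loss is reached at epoch 0 and the next three validation checks are all worse. Below is a minimal pure-Python sketch of that rule, not the actual Lightning or uni2ts code. The function name `early_stop_epoch` and the intermediate loss values are hypothetical; only 2.069 (rank 0 best score) and 3.900 (epoch 3) appear in the log.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which a patience-based rule would stop,
    or None if training would run through all the given epochs."""
    best = float("inf")
    wait = 0  # consecutive validation checks without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical sequence: 2.069 and 3.900 are taken from the log,
# the values in between and after are made up for illustration.
losses = [2.069, 2.9, 3.4, 3.900, 1.5]

print(early_stop_epoch(losses, patience=3))  # stops at epoch 3
print(early_stop_epoch(losses, patience=5))  # None: a later improvement is seen
```

With a patience of 3 the run ends at epoch 3 regardless of what later epochs might have done, so a larger patience (or a lower learning rate, if the loss is diverging from the pretrained starting point) may be worth trying. Whether that helps on this dataset is not something the log can confirm.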