TencentARC / ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

AttributeError: 'MetaLoader' object has no attribute 'dataset' #20


Richar-Du commented 1 month ago

When I try to resume training from a checkpoint saved by the previous run, it raises the following error:

[rank12]: Traceback (most recent call last):
[rank12]:   File "/cpfs/29f69eb5e2e60f26/user/GPT/pretrain/mm_intern/duyifan/ST-LLM-temp/stllm/train/train_hf.py", line 278, in <module>
[rank12]:     train()
[rank12]:   File "/cpfs/29f69eb5e2e60f26/user/GPT/pretrain/mm_intern/duyifan/ST-LLM-temp/stllm/train/train_hf.py", line 267, in train
[rank12]:     trainer.train(resume_from_checkpoint=True)
[rank12]:   File "/cpfs/29f69eb5e2e60f26/user/GPT/pretrain/mm_intern/duyifan/miniconda3/envs/stllm/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
[rank12]:     return inner_training_loop(
[rank12]:   File "/cpfs/29f69eb5e2e60f26/user/GPT/pretrain/mm_intern/duyifan/miniconda3/envs/stllm/lib/python3.10/site-packages/transformers/trainer.py", line 1893, in _inner_training_loop
[rank12]:     epoch_iterator = skip_first_batches(epoch_iterator, steps_trained_in_current_epoch)
[rank12]:   File "/cpfs/29f69eb5e2e60f26/user/GPT/pretrain/mm_intern/duyifan/miniconda3/envs/stllm/lib/python3.10/site-packages/accelerate/data_loader.py", line 1086, in skip_first_batches
[rank12]:     dataset = dataloader.dataset
[rank12]: AttributeError: 'MetaLoader' object has no attribute 'dataset'
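For context, the crash comes from accelerate's `skip_first_batches()`, which starts by reading `dataloader.dataset` and therefore assumes the loader it receives behaves like a `torch.utils.data.DataLoader`. A minimal sketch of the problem is below; `MetaLoaderSketch` is a hypothetical stand-in for the repo's `MetaLoader`, not the actual class:

```python
# Minimal sketch (assumption: MetaLoader is an iterator wrapper around several
# dataloaders and never sets a `dataset` attribute).
import torch
from torch.utils.data import DataLoader, TensorDataset


class MetaLoaderSketch:
    """Hypothetical stand-in: yields batches from several wrapped loaders in turn."""

    def __init__(self, loaders):
        self.loaders = loaders  # note: no `dataset` attribute is defined

    def __iter__(self):
        for loader in self.loaders:
            yield from loader


loader = DataLoader(TensorDataset(torch.zeros(8, 2)), batch_size=2)
meta = MetaLoaderSketch([loader])

# accelerate/data_loader.py does `dataset = dataloader.dataset`, which works for a
# plain DataLoader but raises AttributeError on a wrapper like this one.
print(getattr(meta, "dataset", None))  # -> None; the direct attribute access raises instead
```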
farewellthree commented 1 month ago

Sorry, our code currently only supports resuming from an epoch boundary, not from an intermediate step. Resuming from a step cannot recover the historical sampling order of the dataset.
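One possible way to avoid the crash, untested on this repo: since an exact mid-epoch resume is not supported anyway, you can tell the HF Trainer not to fast-forward the dataloader on resume, so `skip_first_batches()` is never called on the `MetaLoader`. The trade-off is that the current epoch restarts from the beginning of the data stream rather than from the exact step. A sketch (the `output_dir` value is a placeholder):

```python
# Sketch: disable the dataloader fast-forward when resuming from a checkpoint.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",   # placeholder path
    ignore_data_skip=True,   # don't call skip_first_batches() when resuming mid-epoch
)

# then, as in train_hf.py:
# trainer.train(resume_from_checkpoint=True)
```

The same effect should be achievable by passing `--ignore_data_skip True` on the training command line if you launch via HfArgumentParser.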