MIV-XJTU / ARTrack

Apache License 2.0

ValueError: The number of weights does not match the population #41

Closed MrNeoBlue closed 7 months ago

MrNeoBlue commented 7 months ago

First of all, many thanks for your fantastic work.

I followed your tutorial to start training on LaSOT only. However, after loading the pre-trained MAE ViT-Base weights from https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth, it seems that the network does not match the code.

Do I need to trace through the network .py file to find the difference?

I'm looking forward to your reply! Best regards!

checkpoints will be saved to /home/ubuntu/Workspace/ARTrack/output/checkpoints
move_data True
No matching checkpoint file found
move_data True
No matching checkpoint file found
Training crashed at epoch 1
Traceback for the error!
Traceback (most recent call last):
  File "/home/ubuntu/Workspace/ARTrack/lib/train/../../lib/train/trainers/base_trainer.py", line 85, in train
    self.train_epoch()
  File "/home/ubuntu/Workspace/ARTrack/lib/train/../../lib/train/trainers/ltr_trainer.py", line 131, in train_epoch
    self.cycle_dataset(loader)
  File "/home/ubuntu/Workspace/ARTrack/lib/train/../../lib/train/trainers/ltr_trainer.py", line 75, in cycle_dataset
    for i, data in enumerate(loader, 1):
  File "/home/ubuntu/anaconda3/envs/artrack/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/ubuntu/anaconda3/envs/artrack/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/home/ubuntu/anaconda3/envs/artrack/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/home/ubuntu/anaconda3/envs/artrack/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/artrack/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/anaconda3/envs/artrack/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/anaconda3/envs/artrack/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/Workspace/ARTrack/lib/train/../../lib/train/data/sampler.py", line 98, in __getitem__
    return self.getitem()
  File "/home/ubuntu/Workspace/ARTrack/lib/train/../../lib/train/data/sampler.py", line 108, in getitem
    dataset = random.choices(self.datasets, self.p_datasets)[0]
  File "/home/ubuntu/anaconda3/envs/artrack/lib/python3.9/random.py", line 499, in choices
    raise ValueError('The number of weights does not match the population')
ValueError: The number of weights does not match the population

ARTrackV2 commented 7 months ago

It looks like the problem occurs in the dataset or the data loader, not in the pretrained backbone. I think your config may be wrong. If you can share your yaml, I can try to fix it.
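For readers hitting the same message, here is a minimal sketch of how it arises. The last frame of the traceback is `random.choices(self.datasets, self.p_datasets)`, and the standard library raises this `ValueError` whenever the weight list and the population differ in length. The list contents below are hypothetical and are not taken from the reporter's config.

```python
import random

# Hypothetical values: one dataset configured, but four sampling ratios
# left over from a config that listed four datasets.
datasets = ["LASOT"]
p_datasets = [1, 1, 1, 1]

try:
    random.choices(datasets, p_datasets)
    error_message = None
except ValueError as exc:
    # The stdlib rejects a weight list whose length differs from the population.
    error_message = str(exc)

print(error_message)

# With one weight per dataset the call succeeds.
picked = random.choices(datasets, [1])[0]
print(picked)
```

Since the backbone weights are never touched on this code path, the pretrained checkpoint can be ruled out as the cause.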

MrNeoBlue commented 7 months ago

Thanks for your fast reply.

I created artrack_256_lasot.yaml by modifying artrack_256_full.yaml: artrack_256_lasot.zip. The local.py under train/admin is also uploaded: local.zip

I have several other questions.

Q1: Is artrack_seq the ARTrackV2 model in your paper? Do you recommend starting directly with artrack_seq, or with something else?

Q2: During evaluation, does every sequence need the network to be rebuilt before it runs? I ask because the test process is slow. I assume the 60-epoch checkpoint is needed both for building the network and for loading it, so I filled the MODEL.PRETRAIN_PTH field with the pth.tar file instead of the MAE .pth file.

ARTrackV2 commented 7 months ago

You should change DATA: TRAIN: DATASETS_RATIO in the yaml to include only two datasets, which means: TRAIN: DATASETS_NAME:

Moreover, the ARTrackV2 code is not included in this repository yet. I will push it as soon as possible.

Changing the MODEL.PRETRAIN_PTH field is correct. I think it is a better way to reload the checkpoints than mine.
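As a rough illustration of the rule behind this advice, the sketch below checks that DATASETS_NAME and DATASETS_RATIO have the same number of entries before training starts. The helper name `check_dataset_ratio` and the example values are hypothetical; only the two key names come from this thread, and the real config is loaded from yaml rather than written as a dict.

```python
def check_dataset_ratio(train_cfg):
    """Return a name -> ratio mapping, or raise if the two lists differ in length."""
    names = train_cfg["DATASETS_NAME"]
    ratios = train_cfg["DATASETS_RATIO"]
    if len(names) != len(ratios):
        raise ValueError(
            f"DATASETS_NAME has {len(names)} entries but "
            f"DATASETS_RATIO has {len(ratios)}; they must match one to one."
        )
    return dict(zip(names, ratios))


# Hypothetical config with two datasets and two ratios.
good_cfg = {"DATASETS_NAME": ["LASOT", "GOT10K_vottrain"], "DATASETS_RATIO": [1, 1]}
mapping = check_dataset_ratio(good_cfg)
print(mapping)

# Hypothetical broken config: one dataset, four ratios.
bad_cfg = {"DATASETS_NAME": ["LASOT"], "DATASETS_RATIO": [1, 1, 1, 1]}
try:
    check_dataset_ratio(bad_cfg)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Running a check like this on the loaded config would surface the mismatch at startup instead of inside a DataLoader worker.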

ARTrackV2 commented 7 months ago

The codebase currently includes only ARTrackV1, which has two separate training stages: you should train artrack first, then artrack_seq.

MrNeoBlue commented 7 months ago

Oh, now I get the difference between artrack and artrack_seq. Excellent explanation. By the way, I'm new to tracking algorithms: what is the difference between got10k_train and got10k_vottrain? I downloaded the train/val/test sets, and there is no archive or file named vottrain or votval. Would you please help me sort this out?

ARTrackV2 commented 7 months ago

There is no need for you to download anything else, because vottrain is a protocol for the VOT datasets; the dataset split file is in the codebase under ./lib/train/data_specs. VOT is a traditional challenge in Single Object Tracking. For a fair comparison, if you want to evaluate your tracker on GOT-10k, you should train on GOT10k_train only. If you want to test your tracker on LaSOT, LaSOT_ext, TrackingNet, or any other dataset, you should train it on GOT10k_vottrain together with the other datasets.
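To make the idea of a split file concrete, here is a small sketch of how one could select a subset of the downloaded GOT-10k train sequences. It assumes the split file holds one integer sequence index per line, which is an assumption about the format and not something confirmed in this thread. The file name, the sequence names, and both helper functions are hypothetical.

```python
from pathlib import Path
import tempfile


def load_split_ids(split_file):
    """Read one integer sequence index per line, skipping blank lines."""
    text = Path(split_file).read_text()
    return [int(line) for line in text.splitlines() if line.strip()]


def select_sequences(all_sequences, split_ids):
    """Keep only the sequences whose positions are listed in the split."""
    return [all_sequences[i] for i in split_ids]


# Hypothetical stand-in for the sequence folders of the downloaded train set.
all_sequences = [f"GOT-10k_Train_{i:06d}" for i in range(1, 7)]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical split file; the real ones live under ./lib/train/data_specs.
    split_path = Path(tmp) / "hypothetical_vot_train_split.txt"
    split_path.write_text("0\n2\n5\n")
    ids = load_split_ids(split_path)

subset = select_sequences(all_sequences, ids)
print(subset)
```

Under this reading, got10k_train and got10k_vottrain point at the same downloaded data and differ only in which sequences the split keeps, which is why no separate archive exists.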