Open pmixer opened 6 years ago
0.527 is well below expectation; it should be 0.55~0.58 in my experience. I evaluated all epochs and reported the best result, which typically falls within the last 30 epochs.
How about the performance of the pretrained model? Is it the same as reported? Did you follow exactly the same steps in the training section?
@bilylee Thanks for the response. I did follow the given steps to train the model; the problem may be caused by the VID preprocessing procedure on our server. Could you please share the size of the training-set folder in your experiment to help me figure it out? I only got a ~20 GB folder for the training set, but the Matlab version seems to produce a much larger training set after preprocessing (30+ GB). Thanks in advance, and sorry for the delayed reply. (PS: are you currently interning at MSRA?)
ps, 0.527 is the model generated by python experiments/SiamFC-3s-color-scratch.py
Hi, you can check the size of the training data with the following script:

```python
import os.path as osp
import pickle

# Open in binary mode so the pickle loads under both Python 2 and 3.
with open('data/train_imdb.pickle', 'rb') as f:
    imdb = pickle.load(f)

n_frames = 0
for v in imdb['videos']:
    n_frames += len(v)

total_file_size = 0  # bytes
for v in imdb['videos']:
    for p in v:
        total_file_size += osp.getsize(p)
total_file_size /= (1024 * 1024 * 1024)  # convert to GB

print('Num of videos: {}'.format(imdb['n_videos']))      # 8309
print('Num of frames: {}'.format(n_frames))              # 1877463
print('Total file size: {} GB'.format(total_file_size))  # ~19 GB
```
It is normal for the curated dataset (~20 GB) to be smaller than the one produced by the Matlab version (~53 GB). The reason is that I removed all `*.z.jpg` files, since these exemplar images can be extracted on the fly from the corresponding `*.x.jpg` files.
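As a side note, the on-the-fly extraction described above amounts to a simple center crop, since the target is centered in the curated search patch. A minimal sketch (the function name is mine; 255/127 are the SiamFC default search/exemplar sizes):

```python
import numpy as np

def crop_exemplar(search_img, exemplar_size=127):
    """Crop the central exemplar patch (z) out of a curated search patch (x).

    Assumes the target is centered in the search image, as in the
    SiamFC curation pipeline.
    """
    h, w = search_img.shape[:2]
    top = (h - exemplar_size) // 2
    left = (w - exemplar_size) // 2
    return search_img[top:top + exemplar_size, left:left + exemplar_size]

# A 255x255 curated search patch yields a 127x127 exemplar.
x = np.zeros((255, 255, 3), dtype=np.uint8)
z = crop_exemplar(x)
print(z.shape)  # (127, 127, 3)
```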
Did you check the tracking performance of the pretrained model? Did you accidentally evaluate the model on tb50?
I previously interned at MSRA, but I have already checked out : )
Thanks! The data is fine. I'm still investigating whether something went wrong in the from-scratch training or whether the number of training epochs is insufficient; I used the default parameters. PS: as I suspected (otherwise the SA-SiamFC paper could not have used this implementation before it was released :cry:)
@PeterHuang2015 May I ask how to use the trained model? I trained with
python experiments/SiamFC-3s-color-scratch.py
and training finished, but running
python scripts/run_tracking.py
fails with a file-not-found error:
Logs/SiamFC/track_model_checkpoints/SiamFC-3s-color-pretrained
@Mabinogiysk ? A file-not-found error means the path is wrong... an absolute path is more reliable :red_car: Sorry for the late reply.
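For anyone hitting the same error, a quick way to check what path the tracker will actually see (the directory name below is only an example; substitute the run you trained):

```python
import os.path as osp

# Hypothetical checkpoint directory; substitute your own run name.
ckpt_dir = 'Logs/SiamFC/track_model_checkpoints/SiamFC-3s-color-scratch'

# Resolving to an absolute path avoids surprises from the current working directory.
abs_dir = osp.abspath(ckpt_dir)
print(abs_dir)
print('exists:', osp.isdir(abs_dir))
```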
Thanks to @yangkang, I got 0.579 on OTB100, which is close to the performance reported by @bilylee. I ran 65 epochs, i.e. 432,250 iterations, since there are 53,200 training pairs and each iteration processes a batch of 8 pairs.
To those who care about this issue: BTW, the performance was about 0.56 after 70,000 iterations but only 0.53 after 50 epochs... The devil may live in hyperparameters such as the window influence of 0.176 and the scaling factor; I strongly advise focusing on these parts rather than simply introducing a new convnet.
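The iteration count quoted above follows directly from the numbers in the thread; a quick sanity check:

```python
# Sanity-check the epoch/iteration bookkeeping from the comment above.
num_pairs = 53200   # training pairs per epoch (from the thread)
batch_size = 8      # pairs processed per iteration ("batch")
epochs = 65

iters_per_epoch = num_pairs // batch_size
total_iters = iters_per_epoch * epochs
print(iters_per_epoch, total_iters)  # 6650 432250
```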
Hi @bilylee, do you know this paper: https://arxiv.org/abs/1802.08817? It claims that training with lr=0.1 for 25 epochs and then lr=0.01 for 5 epochs yields a 0.584 AUC score based on your code. I ran 25 epochs (with lr=0.1, lr_decay=1), then modified the experiment script to change the epoch count to 30 and set lr=0.01, but did not get the reported result on OTB100. Could you help check whether I'm setting the lr correctly? (BTW, I tried modifying the network: removed the grouping operation and enlarged the conv5 layer to 512 channels, trained for 30 epochs, and got about 0.55 AUC on OTB100 :cry:. It's really difficult to determine which checkpoint to use; it feels like being in a casino.)
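For reference, the schedule described in that paper amounts to a simple piecewise-constant learning rate; a sketch (the function is mine, not the experiment script's actual hyperparameter plumbing):

```python
def lr_at_epoch(epoch):
    # As claimed in arXiv:1802.08817: lr = 0.1 for the first 25 epochs,
    # then lr = 0.01 for the remaining 5 (30 epochs total).
    return 0.1 if epoch < 25 else 0.01

schedule = [lr_at_epoch(e) for e in range(30)]
print(schedule[0], schedule[24], schedule[25], schedule[29])  # 0.1 0.1 0.01 0.01
```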
Hi,
That paper uses a vanilla SGD optimizer without momentum.
I typically just evaluate all the epochs and choose the best one. Even though the performance varies across epochs, the best performance seems to be stable.
SiamFC is already over-fitting during the second half of the training epochs. I guess it is normal to see worse performance with a larger network.
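To make the optimizer distinction concrete, here is a minimal scalar sketch of the two update rules (illustrative only, not the repo's training code; `mu=0.9` is just a typical momentum value):

```python
def sgd_step(w, grad, lr):
    # Vanilla SGD: step directly against the gradient.
    return w - lr * grad

def momentum_step(w, v, grad, lr, mu=0.9):
    # SGD with momentum: accumulate a velocity term, then apply it.
    v = mu * v - lr * grad
    return w + v, v

w = 1.0
print(sgd_step(w, grad=1.0, lr=0.5))            # 0.5
w2, v = momentum_step(w, 0.0, grad=1.0, lr=0.5)
print(w2, v)                                    # 0.5 -0.5
```

With zero initial velocity the first steps coincide; the two rules diverge on subsequent steps, where momentum keeps pushing in the accumulated direction.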
@bilylee Yes, it reports using a vanilla SGD optimizer, but the interesting point is that when I requested the code, the author sent me the link to this project, which led me to believe you were involved in that work :cry:, and I kept asking you why the performance was not consistent with the reported results. Sorry for that :trollface:
I tested the code, trained and evaluated the tracker, and the performance on OTB100 using the last checkpoint only reached 0.527 overlap AUC. Could you please tell me which checkpoint you used for the reported 0.58+ performance? Thanks in advance @bilylee