JunLi-Galios / CDFL

Code for Weakly Supervised Energy-Based Learning for Action Segmentation (ICCV 2019 Oral)

Online learning without sampling from buffer #2

Closed wlin-at closed 2 years ago

wlin-at commented 4 years ago

Hi, thanks for the very refreshing paper. I notice that you followed the online learning pattern of the "NeuralNetwork-Viterbi" paper by Richard et al. However, your training loss is defined on one entire video sequence, while their cross-entropy loss is computed on a batch of 21-frame video chunks. I think that is why you did not sample frames from the buffer as they did. I wonder why you still kept this online training pattern. Have you tried training on all the training videos instead of one video at each iteration? Regards

JunLi-Galios commented 4 years ago

We use one video at each iteration because performing the Viterbi algorithm on a video is slow. If we used all videos in one iteration, it would take a very long time. That said, updating on more than one video per iteration is worth trying.
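
For illustration, here is a minimal sketch of that one-video-per-iteration pattern (the names `viterbi_decode` and `loss_fn` and the loop structure are stand-ins, not the actual CDFL training code): each iteration decodes a single video with the Viterbi algorithm to obtain a pseudo framewise labeling, then takes one gradient step on that video.

```python
import random
import torch

def train_online(videos, model, optimizer, num_iterations, viterbi_decode, loss_fn):
    """One-video-per-iteration training loop (illustrative sketch only).

    `viterbi_decode` maps (model, features, transcript) to framewise pseudo
    ground truth; `loss_fn` is a video-level training loss. Both are
    placeholders, not the exact CDFL interfaces.
    """
    for _ in range(num_iterations):
        features, transcript = random.choice(videos)      # one video per iteration
        with torch.no_grad():
            pseudo_labels = viterbi_decode(model, features, transcript)
        optimizer.zero_grad()
        loss = loss_fn(model(features), pseudo_labels)    # loss over the whole video
        loss.backward()
        optimizer.step()
```

Decoding once per update is what makes the per-iteration cost dominated by the Viterbi pass, which is why batching many videos per iteration would be expensive.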

jszgz commented 4 years ago

@wlin93 Hello, did you run the "NeuralNetwork-Viterbi" paper's code? I downloaded the dataset they provided (Breakfast dataset) and ran their code but only got 0.375708 frame accuracy.

jszgz commented 4 years ago

@JunLi-Galios Hello, I noticed that you use 100K iterations while "NeuralNetwork-Viterbi" uses 10K. Is your loss much harder to minimize? I am running your code and it really takes a long time, maybe 3 days for me...

wlin-at commented 4 years ago

@wlin93 Hello, did you run the "NeuralNetwork-Viterbi" paper's code? I downloaded the dataset they provided (Breakfast dataset) and ran their code but only got 0.375708 frame accuracy.

Hi, I recall that I could not reproduce the accuracy in the paper either. However, the authors claimed that they used a C++ implementation instead of the Python code in the repo.

JunLi-Galios commented 4 years ago

@jszgz @wlin93 I think "NeuralNetwork-Viterbi" reports its maximum accuracy. You may refer to https://arxiv.org/pdf/1904.03116.pdf for the statistics of the related work.

JunLi-Galios commented 4 years ago

@jszgz We use more iterations because we don't use a buffer during training. In "NeuralNetwork-Viterbi", the authors use a buffer to store the pseudo ground truth from previous iterations and reuse it during training.
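
For comparison, a rough sketch of the buffer idea described above (an assumed interface, not the NeuralNetwork-Viterbi code): decoded pseudo ground truths are stored and can be replayed in later iterations, so each expensive Viterbi decoding is reused for several updates.

```python
import random

class PseudoLabelBuffer:
    """Stores (features, pseudo_labels) pairs from earlier iterations (sketch only)."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.items = []

    def add(self, features, pseudo_labels):
        if len(self.items) >= self.capacity:
            self.items.pop(0)             # drop the oldest entry
        self.items.append((features, pseudo_labels))

    def sample(self):
        return random.choice(self.items)  # replay an old pseudo ground truth
```

Replaying buffered pseudo ground truth lets one Viterbi pass serve several gradient updates, so fewer iterations suffice; training without a buffer, as done here, decodes fresh pseudo labels every iteration and therefore needs more iterations.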

jszgz commented 4 years ago

Evaluate 252 video files... frame accuracy: 0.503284

@JunLi-Galios Hello, I ran your code on the Breakfast dataset and got this accuracy. Is it segmentation accuracy or alignment accuracy? If it is segmentation accuracy, then you've done a good job. Besides, would you consider providing the code for alignment accuracy and the other metrics in the paper?

JunLi-Galios commented 4 years ago

@jszgz It's segmentation accuracy, which is the main metric for the Breakfast dataset. I will provide the other metrics shortly.
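
For reference, this frame (segmentation) accuracy is the mean-over-frames (MoF) metric: the fraction of frames whose predicted label matches the ground truth, accumulated over all test videos. A minimal sketch (plain arrays as inputs, not the repo's evaluation script):

```python
import numpy as np

def dataset_mof(predictions, ground_truths):
    """Mean-over-frames accuracy: total correctly labeled frames / total frames.

    `predictions` and `ground_truths` are parallel lists of per-frame label
    arrays, one pair per video (illustrative interface only).
    """
    correct = sum(int(np.sum(np.asarray(p) == np.asarray(g)))
                  for p, g in zip(predictions, ground_truths))
    total = sum(len(g) for g in ground_truths)
    return correct / total
```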

jszgz commented 4 years ago

@jszgz It's segmentation accuracy, which is the main metric for the Breakfast dataset. I will provide the other metrics shortly.

Thanks a lot.

jszgz commented 4 years ago

@JunLi-Galios Hello, I tried to train on the GTEA dataset with your code and got 0.36 frame accuracy after 1600 iterations (since there are only 28 videos, about 60 times fewer than Breakfast, and the learning rate is decreased after 60% of the iterations). Do I need to adjust some parameters or the optimizer?

JunLi-Galios commented 4 years ago

@jszgz First, you need to tune the learning rate and the total number of iterations. Second, you need to tune frame_sampling, window_size and step, both in training and inference. The longer the videos are, the larger these hyper-parameters should be.
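
As a purely hypothetical illustration of that advice (the numbers are placeholders, not the repo's defaults), a configuration for the short GTEA videos would use smaller values than one for the long Breakfast videos:

```python
# Placeholder values only; tune on the target dataset.
gtea_config = {
    "learning_rate": 0.01,    # tune together with the total number of iterations
    "num_iterations": 10000,
    "frame_sampling": 10,     # smaller for short videos
    "window_size": 20,        # smaller for short videos
    "step": 5,                # smaller for short videos
}
```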

yassersouri commented 4 years ago

@jszgz @wlin93 @JunLi-Galios We have done a reproducibility study and comparison of the NNV, CDFL and ISBA methods in our technical report: https://arxiv.org/abs/2005.09743. Here is Table 1 of the paper, comparing the reported accuracy with the average and standard deviation of the accuracy over 5 trials (over all splits of Breakfast):

| Model | MoF Reported | MoF Avg (+- Std) |
|-------|--------------|------------------|
| ISBA  | 38.4         | 36.4 (+- 1.0)    |
| NNV   | 43.0         | 39.7 (+- 2.4)    |
| CDFL  | 50.2         | 48.1 (+- 2.5)    |

JunLi-Galios commented 4 years ago

@yassersouri Thanks for sharing your work.