Closed Leo63963 closed 3 years ago
This only happens at the beginning of a video, where there are not enough previous frames to be used for propagation. For example, when you are at the second frame and want to aggregate three frames, you have to repeat the first frame. `self.inference_prehm` is a cache; once it is full of the `clip_len` frames, it will no longer do the repeat.
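The padding behaviour described above can be sketched roughly as follows. This is an illustrative reimplementation, not the actual TraDeS code; the function name `build_clip` and its signature are hypothetical.

```python
def build_clip(cache, current_hm, clip_len):
    """Return `clip_len` heatmaps, padding by repetition at video start.

    `cache` plays the role of self.inference_prehm: the heatmaps of
    previous frames. While it holds fewer than clip_len - 1 entries,
    the earliest available frame is repeated to fill the clip; once
    the cache is full, no repetition happens.
    """
    prev = list(cache)
    while len(prev) < clip_len - 1:
        # Not enough history yet: repeat the earliest available frame
        # (or the current one on the very first frame of the video).
        prev.insert(0, prev[0] if prev else current_hm)
    return prev + [current_hm]
```

So at frame 2 with `clip_len = 3`, the single cached heatmap from frame 1 appears twice; from frame 3 onward each slot holds a distinct frame.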
Thanks for the reply. Yes, I am totally aware of that; it is the case at the start of a video. However, by debugging I found that the features in `self.inference_prehm` are the same, which means `self.inference_prehm[0] == self.inference_prehm[1]` holds not only for the first few frames but all the time, for an unknown reason. Could you re-check that? Thanks.
If `len(self.inference_prehm) == (self.opt.clip_len - 1)`, it shouldn't enter the loop and should append the heatmap only once.
Thanks for the reply. Just one follow-up, please, on the loss function below: https://github.com/JialianW/TraDeS/blob/3eafd249ca0f18af8000d5798d4c552a0bd627ec/src/lib/model/losses.py#L115 I could not work out why `(1 - target)` is used here. I have been working with your code recently and really admire it. Thanks.
This mainly lets the pixels near the target contribute less to the softmax computation, so that they are penalized less. The reason is that it is too harsh to force the network to categorize two adjacent pixels into two different groups: the pixels near the target in the previous frame could also be part of the target, so we may not want to regard them as other objects or background. I haven't tested the code without this term. The implementation is intuitive; I am not sure whether it would be worse without it.
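One way to see the effect of the `(1 - target)` term is the following hedged sketch. It is not the actual TraDeS loss, just an illustration of the idea: the function name `masked_matching_loss` and its arguments are assumptions, and `soft_target` stands for a Gaussian-like heatmap that is 1 at the matched pixel and decays for its neighbours.

```python
import torch

def masked_matching_loss(logits, soft_target, pos_idx):
    """Illustrative sketch (not the TraDeS code) of down-weighting
    near-target pixels in a softmax-style matching loss.

    Weighting each pixel's exponentiated score by (1 - soft_target)
    shrinks the contribution of pixels near the target to the softmax
    denominator, so adjacent pixels (which may also belong to the
    object) are penalized less than under plain softmax cross-entropy.
    """
    exp = logits.exp()
    pos = exp[..., pos_idx]  # score at the matched pixel (soft_target == 1 there)
    # Near-target pixels (high soft_target) are mostly removed from the
    # denominator; far-away pixels keep their full weight.
    denom = pos + (exp * (1.0 - soft_target)).sum(dim=-1)
    return -(pos / denom).log().mean()
```

Compared with an unmasked softmax over the same logits, the masked denominator is never larger, so a neighbour of the target with a high score raises the loss much less.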
Actually I still cannot get that. But thanks anyway, the work is really great.
Hi
Just one question about the features used here. For `inference_prehm`, which stores the previous heatmaps used for feature aggregation in the proposed MFW, the code is as follows:
https://github.com/JialianW/TraDeS/blob/3eafd249ca0f18af8000d5798d4c552a0bd627ec/src/lib/detector.py#L351
https://github.com/JialianW/TraDeS/blob/3eafd249ca0f18af8000d5798d4c552a0bd627ec/src/lib/detector.py#L354
It looks like the same heatmap from the previous frame (t-1) is used twice, and I have confirmed this by debugging.
I would have expected the TraDeS pipeline to use the features and corresponding heatmaps from frames (t-1) and (t-2), rather than a repeated (t-1), for feature aggregation. However, it looks like that is not the case.
Just kindly asking, are there some hidden tricks here? Many thanks.