This is a specific channel of the Brox-flow image, not the red channel.
I tried it, but it had no effect.
Yes.
Thank you for your response. I have another question. In fact, the format of ground-truth tubes in "UCF101v2-GT.pkl" is as follows:
gttubes = {
    'parentfolder/videoname': {class: [
        array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]]),
        ...
        array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]])
    ]},
    ...
    'parentfolder/videoname': {class: [
        array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]]),
        ...
        array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]])
    ]}
}
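For reference, a minimal loading sketch for inspecting this structure (the file name and the latin1 encoding argument are assumptions for a Python 2-era pickle; adjust if your copy differs):

import pickle

# Load the annotation pickle; 'latin1' is a common choice for py2-pickled files.
with open('UCF101v2-GT.pkl', 'rb') as f:
    gt = pickle.load(f, encoding='latin1')

gttubes = gt['gttubes']
video, tubes_per_class = next(iter(gttubes.items()))
print(video)                         # 'parentfolder/videoname'
print(list(tubes_per_class.keys()))  # label indices annotated in this video
for label, tubes in tubes_per_class.items():
    # each tube is an (nframes, 5) array with rows [frame, x1, y1, x2, y2]
    print(label, len(tubes), tubes[0].shape)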
So the datasets here are single-object? Does each video contain only one action? And is the class/identification of a tube simply the class of the video (i.e. the index of the parent folder's name)? In theory your model does multi-object tracking, but is it trained on single-object data?
What about the general case where there are multiple objects, or multiple actions of different types, in a video? For example: 1) a video with 2 people jumping -> we need to identify, or separate, the tube boxes of each person; 2) a video with one person jumping and one walking -> it needs to be classified as normal.
In that case the class/identification would exist only for tubes, not for videos? And _gttubes[label] would have to be a dictionary with multiple entries, as follows?
gttubes = {
    'parentfolder/videoname': {
        class: [
            array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]]),
            ...
            array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]])
        ],
        class: [
            array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]]),
            ...
            array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]])
        ],
        ...
    },
    ...
    'parentfolder/videoname': {
        class: [
            array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]]),
            ...
            array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]])
        ],
        class: [
            array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]]),
            ...
            array([[frame, x1, y1, x2, y2], ..., [frame, x1, y1, x2, y2]])
        ],
        ...
    }
}
Many thanks,
Oh, so flow images are represented in HSV format, where channel 0 encodes the direction and channel 2 encodes the magnitude of the movement? So when we flip the images, we also have to flip the direction of the object's movement. Thanks for the information, I had forgotten that.
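Whatever channel 2 actually encodes, the temp[:, :, 2] = 255 - temp[:, :, 2] line quoted later in this thread inverts it after the horizontal flip. A sketch of that flip (illustration only, not the repo's exact code), under the assumption that channel 2 carries the horizontal-motion information:

import numpy as np

def flip_flow_frame(flow_img: np.ndarray) -> np.ndarray:
    # Mirror the frame left-right, then invert the channel assumed to encode
    # horizontal motion so the flow still points the right way after the flip.
    flipped = flow_img[:, ::-1, :].copy()
    flipped[:, :, 2] = 255 - flipped[:, :, 2]
    return flipped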
UCF101-24 is a multi-object dataset, but JHMDB-21 is a single-object dataset (see our GIFs).
From my observation, both datasets are single-action, as you say.
I don't know how well the model generalizes to multiple actions. Indeed, the community needs a new large-scale, non-atomic, multi-action / multi-object action detection dataset.
I checked UCF101v2-GT.pkl and found that UCF101-24 is not only a single-action but also a single-object dataset. In every video, only one object is annotated with a box throughout the video (even though the video may contain many objects). So UCF101-24 is a single-object tracking dataset.
We have
len(self._gttubes[v]) = 1
for every video v in self._gttubes.
An action tube can be interrupted, i.e. divided into multiple segments. For example:
'Basketball/v_Basketball_g18_c02': {0: [array([
[ 1., 161., 137., 222., 235.],
[ 2., 161., 137., 222., 235.],
[ 3., 161., 137., 222., 235.],
[ 4., 161., 137., 222., 235.],
[ 5., 161., 137., 222., 235.],
[ 6., 161., 137., 222., 235.],
[ 7., 161., 137., 222., 235.],
[ 8., 161., 137., 222., 235.],
[ 9., 161., 137., 222., 235.],
[ 10., 162., 137., 223., 235.],
[ 11., 162., 137., 223., 235.],
[ 12., 163., 137., 224., 235.],
[ 13., 163., 137., 224., 235.],
[ 14., 163., 137., 224., 235.],
[ 15., 163., 137., 224., 235.],
[ 16., 163., 137., 224., 235.],
[ 17., 163., 137., 224., 235.],
[ 18., 163., 137., 224., 235.],
[ 19., 163., 137., 224., 235.],
[ 20., 163., 137., 224., 235.]], dtype=float32), array([[ 72., 163., 146., 219., 238.],
[ 73., 163., 146., 219., 238.],
[ 74., 163., 146., 219., 238.],
[ 75., 163., 146., 219., 238.],
[ 76., 163., 146., 219., 238.],
[ 77., 163., 146., 219., 238.],
[ 78., 163., 146., 219., 238.],
[ 79., 163., 146., 219., 238.],
[ 80., 163., 146., 219., 238.],
[ 81., 163., 146., 219., 238.],
[ 82., 163., 146., 219., 238.],
[ 83., 163., 146., 219., 238.],
[ 84., 163., 146., 219., 238.],
[ 85., 163., 146., 219., 238.],
[ 86., 163., 146., 219., 238.],
[ 87., 163., 146., 219., 238.],
[ 88., 163., 146., 219., 238.],
[ 89., 163., 146., 219., 238.],
[ 90., 163., 146., 219., 238.],
[ 91., 163., 146., 219., 238.],
[ 92., 163., 146., 219., 238.],
[ 93., 163., 146., 219., 238.],
[ 94., 163., 146., 219., 238.],
[ 95., 163., 146., 219., 238.],
[ 96., 163., 146., 219., 238.],
[ 97., 163., 146., 219., 238.],
[ 98., 163., 146., 219., 238.],
[ 99., 163., 146., 219., 238.],
[100., 163., 146., 219., 238.],
[101., 163., 146., 219., 238.],
[102., 163., 146., 219., 238.]], dtype=float32)]}
There are two tube segments in the "Basketball/v_Basketball_g18_c02" video, but the object in both tubes is the same. So UCF101-24 is a single-object tracking dataset.
gttubes: dictionary that contains the GT tubes for each video.
Each entry maps a label index to a list of tubes.
A tube is a numpy array with nframes rows and 5 columns, [frame, x1, y1, x2, y2].
len(self._gttubes[v]) = 1
means a single action class, not a single object.
Also try checking len(self._gttubes[v][class_index]). For example:
len(pkl['gttubes']['Fencing/v_Fencing_g04_c03'][6])
---> 4
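To reproduce that check yourself (same pkl and loading assumptions as in the earlier sketch; the video key and label index 6 come from the example above):

import pickle

with open('UCF101v2-GT.pkl', 'rb') as f:
    gt = pickle.load(f, encoding='latin1')

tubes = gt['gttubes']['Fencing/v_Fencing_g04_c03']
print(len(tubes))     # 1 -> one action class in this video (single-action)
print(len(tubes[6]))  # 4 -> four separate tubes, i.e. multiple annotated objects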
Thank you very much for this example. I was wrong: I drew the boxes for Basketball/v_Basketball_g18_c02 and assumed it was the same for the other videos. Thanks again.
Hi, I just found this issue. Can the proposed method support multiple persons with multiple actions in one frame, as in the AVA dataset?
Yes, the Center Branch uses the focal loss and can handle multi-label classification.
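For context, the focal loss over per-class center heatmaps is typically the CenterNet-style variant sketched below (a generic illustration, not the exact loss code of this repo); because each class has its own heatmap channel, several people performing different actions in the same frame can each contribute their own center peak:

import torch

def center_focal_loss(pred, gt, alpha=2, beta=4):
    # pred, gt: (B, num_classes, H, W); pred holds sigmoid probabilities,
    # gt is a Gaussian-splatted heatmap with value 1 exactly at object centers.
    pos = gt.eq(1).float()
    neg = gt.lt(1).float()
    pred = pred.clamp(1e-6, 1 - 1e-6)
    pos_loss = torch.log(pred) * (1 - pred) ** alpha * pos
    neg_loss = torch.log(1 - pred) * pred ** alpha * (1 - gt) ** beta * neg
    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos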
I have some questions related to flip_test mode.
In "normal_moc_det.py"/preprocess(), line 62, why do you convert the red channel of "flip_data". What does this mean? temp[:, :, 2] = 255 - temp[:, :, 2]
In "normal_moc_det.py"/process() function, why don't you take the average of rgb_mov and rgb_mov_f (as well as flow_mov and flow_mov_f) like heatmap and wh output (lines 88,89, 100,101) ?
Are rgb_output[1]['mov'] and flow_output[1]['mov'] computed for nothing?
The same applies to stream_moc_det.py. I hope to get your explanation. Thank you for your reply.
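For readers unfamiliar with flip testing, the averaging applied to the heatmap and wh outputs follows roughly the generic pattern below (my own sketch, not the repo's code): the output produced from the mirrored input is flipped back along the width axis before being averaged with the original output.

import torch

def flip_test_average(out: torch.Tensor, out_flipped: torch.Tensor) -> torch.Tensor:
    # out, out_flipped: (B, C, H, W) maps such as center heatmaps or wh outputs.
    # Flip the mirrored prediction back horizontally, then average the two passes.
    return 0.5 * (out + torch.flip(out_flipped, dims=[3]))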