**Closed** — jalaxy33 closed this issue 4 years ago
Hi~
```python
import numpy as np
from scipy.interpolate import interp1d

# Resize video features to a specific temporal scale.
# A direct implementation.
def resizePoolFeature(data, feature_save_file, feature_dim=200,
                      temporal_scale=512, num_sample=3):
    # First resize the length to num_sample * temporal_scale.
    temporal_scale = temporal_scale * num_sample
    originalSize = len(data)
    if originalSize == 1:
        # A single feature vector is simply repeated along time.
        data = np.reshape(data, [-1])
        return np.stack([data] * temporal_scale)
    x = np.arange(originalSize)
    f = interp1d(x, data, axis=0)
    x_new = [i * float(originalSize - 1) / (temporal_scale - 1)
             for i in range(temporal_scale)]
    y_new = f(x_new)
    # The result length is the original temporal_scale
    # (integer division for Python 3 compatibility).
    result = np.zeros((temporal_scale // num_sample, feature_dim))
    # Then average every group of num_sample consecutive features.
    for i in range(temporal_scale // num_sample):
        result[i] = np.mean(y_new[i * num_sample:(i + 1) * num_sample, :], axis=0)
    np.save(feature_save_file, result)
```
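The resize-then-pool idea above (linearly interpolate to `num_sample` times the target length, then mean-pool every `num_sample` consecutive rows) can be checked on a toy input. A minimal NumPy-only sketch with hypothetical shapes and a helper name (`resize_and_pool`) of my own; `np.interp` stands in for SciPy's `interp1d` since both do linear interpolation here:

```python
import numpy as np

def resize_and_pool(feat, target_len=8, num_sample=3):
    """Linearly resize feat of shape (T, D) to target_len * num_sample steps,
    then mean-pool every num_sample consecutive rows -> (target_len, D)."""
    T, D = feat.shape
    dense_len = target_len * num_sample
    x = np.arange(T)
    x_new = np.linspace(0, T - 1, dense_len)
    # np.interp is 1-D, so interpolate each feature dimension separately
    # (equivalent to interp1d(x, feat, axis=0) for linear interpolation).
    dense = np.stack([np.interp(x_new, x, feat[:, d]) for d in range(D)], axis=1)
    # Group every num_sample consecutive rows and average them.
    return dense.reshape(target_len, num_sample, D).mean(axis=1)

feat = np.arange(20, dtype=float).reshape(10, 2)  # toy features, T=10, D=2
out = resize_and_pool(feat, target_len=8, num_sample=3)
print(out.shape)  # (8, 2)
```

Because the toy features grow linearly in time, the pooled output should still be monotonically increasing along the temporal axis, which is an easy sanity check.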
```python
# A more complete implementation.
def poolData(data, video_frame, video_second, feature_save_file, sample_step=8,
             feature_dim=200, temporal_scale=512, num_sample=3, pool_type="mean"):
    # temporal_scale is the resized temporal scale.
    # feature_dim is the dimension of the input feature.
    # num_sample is the number of samples taken at each temporal location.
    # pool_type is the pooling method: "mean" or "max".
    # Each feature vector corresponds to sample_step frames,
    # so the number of feature frames is sample_step * len(data).
    feature_frame = len(data) * sample_step
    # corrected_second is the duration (in seconds) actually covered
    # by the feature sequence.
    corrected_second = float(feature_frame) / video_frame * video_second
    fps = float(video_frame) / video_second
    st = sample_step / fps
    if len(data) == 1:
        # A single feature vector is simply repeated along time.
        video_feature = np.stack([data] * temporal_scale)
        video_feature = np.reshape(video_feature, [temporal_scale, feature_dim])
        return video_feature
    # x is the temporal location (in seconds) of each entry in the feature sequence.
    x = [st / 2 + ii * st for ii in range(len(data))]
    f = interp1d(x, data, axis=0)
    video_feature = []
    zero_sample = np.zeros(feature_dim)
    tmp_anchor_xmin = [1.0 / temporal_scale * i for i in range(temporal_scale)]
    tmp_anchor_xmax = [1.0 / temporal_scale * i for i in range(1, temporal_scale + 1)]
    for idx in range(temporal_scale):
        xmin = max(x[0] + 0.0001, tmp_anchor_xmin[idx] * corrected_second)
        xmax = min(x[-1] - 0.0001, tmp_anchor_xmax[idx] * corrected_second)
        if xmax < x[0] or xmin > x[-1]:
            # The anchor window lies outside the feature range; pad with zeros.
            video_feature.append(zero_sample)
            continue
        plen = (xmax - xmin) / (num_sample - 1)
        x_new = [xmin + plen * ii for ii in range(num_sample)]
        y_new = f(x_new)
        if pool_type == "mean":
            y_new = np.mean(y_new, axis=0)
        elif pool_type == "max":
            y_new = np.max(y_new, axis=0)
        video_feature.append(y_new)
    video_feature = np.stack(video_feature)
    np.save(feature_save_file, video_feature)
```
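For very short videos, the `len(data) == 1` branch above broadcasts the single feature vector across the whole temporal scale. A quick NumPy-only check of that broadcast behavior, using toy sizes of my own choosing rather than the defaults:

```python
import numpy as np

temporal_scale, feature_dim = 6, 4  # toy sizes for illustration
data = np.random.rand(1, feature_dim)  # a video that yields one feature vector

# Mirror the len(data) == 1 branch: tile the lone vector along time.
video_feature = np.stack([data] * temporal_scale)
video_feature = np.reshape(video_feature, [temporal_scale, feature_dim])

print(video_feature.shape)  # (6, 4)
```

Every row of the result equals the original vector, so the model sees a constant feature sequence of the required length for such videos.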
Thanks for your thoughtful help! I'll try them.
Thanks for your help. I successfully ran dssad on my own dataset last week. I'm sorry I forgot to close this issue in time, so I'm closing it now.
Hi, I have also run dssad on my own dataset. I want to recognize six categories, but the model only identified two of them. Those two action types last a long time (3 s to 20 s), while the undetected actions generally last only about 1 s. In the final AP calculation, APs are computed for only those two categories. Have you ever had this problem?
Glad you solved it @jadada! Hi @mrlihellohorld. That seems plausible if the actions are too short to recognize. Could you try the two methods I mentioned above to deal with the short videos?
Hi @HYPJUDY. I am trying to use this code on my own dataset, which contains a large number of short videos. After frame extraction, many videos produce very few frames; one extreme case has only 8 frames, far fewer than the number the code requires. If I understand correctly, the required minimum is 128 frames. So my question is: how should I adjust the code for short videos with so few frames? It would be extremely helpful if you could point out specifically which places in the code need to be modified. Thanks!