Alvin-Zeng / PGCN

Graph Convolutional Networks for Temporal Action Localization (ICCV2019)

Normalization and image size for I3D feature extraction #39

Open arc144 opened 3 years ago

arc144 commented 3 years ago

Hi, first of all, congratulations on your work.

I want to use your work in a real pipeline where I need to run all the networks in sequence, so I first have to extract the features with the I3D model myself. From this repo and the paper, I understand that you extract the features in a sliding-window manner with blocks of 64 frames and a stride of 8. Is that correct? A sketch of what I mean is below.
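For context, this is roughly the windowing I have in mind (the `i3d` extractor and the decoded `frames` array are placeholders for my own pipeline, not code from this repo):

import numpy as np

def sliding_window_clips(frames, block=64, stride=8):
    # `frames`: (num_frames, H, W, 3) array of decoded video frames.
    # Yields overlapping 64-frame clips with a stride of 8 frames.
    for start in range(0, len(frames) - block + 1, stride):
        yield frames[start:start + block]

# Usage (i3d is my own feature extractor, applied per clip with a batch dim):
# feats = np.stack([i3d(clip[np.newaxis]) for clip in sliding_window_clips(frames)])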

Furthermore, I couldn't find any information about the frame size and normalization applied before feeding frames into the I3D network; the original repo does not say anything about this either. Right now I'm using the following pre-processing:

import cv2

img = cv2.resize(img, (224, 224))  # resize directly to 224x224
img = img[:, :, ::-1]              # BGR -> RGB
img = img / 127.5 - 1.0            # scale pixel values to [-1, 1]
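In case it matters, I also considered the preprocessing that, if I understand correctly, is usually described for Kinetics-pretrained I3D (resize the shorter side to 256, center-crop 224x224, scale pixels to [-1, 1]). This is only a sketch of that alternative, in case it is closer to what you actually used:

import cv2
import numpy as np

def preprocess_kinetics_style(img, crop=224, short_side=256):
    # Resize so the shorter side is `short_side`, keeping the aspect ratio.
    h, w = img.shape[:2]
    scale = short_side / min(h, w)
    img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    # Center-crop `crop` x `crop`.
    h, w = img.shape[:2]
    top, left = (h - crop) // 2, (w - crop) // 2
    img = img[top:top + crop, left:left + crop]
    # BGR -> RGB and scale to [-1, 1].
    img = img[:, :, ::-1].astype(np.float32)
    return img / 127.5 - 1.0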

However, the results differ when I use my features instead of yours. Could you help me with this issue?

Kind regards,