about the feature extractor architecture

VIPL-SLP / VAC_CSLR

Visual Alignment Constraint for Continuous Sign Language Recognition (ICCV 2021)

https://openaccess.thecvf.com/content/ICCV2021/html/Min_Visual_Alignment_Constraint_for_Continuous_Sign_Language_Recognition_ICCV_2021_paper.html

Apache License 2.0


about the feature extractor architecture #36

Closed fransisca25 closed 1 year ago

fransisca25 commented 1 year ago

Hi, I am sorry to ask a question here, but this is important for my research. I would like to know more about the delta t of the frame-wise features. Neighbouring delta t windows overlap. Could you explain how long the overlaps between those windows are? Or could you point me to the code that implements delta t, so I can check it? Thank you. [Screenshot attached]

ycmin95 commented 1 year ago

Hi, thanks for your interest in our work, and please feel free to post any questions here. The delta t is implemented by a conv1d over the temporal dimension with stride=1, and the implementation can be found here. The ablation results on the length of delta t are in Table 4 of the main paper.

Hope this helps~
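To make the overlap concrete, here is a minimal sketch of how a single stride-1 temporal convolution slides over frames. It assumes no padding and one layer only, and the helper names `window_spans` and `overlap` are hypothetical, not taken from the repository. Under those assumptions, two neighbouring windows of kernel size K share K-1 frames.

```python
def window_spans(num_frames, kernel_size, stride=1):
    """Return (start, end) frame index pairs, end exclusive, one per window.

    Assumes no padding, so only windows that fit fully inside the clip are kept.
    """
    last_start = num_frames - kernel_size
    return [(s, s + kernel_size) for s in range(0, last_start + 1, stride)]


def overlap(span_a, span_b):
    """Number of frames shared by two (start, end) spans."""
    return max(0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))


spans = window_spans(num_frames=8, kernel_size=5)
print(spans)                        # [(0, 5), (1, 6), (2, 7), (3, 8)]
print(overlap(spans[0], spans[1]))  # 4, i.e. kernel_size - 1
```

The actual module in the repository may use padding and stacks pooling layers on top, so the overlap for the full network is larger than this single-layer case.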

fransisca25 commented 1 year ago

This really helps me. Thank you so much for answering my question!

atonyo11 commented 6 months ago

Hi @ycmin95, @fransisca25, could you explain how to calculate the delta t, for example for C5-P2-C5-P2? Thank you in advance!

ycmin95 commented 5 months ago

Hi @atonyo11, the receptive field is calculated as RF_i = (RF_{i-1} - 1) * stride + Ksize, applied from the last layer back to the first and starting from RF_0 = 1.

For C5-P2-C5-P2:

-P2: (1-1)*2+2=2

-C5-P2: (2-1)*1+5=6

-P2-C5-P2: (6-1)*2+2=12

C5-P2-C5-P2: (12-1)*1+5=16
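The calculation above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the parser assumes that "C<k>" means a convolution with kernel size k and stride 1, and "P<k>" means a pooling layer with kernel size k and stride k, which matches the numbers used in this thread. Padding is ignored.

```python
def parse_spec(spec):
    """Turn a string such as "C5-P2-C5-P2" into (kernel_size, stride) pairs.

    Assumption: C<k> = conv, kernel k, stride 1; P<k> = pooling, kernel k, stride k.
    """
    layers = []
    for token in spec.split("-"):
        kind, size = token[0], int(token[1:])
        if kind == "C":
            layers.append((size, 1))
        elif kind == "P":
            layers.append((size, size))
        else:
            raise ValueError(f"unknown layer token: {token}")
    return layers


def receptive_field(layers):
    """Apply RF = (RF - 1) * stride + kernel_size from the last layer to the first."""
    rf = 1
    for kernel_size, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel_size
    return rf


def effective_stride(layers):
    """Product of all strides: how many input frames separate adjacent outputs."""
    total = 1
    for _, stride in layers:
        total *= stride
    return total


layers = parse_spec("C5-P2-C5-P2")
rf = receptive_field(layers)        # 16
step = effective_stride(layers)     # 4
print(rf, step, rf - step)          # 16 4 12
```

Under these assumptions each output feature covers 16 input frames, adjacent output features start 4 frames apart, and so neighbouring windows share 12 frames.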

Hope this can help you understand~

atonyo11 commented 5 months ago

@ycmin95 oh, I got it. Thank you very much!

atonyo11 commented 2 months ago

@ycmin95 I understand that the delta t is the receptive field. In the context of your paper, does that mean the delta t is measured in number of frames?