SRA2 / SPELL

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)
MIT License

Error with TSM and the consecutive face-crops length (default 11) #1

Closed: hxhao2000 closed this issue 2 years ago

hxhao2000 commented 2 years ago

Hi, I am trying to run your code on my own dataset for training and testing, and I found some possible bugs in models_stage1_tsm.py.

  1. I guess this '3' should be '3*clip_length', where 'clip_length' is the number of consecutive face crops. [screenshot]

  2. Should rgb_stack_size be equal to clip_length? [screenshot]

When rgb_stack_size is 11 (the default), the program always throws an exception here: [screenshot]. I have no idea how to solve it. If you can help me, I'd really appreciate it.

kylemin commented 2 years ago

Hi,

No, that is not a bug. It should remain as 3. You don't need to change anything. Please make sure that your input sizes are the same as below:

  v: (B, rgb_stack_size, 3, H_v, W_v)
  a: (B, 1, H_a, W_a)

Thank you, Kyle
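
For readers hitting the same exception, the shape requirement in the reply above can be checked before the tensors reach the model. The following is a minimal sketch, not code from the repository: the helper names `check_input_shapes` and `folded_video_shape` are hypothetical, and the spatial sizes in the usage note are arbitrary placeholders. The second helper assumes the model merges the batch and frame axes before its 2D convolutions, as TSM-style backbones commonly do; that would explain why the channel count stays at 3, but it is not confirmed in this thread.

```python
from typing import Sequence, Tuple


def check_input_shapes(v_shape: Sequence[int],
                       a_shape: Sequence[int],
                       rgb_stack_size: int = 11) -> None:
    """Raise ValueError unless the shapes match the layout from the reply:
    v: (B, rgb_stack_size, 3, H_v, W_v) and a: (B, 1, H_a, W_a)."""
    v_shape, a_shape = tuple(v_shape), tuple(a_shape)

    if len(v_shape) != 5:
        raise ValueError(f"v must be 5-D (B, T, 3, H_v, W_v), got {v_shape}")
    if v_shape[1] != rgb_stack_size:
        raise ValueError(
            f"v has {v_shape[1]} frames, expected rgb_stack_size={rgb_stack_size}")
    if v_shape[2] != 3:
        raise ValueError(f"v must have 3 channels per frame, got {v_shape[2]}")

    if len(a_shape) != 4:
        raise ValueError(f"a must be 4-D (B, 1, H_a, W_a), got {a_shape}")
    if a_shape[1] != 1:
        raise ValueError(f"a must have 1 channel, got {a_shape[1]}")

    if v_shape[0] != a_shape[0]:
        raise ValueError(
            f"batch sizes differ: v has {v_shape[0]}, a has {a_shape[0]}")


def folded_video_shape(v_shape: Sequence[int]) -> Tuple[int, int, int, int]:
    """ASSUMPTION: the frame axis is merged into the batch axis, so each
    frame is seen as an ordinary 3-channel image: (B*T, 3, H_v, W_v)."""
    b, t, c, h, w = tuple(v_shape)
    return (b * t, c, h, w)
```

With PyTorch tensors, this could be called as `check_input_shapes(v.shape, a.shape)`. For example, a batch of 2 clips with 11 frames of 112x112 crops passes the check and, under the folding assumption, would be processed as 22 separate 3-channel images, whereas a video tensor with frames stacked along the channel axis, such as (2, 33, 112, 112), is rejected.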

hxhao2000 commented 2 years ago

Thanks! After resizing the input, it works well.