Input shape for the FSM Model

TRI-ML / vidar



Issue #11, closed by jenhungh 2 years ago

jenhungh commented 2 years ago

After reading the FSM paper and looking at the code, I am still a little confused about the input shape for the FSM model. We need to input 6 synchronized images in order to compute the spatio-temporal loss. So should the input shape be (B, 6, 3, H, W) or (B, 3, H, W)? If it is (B, 3, H, W), then the batch size would have to be 6, but how can we make sure the images stay synchronized when the data is shuffled? Thanks.

VitorGuizilini-TRI commented 2 years ago

Hi, thank you for your interest in our work. The input images to the model have shape (B, N, 3, H, W): each batch sample contains N images of resolution (H, W), one per camera. Data shuffling only reorders the samples; it does not split up or reorder the images within each sample, so those images stay synchronized.
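To illustrate why shuffling cannot break synchronization, here is a minimal NumPy sketch. It is not vidar's actual data pipeline: `load_sample`, `iterate_batches`, `flatten_cameras`, and the small image sizes are hypothetical, and the pixels are random placeholders. Each sample bundles all N camera views as one (N, 3, H, W) array, only sample indices are shuffled, and batching stacks samples into (B, N, 3, H, W). The optional flattening to (B*N, 3, H, W) is a common pattern for running a per-image network, assumed here rather than taken from the FSM code.

```python
import numpy as np

# Hypothetical sizes for illustration: 6 cameras, tiny images.
NUM_CAMERAS, HEIGHT, WIDTH = 6, 4, 5


def load_sample(index):
    """Hypothetical loader returning all camera images of one timestep.

    The N synchronized views are returned together as one (N, 3, H, W)
    array, so they always travel as a single unit.
    """
    rng = np.random.default_rng(index)  # deterministic fake pixels per sample
    images = rng.random((NUM_CAMERAS, 3, HEIGHT, WIDTH), dtype=np.float32)
    return index, images


def iterate_batches(num_samples, batch_size, seed=0):
    """Shuffle sample indices (never individual images) and stack into batches."""
    order = np.random.default_rng(seed).permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        picked = [load_sample(int(i)) for i in order[start:start + batch_size]]
        ids = [i for i, _ in picked]
        batch = np.stack([imgs for _, imgs in picked])  # (B, N, 3, H, W)
        yield ids, batch


def flatten_cameras(batch):
    """Merge batch and camera axes: (B, N, 3, H, W) -> (B*N, 3, H, W).

    Image n of sample b ends up at flat index b * N + n, so the result can
    be reshaped back to (B, N, 3, H, W) without losing the grouping.
    """
    b, n = batch.shape[:2]
    return batch.reshape(b * n, *batch.shape[2:])


if __name__ == "__main__":
    for ids, batch in iterate_batches(num_samples=10, batch_size=4):
        print(ids, batch.shape, flatten_cameras(batch).shape)
```

A PyTorch `DataLoader` with `shuffle=True` behaves analogously: it permutes dataset indices, and its default collate function stacks each sample's tensor along a new leading batch dimension.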