When slicing the video input, we need to slice the last dimension, i.e. people_num. During training, the first dimension becomes the batch_size, so the slice [:, :, :, index] no longer addresses the last dimension. It slices the embedding dimension instead of the input face dimension, which leads to incorrect model training. This can be fixed by using an ellipsis, i.e. [..., index], which always indexes the last dimension regardless of how many dimensions precede it.
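
A minimal sketch of the difference, using NumPy arrays as a stand-in for the model tensors. The shapes and the names `frames`, `embed_dim`, `single`, and `batched` are hypothetical and chosen only for illustration; the only assumption that matters is that people_num is the last axis and that training prepends a batch axis.

```python
import numpy as np

# Hypothetical sizes, for illustration only (not the model's real shapes).
frames, embed_dim, people_num, batch_size = 5, 8, 2, 3
index = 1  # which person to select

# Assumed layout: people_num is the last axis in both cases.
single = np.zeros((frames, 1, embed_dim, people_num))
batched = np.zeros((batch_size, frames, 1, embed_dim, people_num))

# Without a batch axis, both forms select one person from the last axis.
single_fixed_shape = single[:, :, :, index].shape
single_ellipsis_shape = single[..., index].shape

# With a batch axis prepended, the four-index slice lands on the
# embedding axis, while the ellipsis form still indexes people_num.
batched_wrong_shape = batched[:, :, :, index].shape
batched_right_shape = batched[..., index].shape

print(single_fixed_shape, single_ellipsis_shape)
print(batched_wrong_shape, batched_right_shape)
```

In the batched case the four-index slice keeps people_num and drops the embedding axis, which is the silent failure described above: no error is raised, only the resulting shape and contents are wrong.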