All the official PyTorch object detection models expect the input as a list of tensors, List[torch.Tensor[3, H, W]], while the classification models expect a mini-batch of stacked tensors, torch.Tensor[N, 3, H, W].
Object detection models can then do the resize on GPU and tolerate images of different sizes / channel counts.
We can probably expect that non-official models will have a wide range of expectations for collate_fn. Rikai should be flexible about that.
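To make the two conventions concrete, here is a minimal sketch of the two collate_fn shapes a flexible runner would need to support. The function names are hypothetical, not part of any Rikai or torchvision API; only the tensor shapes come from the text above.

```python
import torch

def detection_collate(batch):
    # Detection-style models take a plain list of [3, H, W] tensors,
    # so images in one batch may have different heights and widths.
    return list(batch)

def classification_collate(batch):
    # Classification-style models take one stacked [N, 3, H, W] tensor,
    # so every image must already share the same H and W.
    return torch.stack(batch)

# Two 3-channel images of different sizes: fine for the list form.
mixed = [torch.rand(3, 224, 224), torch.rand(3, 300, 400)]
as_list = detection_collate(mixed)

# Stacking requires uniform sizes.
uniform = [torch.rand(3, 224, 224) for _ in range(4)]
as_batch = classification_collate(uniform)
```

Passing either function as the collate_fn argument of torch.utils.data.DataLoader yields batches in the corresponding shape, which is why a single hard-coded collate function cannot cover both model families.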