groups variable in AggregateBackbone class

AggregateBackbone concatenates multi-scale feature maps extracted from different groups of images; within each group, the feature maps undergo temporal max pooling to compute one multi-scale feature map for the group.

[[0,1,2,3,4,5,6,7]] has only one group, so it computes temporal max pooling across all eight input images.

[[0,1,2,3],[4,5,6,7]] has two groups, so it pools the features of images 0-3 and of images 4-7 separately and then concatenates the two results. This is useful if there are four images before an "event" and four images after, and the model should compare them.
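The grouping behaviour described above can be sketched in plain Python. This is not the library's implementation: `pool_groups` is a hypothetical helper, and each image's features are simplified to a flat list of numbers standing in for a feature map at a single scale (the real backbone operates on multi-scale tensors).

```python
def pool_groups(features, groups):
    """Max-pool features within each group, then concatenate the group results.

    features: one flat list of floats per image (a stand-in for a feature map).
    groups: list of lists of image indices, e.g. [[0, 1, 2, 3], [4, 5, 6, 7]].
    """
    pooled = []
    for group in groups:
        # Element-wise max across the images in this group (temporal max pooling).
        group_max = [max(values) for values in zip(*(features[i] for i in group))]
        # Concatenate this group's pooled features onto the output.
        pooled.extend(group_max)
    return pooled


# Eight images with two channels each: image i has features [i, 10 - i].
features = [[float(i), float(10 - i)] for i in range(8)]

one_group = pool_groups(features, [[0, 1, 2, 3, 4, 5, 6, 7]])
two_groups = pool_groups(features, [[0, 1, 2, 3], [4, 5, 6, 7]])
print(one_group)   # [7.0, 10.0]
print(two_groups)  # [3.0, 10.0, 7.0, 6.0]
```

Note that in this sketch the two-group output is twice as long as the one-group output, because each group contributes its own pooled features to the concatenation.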

The pretrained models are designed to work only in the setting where all the features go through temporal max pooling together. So for a different number of images, you would set the groups to [[0, 1, ..., N-1]], where N is the number of images. If the time series is particularly important for your task, you can keep the Swin backbone and apply it to each image, but then use a different architecture to process the extracted features over time.
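For a variable number of images, the single-group value can be built programmatically. A minimal sketch: `single_group` is a hypothetical helper name and is not part of the library. Image indices are zero-based, so they run from 0 up to one less than the number of images.

```python
def single_group(num_images):
    """Return a groups value that pools every image together in one group."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # One group containing every zero-based image index.
    return [list(range(num_images))]


print(single_group(4))  # [[0, 1, 2, 3]]
```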

allenai/satlaspretrain_models, issue #8