I've noticed that both the paper and the published code report test performance at a single resolution. However, I fine-tuned the RoIHead with BAGS (with the Backbone, FPN, and RPN frozen) and then tested with multiple scales, [(800, 3333), (1000, 3333), (1200, 3333)], with flip set to True. The result is worse than testing at the single scale (800, 1333). More specifically, BBox AP drops by 0.4, and while Mask AP increases by 0.3, it is still worse than that of the model trained without BAGS fine-tuning.
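For reference, the two test settings being compared might look roughly like this in an mmdetection 2.x-style config. This is only a sketch under the assumption that the codebase uses mmdetection's `MultiScaleFlipAug` test pipeline. The `img_norm_cfg` values are the common ImageNet defaults and are assumed rather than taken from the BAGS configs, and the scale tuples are copied exactly as written in the question.

```python
# Sketch of an mmdetection 2.x-style test pipeline (assumed setup, not the BAGS config).
# img_norm_cfg: common ImageNet defaults, assumed here for illustration.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)


def make_test_pipeline(img_scale, flip):
    """Build a test pipeline for the given scale(s) and flip setting."""
    return [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=img_scale,
            flip=flip,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img']),
            ]),
    ]


# Single-scale setting used in the paper / published code (per the question).
single_scale_pipeline = make_test_pipeline(img_scale=(800, 1333), flip=False)

# Multi-scale + flip setting that gave worse BBox AP after BAGS fine-tuning.
multi_scale_pipeline = make_test_pipeline(
    img_scale=[(800, 3333), (1000, 3333), (1200, 3333)], flip=True)
```

Keeping both settings behind one helper makes it easy to confirm that only `img_scale` and `flip` differ between the two evaluations.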