Trying to understand the batch_size

Thanks for your interest in our work.

Yes, the "batch size" of the output y is the number of ROIs. I used quotes because the name "batch size" is arbitrary. If we agree that the outer dimension of a tensor (the leftmost entry in y.shape) is called batch size, we can drop the quotes 😉.
The code does not preserve the "batch size" of the input, because the implementation is generic enough to support a different number of ROIs per slice along the input's "batch size" axis.
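To see why a fixed input "batch size" can't simply be carried over, here is a minimal sketch that counts ROIs per input slice. It assumes each ROI is a hypothetical (batch_index, y1, x1, y2, x2) tuple; that format is an illustration, not necessarily the one roi_pooling uses.

```python
from collections import defaultdict


def rois_per_slice(rois):
    """Count how many ROIs point at each slice of the input's "batch size" axis.

    Assumes (hypothetically) that every ROI is a (batch_index, y1, x1, y2, x2) tuple.
    """
    counts = defaultdict(int)
    for batch_index, *_ in rois:
        counts[batch_index] += 1
    return dict(counts)


# Slice 0 gets three ROIs, slice 1 gets only one.
rois = [(0, 0, 0, 10, 10), (0, 5, 5, 20, 20), (0, 1, 1, 6, 6), (1, 2, 3, 12, 15)]
counts = rois_per_slice(rois)
print(counts)
```

Because the counts differ per slice, a (batch, rois_per_slice, ...) layout would be ragged. Stacking every pooled ROI along a single leading axis sidesteps that, which is why that axis holds num_rois.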
In general, the output size is (num_rois, channels,) + output_size, where output_size is the number of bins along height and width, respectively. You can see it on this line. In your case, I would expect (4, 4, 5, 7). Please confirm that.
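The shape rule above can be written out as a small helper. This is a sketch, not the library's API: roi_pool_output_shape is a hypothetical name, and it assumes a channels-first input of shape (batch, channels, height, width) plus the same hypothetical (batch_index, y1, x1, y2, x2) ROI tuples. Note that the input's batch dimension is only used for validation; it never appears in the result.

```python
def roi_pool_output_shape(input_shape, rois, output_size):
    """Hypothetical helper: shape of an ROI pooling output.

    input_shape: (batch, channels, height, width), assumed channels-first layout.
    rois: sequence of (batch_index, y1, x1, y2, x2) tuples (assumed format).
    output_size: (bins_h, bins_w), the number of bins on height and width.
    """
    batch, channels, _, _ = input_shape
    for roi in rois:
        # Each ROI must refer to an existing slice of the input.
        if not 0 <= roi[0] < batch:
            raise ValueError(f"ROI {roi} refers to a missing slice")
    # Leading axis is the number of ROIs, not the input batch size.
    return (len(rois), channels) + tuple(output_size)


# Two input slices, four channels, four ROIs in total, 5x7 bins.
rois = [(0, 0, 0, 10, 10), (0, 5, 5, 20, 20), (1, 0, 0, 8, 8), (1, 2, 3, 12, 15)]
print(roi_pool_output_shape((2, 4, 32, 32), rois, (5, 7)))  # (4, 4, 5, 7)
```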
In case I didn't understand your question, please write code that reproduces it. That makes it faster to understand your question or to detect a bug. In particular, it's unclear why you are expecting (2, 4, 4, 5, 7). Let me know if that's your question.

No need to apologize 👍

escorciav / roi_pooling