hmeine opened this issue 4 years ago
Definitely sounds interesting. Agreed on using lists going forward instead of a fixed number of variables. It would be great if you'd like to help with this; let me know.
Not sure how much I can really help with it, but let's take the next step and decide on the API:
I wonder whether one should really carry the full RF along all dimensions (including the channel + batch dimensions)? I did not do that in my earlier implementation, but that was based on a higher-level description of the architecture. When looking at the low-level TF ops, I am afraid that, for instance, Keras uses temporary NHWC <-> NCHW conversions (e.g., in order to support max pooling on the CPU), and it might not be very clean to just ignore them. Or should we?
Then, the existing API only takes an input and an output node name. Is it possible to infer the dimensionality from that? (Should we allow optionally specifying the n-dimensional output box from which to compute the receptive field?)
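Just to make that concrete, here is a hypothetical n-dimensional signature (all names below are made up, not the existing API); the dimensionality would be inferred from the output node's shape, and the optional box would restrict which output region the RF is computed for:

```python
# Hypothetical sketch of a generalized entry point -- not the current API.
def compute_receptive_field_nd(graph_def, input_node, output_node,
                               output_box=None):
    """Compute the RF of `output_node` w.r.t. `input_node` in n dimensions.

    output_box: optional sequence of (start, stop) index pairs, one per
        spatial dimension of the output node.  If None, the RF of a single
        output element is returned.  The number of spatial dimensions would
        be inferred from the output node's shape (minus batch/channel axes).
    """
    raise NotImplementedError  # sketch only
```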
Regarding NHWC vs. NCHW: IIRC the op itself would have enough information to figure it out, so I think we wouldn't need any further information.
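For example (a sketch, assuming the graph is traversed op by op): layout-sensitive ops such as Conv2D/Conv3D/MaxPool carry a `data_format` attribute, so a traversal could detect NHWC vs. NCHW directly from the op, without extra user input:

```python
import tensorflow as tf

def op_data_format(op):
    """Return an op's data_format ('NHWC', 'NCHW', 'NDHWC', ...) or None.

    Ops without the attribute (Relu, Add, ...) don't constrain the layout,
    so None is returned for them.
    """
    try:
        fmt = op.get_attr('data_format')
    except ValueError:  # op has no data_format attribute
        return None
    return fmt.decode() if isinstance(fmt, bytes) else fmt
```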
Regarding carrying the full RF across all dimensions: agreed, that seems like a nice idea, and it would naturally make the 3D case easier. I think the batch dimension would probably always have an "RF size" of 1 (after all, it's just batching) -- but no strong opinion on whether it should be included or removed.
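If the full input rank were kept, the batch (and channel) entries would simply stay trivial; something like this for an NHWC tensor (purely illustrative values):

```python
import numpy as np

# Per-dimension RF bookkeeping for an NHWC tensor (illustrative values):
# axis 0 = batch, axes 1-2 = spatial, axis 3 = channels.
rf_size   = np.array([1, 11, 11, 1])  # batch/channel axes never grow
rf_stride = np.array([1,  4,  4, 1])  # effective stride in input pixels
```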
Adding my support for this issue as a potential user. I also use 3D CNNs and would love to be able to use this package with them.
Similar to #3, I would like to apply this code to networks with 3D convolutions. The math is exactly the same, of course, but one needs to collect outputs along three dimensions instead of just two. Maybe I would even use lists instead of a fixed number of variables with _x/_y suffixes. (Actually, I did write similar code in the past, which used small ndarrays for the RF bounding box + strides; see the sketch below.)
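For reference, a minimal sketch of what such ndarray-based bookkeeping could look like (names and structure are my own assumptions, not anything in this package): one entry per spatial axis, with the usual 1D recurrences applied elementwise, so the 3D case comes for free:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ReceptiveField:
    """Per-dimension receptive field info (one entry per spatial axis)."""
    size: np.ndarray    # RF size in input pixels/voxels
    stride: np.ndarray  # effective stride ("jump") in input pixels/voxels

    @classmethod
    def identity(cls, ndim):
        return cls(np.ones(ndim, dtype=int), np.ones(ndim, dtype=int))

    def after_layer(self, kernel_size, layer_stride):
        """Compose with one conv/pool layer (usual 1D formulas, elementwise)."""
        kernel_size = np.asarray(kernel_size)
        layer_stride = np.asarray(layer_stride)
        return ReceptiveField(
            size=self.size + (kernel_size - 1) * self.stride,
            stride=self.stride * layer_stride,
        )

# 3D example: two 3x3x3 convs (stride 1) followed by 2x2x2 max pooling (stride 2)
rf = ReceptiveField.identity(3)
rf = rf.after_layer((3, 3, 3), (1, 1, 1))
rf = rf.after_layer((3, 3, 3), (1, 1, 1))
rf = rf.after_layer((2, 2, 2), (2, 2, 2))
print(rf.size, rf.stride)   # [6 6 6] [2 2 2]
```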