ELEKTRONN / elektronn3

A PyTorch-based library for working with 3D and 2D convolutional neural networks, with focus on semantic segmentation of volumetric biomedical image data

Calculating and visualizing effective (empirical) receptive fields of network models #14

Closed · mdraw closed this issue 5 years ago

mdraw commented 6 years ago

The receptive field of network layers (and of the whole network) tells us how much spatial context is available to the network when it predicts class probabilities. Since large (and anisotropic) spatial context is especially important when dealing with large, high-resolution, anisotropic 3D images, we should have a tool to calculate and visualize receptive fields, so we can evaluate different network architectures better.

The "theoretical" receptive field that e.g. ELEKTRONN2 calculates automatically (where it is called "fov") has been shown to be misleading:

  1. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks
  2. Object Detectors Emerge in Deep Scene CNNs (section 3.2)
  3. ParseNet: Looking Wider to See Better (section 3.1 and figure 2).

2. and 3. suggest that the effective receptive field can be computed and visualized empirically by feeding crafted inputs into the network and analysing the relationship between input pixels and network activations. The method proposed in 2. looks rather laborious, whereas the approach described in 3. (section 3.1) seems easier to implement. There is also a project (4.), https://github.com/fornaxai/receptivefield, which aims to calculate effective receptive fields with an even simpler approach (for TensorFlow and Keras models). However, we can't use it directly inside elektronn3 because it is GPL-licensed, so writing our own PyTorch implementation is probably the best way to go.
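
As a rough starting point, such an empirical estimate can be sketched in a few lines of PyTorch: backpropagate from the central output voxel to the input and treat the averaged absolute input gradient (over a few random inputs) as the effective receptive field. This is only a minimal sketch, not an existing elektronn3 API; the function name, signature and the assumption of 5D `(N, C, D, H, W)` inputs are illustrative:

```python
import torch


def effective_receptive_field(model, input_shape, n_samples=16, device='cpu'):
    """Estimate the effective receptive field of ``model`` by backpropagating
    from the central output voxel to the input and averaging the absolute
    input gradients over a few random input samples."""
    model = model.to(device).eval()
    erf = None
    for _ in range(n_samples):
        x = torch.randn(input_shape, device=device, requires_grad=True)
        out = model(x)
        # Index of the central spatial position in the first output channel
        center = (0, 0) + tuple(s // 2 for s in out.shape[2:])
        out[center].backward()
        # Input voxels with non-zero gradient influence the central output voxel
        g = x.grad.detach().abs()[0].sum(dim=0)  # sum over input channels
        erf = g if erf is None else erf + g
    erf /= erf.max().clamp(min=1e-12)  # normalize to [0, 1] for visualization
    return erf.cpu().numpy()
```

For example, with a small (hypothetical) 3D CNN the returned array has shape `(D, H, W)` and can be inspected slice-wise, e.g. with matplotlib, to judge the (an)isotropy of the field of view:

```python
model = torch.nn.Sequential(
    torch.nn.Conv3d(1, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv3d(8, 8, (1, 3, 3), padding=(0, 1, 1)), torch.nn.ReLU(),
    torch.nn.Conv3d(8, 2, 1),
)
erf = effective_receptive_field(model, input_shape=(1, 1, 16, 64, 64))
```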