cloudsci / cloudmetrics

Toolkit for computing 15+ metrics characterising 2D cloud patterns

Metrics are not independent of domain size or resolution #1

Open martinjanssens opened 4 years ago

martinjanssens commented 4 years ago

These metrics currently depend on the domain size (DS) or resolution (R):

We might want to consider making them less dependent on these aspects to facilitate more direct comparisons between organisation in scenes of different scales. However, such comparisons might not always make sense, because the processes underlying the formation of patterns on different scales might be totally different.

leifdenby commented 2 years ago

Thanks for starting this discussion and continuing it on #42 @martinjanssens! My gut feeling is to try and non-dimensionalise the functions (i.e. keep them pixel-based), but include in the docstring how each metric scales with the pixel resolution and add tests to ensure that this scaling works. This reduces the number of arguments to the functions, and the user can then scale metrics themselves if they want to compare across resolutions. This makes sense to me because metrics themselves tend to be non-dimensional, but maybe it isn't possible for all the metrics, I don't know... Will be interesting to see for SCAI :)
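
To make the testing idea concrete, here is a rough sketch of the kind of scaling test this could mean. The metric used here (mean object area in pixels) is just a stand-in defined for the example, not an existing cloudmetrics function:

import numpy as np
import scipy.ndimage

def mean_object_area_px(mask):
    # Stand-in metric: mean area (in pixels) of connected cloud objects
    _, n_objects = scipy.ndimage.label(mask)
    return mask.sum() / max(n_objects, 1)

def test_scaling_with_pixel_resolution():
    mask = np.zeros((32, 32), dtype=int)
    mask[4:10, 4:10] = 1
    mask[20:25, 18:26] = 1
    # Refining the grid by a factor of 2 in each direction multiplies a
    # pixel-counted area by 4, which is the scaling we'd state in the docstring
    refined = np.kron(mask, np.ones((2, 2), dtype=int))
    assert np.isclose(mean_object_area_px(refined), 4 * mean_object_area_px(mask))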

martinjanssens commented 2 years ago

I agree, giving non-dimensional outputs by default makes sense. Concretely:

leifdenby commented 2 years ago
  • Statistical reductions, COP, cloud fraction, fractal dimension, orientation, iorg, open sky, CSD: Nothing needs to be done as far as I can see, unless users might want to visualise the distributions underlying iorg, CSD and fractal dimension in physical units. I could imagine that being interesting, but it probably isn't a high priority for us.

Agreed

  • Maximum/mean length scale, mean perimeter: Currently calculated in pixels; these can be made non-dimensional by dividing by a characteristic domain size (e.g. something like np.sqrt(np.prod(mask.shape))). If you know the resolution, you can then easily get these numbers in physical units by multiplying by the characteristic size and the resolution. We should probably document this.

Yes, good point. Maybe we also need a note that, for example, the characteristic size doesn't make a lot of sense for non-isotropic grids :) A rough sketch of the conversion is below.
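
To sketch what that conversion could look like (the geometric-mean side length used as the characteristic size here is just one possible choice, and the numbers are made up):

import numpy as np

mask = np.zeros((128, 256))   # cloud mask, deliberately non-square
length_px = 12.0              # e.g. a mean length scale measured in pixels
dx = 100.0                    # pixel resolution in meters, if known

# One possible characteristic domain size, in pixels
l_domain_px = np.sqrt(np.prod(mask.shape))

length_nondim = length_px / l_domain_px        # what the metric would return
length_m = length_nondim * l_domain_px * dx    # back to physical units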

  • Spectral metrics: These currently support passing the resolution as an input (dx=1 by default) and thus return metrics in physical units where possible. Where these are length scales (spectral slope, spectral length), we could again normalise by the characteristic size, and document how to get the metric in physical units if the user is interested.

That's good (that it's possible to provide the resolution). I remember a lot of these spectral things in numpy being confusing because nothing has dimensions. Maybe we could have dx=None as an option and otherwise (if dx is a float) interpret it as a distance in meters. How does that sound? A sketch of what I mean is below.
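
A minimal sketch of that convention; the function name and signature here are hypothetical, not the current cloudmetrics API:

def compute_spectral_metrics(scalar, dx=None):
    """Compute spectral metrics of `scalar`.

    Parameters
    ----------
    dx : float or None
        Grid spacing in meters. If None, all returned length scales
        (and wavenumbers) are in pixel units instead of meters.
    """
    ...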

  • RDF: The distribution is returned in 1/length units, so the integral of the distribution is already dimensionless; the max and max-min difference that we currently have are not, but could presumably be normalised with some reference quantity, since the distribution itself doesn't depend on the resolution very explicitly. For users that want to see the actual distribution in actual physical units, we can give the option to pass dx to this function too. Something to consider for

This one I'm a bit confused about, but I'll ask on the pull-request https://github.com/cloudsci/cloudmetrics/pull/43
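
For reference, a tiny self-contained illustration of the units point from the bullet above: with a distribution in 1/length units and a radius axis in length units, the integral is unit-independent while the maximum is not (all numbers made up):

import numpy as np

r = np.linspace(0.0, 50.0, 200)            # separation, in pixels or km
rdf = np.exp(-((r - 10.0) / 5.0) ** 2)     # made-up distribution
rdf /= np.trapz(rdf, r)                    # now in 1/length, integrates to 1

integral = np.trapz(rdf, r)                # dimensionless, resolution-independent
rdf_max = rdf.max()                        # still carries 1/length units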

  • SCAI: See discussion in

Yes, I think we might have this sorted now!

martinjanssens commented 2 years ago

Great, thanks for the detailed reply!

The spectral metrics (as they're implemented in our code now) may be a little complicated to adapt appropriately. Currently they require dx to work, though when it's set to 1 it effectively means that all lengths are in pixel units. E.g. the wavenumbers are physical quantities with units of 1/length:

# Wavenumbers
N = np.min(cloud_scalar.shape)  # number of points along the shortest axis
L = dx * N  # physical domain length (reduces to pixels when dx=1)
k1d = 2 * np.pi / L * np.arange(1, N // 2 + 1)  # 1D wavenumber bins, units 1/length

So I think having dx=1 as the internal default would make sense. If you prefer dx=None as the default argument to the functions, we could just convert it to 1 internally whenever None is passed? Again, this would all probably require proper documentation, which I'm happy to write :)
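
Concretely, the internal conversion could look something like this, wrapping the wavenumber snippet above (the function name is just for illustration):

import numpy as np

def wavenumbers_1d(cloud_scalar, dx=None):
    # dx=None falls back to dx=1 internally, i.e. pixel units
    dx = 1.0 if dx is None else float(dx)
    N = np.min(cloud_scalar.shape)
    L = dx * N
    return 2 * np.pi / L * np.arange(1, N // 2 + 1)

field = np.random.rand(64, 64)
k_px = wavenumbers_1d(field)            # wavenumbers in 1/pixel
k_m = wavenumbers_1d(field, dx=100.0)   # wavenumbers in 1/m for 100 m spacing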