cloudsci / cloudmetrics

Toolkit for computing 15+ metrics characterising 2D cloud patterns

Metrics are not independent of domain size or resolution #1

Open martinjanssens opened 4 years ago

martinjanssens commented 4 years ago

These metrics currently depend on the domain size (DS) or resolution (R):

We might want to consider making them less dependent on these aspects to facilitate more direct comparisons between organisation in scenes of different scales. However, such comparisons might not always make sense, because the processes underlying the formation of patterns on different scales might be totally different.

leifdenby commented 2 years ago

Thanks for starting this discussion and continuing it on #42 @martinjanssens! My gut feeling is to try and non-dimensionalise the functions (i.e. keep them pixel-based), but include in the docstring how each metric scales with the pixel resolution and add tests to ensure that this scaling works. This reduces the number of arguments to the functions, and the user can then scale metrics themselves if they want to compare across resolutions. This makes sense to me because metrics themselves tend to be non-dimensional, but maybe it isn't possible for all the metrics, I don't know... Will be interesting to see for SCAI :)
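
To make the testing idea concrete, here is a rough sketch of the kind of scaling test this could mean. The metric used here (mean object area in pixels) is just a stand-in defined for the example, not an existing cloudmetrics function:

import numpy as np
import scipy.ndimage

def mean_object_area_px(mask):
    # Stand-in metric: mean area (in pixels) of connected cloud objects
    _, n_objects = scipy.ndimage.label(mask)
    return mask.sum() / max(n_objects, 1)

def test_scaling_with_pixel_resolution():
    mask = np.zeros((32, 32), dtype=int)
    mask[4:10, 4:10] = 1
    mask[20:25, 18:26] = 1
    # Refining the grid by a factor of 2 in each direction multiplies a
    # pixel-counted area by 4, which is the scaling we'd state in the docstring
    refined = np.kron(mask, np.ones((2, 2), dtype=int))
    assert np.isclose(mean_object_area_px(refined), 4 * mean_object_area_px(mask))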

martinjanssens commented 2 years ago

I agree, giving non-dimensional outputs by default makes sense. Concretely:

leifdenby commented 2 years ago
  • Statistical reductions, COP, cloud fraction, fractal dimension, orientation, iorg, open sky, CSD: Nothing needs to be done as far as I can see, unless users might want to visualise the distributions underlying iorg, CSD and fractal dimension in physical units. I could imagine that being interesting, but it probably isn't a high priority for us.

Agreed

  • Maximum/mean length scale, mean perimeter: Currently calculated in pixels; these can be made non-dimensional by dividing by a characteristic domain size (e.g. something like np.sqrt(np.prod(mask.shape))). If you know the resolution, you can then easily get these numbers in physical units by multiplying by the characteristic size and the resolution. We should probably document this.

Yes, good point. Maybe we also need a note that, for example, the characteristic size doesn't make a lot of sense for non-isotropic grids :) A rough sketch of the conversion is below.
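
To sketch what that conversion could look like (the geometric-mean side length used as the characteristic size here is just one possible choice, and the numbers are made up):

import numpy as np

mask = np.zeros((128, 256))   # cloud mask, deliberately non-square
length_px = 12.0              # e.g. a mean length scale measured in pixels
dx = 100.0                    # pixel resolution in meters, if known

# One possible characteristic domain size, in pixels
l_domain_px = np.sqrt(np.prod(mask.shape))

length_nondim = length_px / l_domain_px        # what the metric would return
length_m = length_nondim * l_domain_px * dx    # back to physical units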

  • Spectral metrics: These currently support passing the resolution as an input (dx=1 by default) and thus return metrics in physical units where possible. Where these are length scales (spectral slope, spectral length), we could again normalise by the characteristic size, and document how to get the metric in physical units if the user is interested.

That's good (that it's possible to provide the resolution). I remember a lot of these spectral things in numpy being confusing because nothing has dimensions. Maybe we could have dx=None as an option and otherwise (if dx is a float) interpret it as a distance in meters. How does that sound? A sketch of what I mean is below.
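
A minimal sketch of that convention; the function name and signature here are hypothetical, not the current cloudmetrics API:

def compute_spectral_metrics(scalar, dx=None):
    """Compute spectral metrics of `scalar`.

    Parameters
    ----------
    dx : float or None
        Grid spacing in meters. If None, all returned length scales
        (and wavenumbers) are in pixel units instead of meters.
    """
    ...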

  • RDF: The distribution is returned in 1/length units, so the integral of the distribution is already dimensionless; the max and max-min difference that we currently have are not, but could presumably be normalised with some reference quantity, since the distribution itself doesn't depend on the resolution very explicitly. For users that want to see the actual distribution in actual physical units, we can give the option to pass dx to this function too. Something to consider for

This one I'm a bit confused about, but I'll ask on the pull-request https://github.com/cloudsci/cloudmetrics/pull/43
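
For reference, a tiny self-contained illustration of the units point from the bullet above: with a distribution in 1/length units and a radius axis in length units, the integral is unit-independent while the maximum is not (all numbers made up):

import numpy as np

r = np.linspace(0.0, 50.0, 200)            # separation, in pixels or km
rdf = np.exp(-((r - 10.0) / 5.0) ** 2)     # made-up distribution
rdf /= np.trapz(rdf, r)                    # now in 1/length, integrates to 1

integral = np.trapz(rdf, r)                # dimensionless, resolution-independent
rdf_max = rdf.max()                        # still carries 1/length units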

  • SCAI: See discussion in

Yes, I think we might have this sorted now!

martinjanssens commented 2 years ago

Great, thanks for the detailed reply!

The spectral metrics (as they're implemented in our code now) may be a little complicated to adapt appropriately. Currently they require dx to work, though when it's set to 1 it effectively means that all lengths are in pixel units. E.g. the wavenumbers are physical quantities with units of 1/length:

# Wavenumbers
N = np.min(cloud_scalar.shape)  # number of points along the shortest axis
L = dx * N  # physical domain length (reduces to pixels when dx=1)
k1d = 2 * np.pi / L * np.arange(1, N // 2 + 1)  # 1D wavenumber bins, units 1/length

So I think having dx=1 as the internal default would make sense. If you prefer dx=None as the default argument to the functions, we could just convert it to 1 internally whenever None is passed? Again, this would all probably require proper documentation, which I'm happy to write :)
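
Concretely, the internal conversion could look something like this, wrapping the wavenumber snippet above (the function name is just for illustration):

import numpy as np

def wavenumbers_1d(cloud_scalar, dx=None):
    # dx=None falls back to dx=1 internally, i.e. pixel units
    dx = 1.0 if dx is None else float(dx)
    N = np.min(cloud_scalar.shape)
    L = dx * N
    return 2 * np.pi / L * np.arange(1, N // 2 + 1)

field = np.random.rand(64, 64)
k_px = wavenumbers_1d(field)            # wavenumbers in 1/pixel
k_m = wavenumbers_1d(field, dx=100.0)   # wavenumbers in 1/m for 100 m spacing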