Refactor Simple Convective Aggregation Index (SCAI)

leifdenby commented 2 years ago

[x] refactor code to make cloudmetrics.objects.metrics.scai.scai(object_labels=...)
[ ] implement all tests for figure 2 in Tobin et al
- [ ] 2a implemented but test not passing, incorrect value
- [ ] 2b implemented but test not passing, incorrect value
- [ ] 2c
- [ ] 2d

NOTE: example from Figure 4a in White et all (COP) is different from example from Figure 2a - looks like the authors copied the example incorrectly.

martinjanssens commented 2 years ago

I think I have gotten the testing working here too, it is in this branch. It needed the correct pixel size (in km) to function properly. The examples still don't correspond to Tobin et al. (2012), but they do correspond to White et al. (2018) now. There is still a second function in scai.py, named scai2; does that function serve a purpose? And if not, should we remove it before merging this and just have a single scai function?

martinjanssens commented 2 years ago

On a semi-related note: There are several metrics whose results assume a unit of size in "pixels" at the moment (e.g. most object-based metrics). We might need to consider offering the option of giving those physical units (like km). I had already started thinking about that a while ago (see #1), but never got around to fixing it. I'd be curious to hear what you think, @leifdenby!

leifdenby commented 2 years ago

I think I have gotten the testing working here too, it is in this branch. It needed the correct pixel size (in km) to function properly. The examples still don't correspond to Tobin et al. (2012), but they do correspond to White et al. (2018) now. There is still a second function in scai.py, named scai2; does that function serve a purpose? And if not, should we remove it before merging this and just have a single scai function?

Thank you! Yes, absolutely! I'll do that. Sorry for the mess. That as me trying to debug what as going wrong...

There are several metrics whose results assume a unit of size in "pixels" at the moment (e.g. most object-based metrics). We might need to consider offering the option of giving those physical units (like km). I had already started thinking about that a while ago (see #1), but never got around to fixing it. I'd be curious to hear what you think, @leifdenby!

Yes, I've been thinking about that too. I couldn't work out from the definition of SCAI if it's possible for the pixel resolution to somehow be factored out (so that SCAI for different choices of resolution are simply related), but it doesn't seem to be (even though the number itself is non-dimensional). I like the way you've made the default be dx=1. I guess the main concern would be that if the value of dx and L are of very different magnitude then the range of SCAI values is reduced (and thereby reducing the resolution in the measurement with SCAI)?

People tend to just look at relative SCAI values no and not absolute values, right? Or rather: do you know how one should compare SCAI values for two masks with different resolution but the same data?

I.e. how would we get the following masks to have the same SCAI values? (at least my intuition says that the should have the same SCAI value, but maybe I'm wrong...)

I'm going to add a test for this and see what it gives :D

leifdenby commented 2 years ago

@martinjanssens I've added a test for resolution-doubling and I'm confused because to the same SCAI value it seems to be opposite to what I'd intuitively expect. In the example where I've created a zoomed copy of the mask (so that each pixel in the zoomed example is half the size of that in the original) I appear to have to double the resolution, not halve it, to get the same SCAI value. https://github.com/cloudsci/cloudmetrics/pull/42/commits/3d1453870d079a56b0779b992b50a2417442a274#diff-82f87aac44e8909eeb356deac9900ee3788adb946231ccaa4a46a53e8f14f8c3R41 I'm a I being an idiot?

martinjanssens commented 2 years ago

Great, I'm really happy SCAI has you confused, it has had me confused since I first looked at it! Thanks a bunch for adding this resolution sensitivity test.

I think what's going on with the resolution doubling is the following. If I understand the authors' thought process correctly, SCAI has two components, N/Nmax and D0/L, which are multiplied.

D0/L tries to measure the typical cluster size relative to a typical domain scale L. In your resolution doubling test, D0 stays the same for dx -> dx/2 and n -> 2n (with n representing field.shape[0] = field.shape[1]), as you would expect.
N/Nmax tries to measure the amount of separate cloud clusters in the scene (N) relative to the max number of possible clusters (Nmax). Nmax would be proportional to n*dx. When doubling the size of the field, Nmax will thus increase by a factor 4, independent of dx or L, and require D0 to increase by the same factor for SCAI to remain constant. I think this is what has us confused.

To me, it seems like the issue here is with the formulation of SCAI itself. It just doesn't seem to be designed to be consistent w.r.t. resolution. I think that doesn't matter in many practical applications, because many datasets have the same L, dx and n over all scenes in the dataset and then the metric is consistent across that dataset (as you mention). But the resolution sensitivity is definitely a drawback. We could mention that in the description, or adapt SCAI in our formulation in such a way that it no longer relies on n, diverging from the original proposal. I'm fine either way, I think. What's your opinion, @leifdenby ?

leifdenby commented 2 years ago

To me, it seems like the issue here is with the formulation of SCAI itself. It just doesn't seem to be designed to be consistent w.r.t. resolution. I think that doesn't matter in many practical applications, because many datasets have the same L, dx and n over all scenes in the dataset and then the metric is consistent across that dataset (as you mention). But the resolution sensitivity is definitely a drawback. We could mention that in the description, or adapt SCAI in our formulation in such a way that it no longer relies on n, diverging from the original proposal. I'm fine either way, I think. What's your opinion, @leifdenby ?

Thanks for this detailed explanation @martinjanssens !

My gut feel is that we should keep our implementation as close to what was published as possible, and add a note to indicate that SCAI isn't resolution independent. That's what I've done and I've also added the same note in the test. Could you have a look and let me know what you think? Thank you!

leifdenby commented 2 years ago

TODO

[x] add changelog, indicating that we have added a test for checking the changes of SCAI with resolution

martinjanssens commented 2 years ago

🥳

cloudsci / cloudmetrics

Refactor Simple Convective Aggregation Index (SCAI) #42