Closed leifdenby closed 2 years ago
I think I have gotten the testing working here too, it is in this branch. It needed the correct pixel size (in km) to function properly. The examples still don't correspond to Tobin et al. (2012), but they do correspond to White et al. (2018) now. There is still a second function in scai.py
, named scai2
; does that function serve a purpose? And if not, should we remove it before merging this and just have a single scai
function?
On a semi-related note: There are several metrics whose results assume a unit of size in "pixels" at the moment (e.g. most object-based metrics). We might need to consider offering the option of giving those physical units (like km). I had already started thinking about that a while ago (see #1), but never got around to fixing it. I'd be curious to hear what you think, @leifdenby!
I think I have gotten the testing working here too, it is in this branch. It needed the correct pixel size (in km) to function properly. The examples still don't correspond to Tobin et al. (2012), but they do correspond to White et al. (2018) now. There is still a second function in
scai.py
, namedscai2
; does that function serve a purpose? And if not, should we remove it before merging this and just have a singlescai
function?
Thank you! Yes, absolutely! I'll do that. Sorry for the mess. That as me trying to debug what as going wrong...
There are several metrics whose results assume a unit of size in "pixels" at the moment (e.g. most object-based metrics). We might need to consider offering the option of giving those physical units (like km). I had already started thinking about that a while ago (see #1), but never got around to fixing it. I'd be curious to hear what you think, @leifdenby!
Yes, I've been thinking about that too. I couldn't work out from the definition of SCAI if it's possible for the pixel resolution to somehow be factored out (so that SCAI for different choices of resolution are simply related), but it doesn't seem to be (even though the number itself is non-dimensional). I like the way you've made the default be dx=1
. I guess the main concern would be that if the value of dx
and L
are of very different magnitude then the range of SCAI values is reduced (and thereby reducing the resolution in the measurement with SCAI)?
People tend to just look at relative SCAI values no and not absolute values, right? Or rather: do you know how one should compare SCAI values for two masks with different resolution but the same data?
I.e. how would we get the following masks to have the same SCAI values? (at least my intuition says that the should have the same SCAI value, but maybe I'm wrong...)
00000
00110
10000
0000000000
0000000000
0000111100
0000111100
1100000000
1100000000
I'm going to add a test for this and see what it gives :D
@martinjanssens I've added a test for resolution-doubling and I'm confused because to the same SCAI value it seems to be opposite to what I'd intuitively expect. In the example where I've created a zoomed copy of the mask (so that each pixel in the zoomed example is half the size of that in the original) I appear to have to double the resolution, not halve it, to get the same SCAI value. https://github.com/cloudsci/cloudmetrics/pull/42/commits/3d1453870d079a56b0779b992b50a2417442a274#diff-82f87aac44e8909eeb356deac9900ee3788adb946231ccaa4a46a53e8f14f8c3R41 I'm a I being an idiot?
Great, I'm really happy SCAI has you confused, it has had me confused since I first looked at it! Thanks a bunch for adding this resolution sensitivity test.
I think what's going on with the resolution doubling is the following. If I understand the authors' thought process correctly, SCAI has two components, N/Nmax
and D0/L
, which are multiplied.
D0/L
tries to measure the typical cluster size relative to a typical domain scale L
. In your resolution doubling test, D0
stays the same for dx -> dx/2
and n -> 2n
(with n
representing field.shape[0] = field.shape[1]
), as you would expect.N/Nmax
tries to measure the amount of separate cloud clusters in the scene (N
) relative to the max number of possible clusters (Nmax
). Nmax
would be proportional to n*dx
. When doubling the size of the field, Nmax
will thus increase by a factor 4, independent of dx
or L
, and require D0
to increase by the same factor for SCAI
to remain constant. I think this is what has us confused.To me, it seems like the issue here is with the formulation of SCAI
itself. It just doesn't seem to be designed to be consistent w.r.t. resolution. I think that doesn't matter in many practical applications, because many datasets have the same L
, dx
and n
over all scenes in the dataset and then the metric is consistent across that dataset (as you mention). But the resolution sensitivity is definitely a drawback. We could mention that in the description, or adapt SCAI
in our formulation in such a way that it no longer relies on n
, diverging from the original proposal. I'm fine either way, I think. What's your opinion, @leifdenby ?
To me, it seems like the issue here is with the formulation of
SCAI
itself. It just doesn't seem to be designed to be consistent w.r.t. resolution. I think that doesn't matter in many practical applications, because many datasets have the sameL
,dx
andn
over all scenes in the dataset and then the metric is consistent across that dataset (as you mention). But the resolution sensitivity is definitely a drawback. We could mention that in the description, or adaptSCAI
in our formulation in such a way that it no longer relies onn
, diverging from the original proposal. I'm fine either way, I think. What's your opinion, @leifdenby ?
Thanks for this detailed explanation @martinjanssens !
My gut feel is that we should keep our implementation as close to what was published as possible, and add a note to indicate that SCAI isn't resolution independent. That's what I've done and I've also added the same note in the test. Could you have a look and let me know what you think? Thank you!
TODO
🥳
cloudmetrics.objects.metrics.scai.scai(object_labels=...)
NOTE: example from Figure 4a in White et all (COP) is different from example from Figure 2a - looks like the authors copied the example incorrectly.