leifdenby opened 2 years ago
@martinjanssens I'm a bit confused about what the RDF function should be returning. I guess what it calculates internally is the probability distribution function as a density (i.e. the probability of finding a mask at a given distance). Currently that is a function of the radial distance r; call this distribution f(r). The scalar values that this function can return are then different modes of this distribution, is that right?
Is there any reason why the distance couldn't be given in pixels? I realise that fractional indices don't make a lot of sense since pixel indices are integers, but that shouldn't matter for the distribution, no? What I am suggesting is that the grid resolution becomes an optional argument: if it is provided, it is assumed to be in meters and all the returned values are scaled by it. Does that make sense?
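A minimal sketch of the proposal, assuming object centroids are available as (possibly fractional) pixel coordinates: distances stay in pixels by default and are only multiplied by the grid spacing when one is supplied. The name pairwise_distances and the dx keyword are hypothetical, not the current cloudmetrics signature.

```python
import numpy as np


def pairwise_distances(pos, dx=None):
    """Distances between all pairs of object centroids.

    pos: (N, 2) centroid positions in pixel units (may be fractional).
    dx:  optional grid spacing in metres; if given, distances are returned
         in metres, otherwise in pixels.
    """
    pos = np.asarray(pos, dtype=float)
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # keep each unordered pair once and drop the zero self-distances
    iu = np.triu_indices(len(pos), k=1)
    dist = dist[iu]
    return dist if dx is None else dist * dx
```

For example, pairwise_distances([[0, 0], [3, 4]]) gives [5.] in pixels, and passing dx=100.0 gives [500.] in metres.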
Yes, I think your analysis is completely right and your proposal makes sense. As you say, f(r) is in principle dimensionless in and of itself, and that shouldn't change whether dx is given in pixels or in physical units (as long as L = n*dx is consistently updated). It might be useful to be able to pass a variable, say dr, to this function, as that's essentially the bin width used to construct f(r), but again that could be in pixels. The metrics I computed from f(r) were simply its maximum, the difference between its maximum and minimum, and its integral over r; the first of these proved the most expressive for the fields I was looking at. It is, I think, then already a non-dimensional number, which wouldn't change with dx anyway.
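The three scalar metrics mentioned above could be computed from a binned f(r) along these lines; the function name rdf_metrics is hypothetical and the trapezoidal rule is just one assumed choice of quadrature.

```python
import numpy as np


def rdf_metrics(r, f):
    """Scalar summaries of a binned RDF f(r): maximum, max - min, integral."""
    r = np.asarray(r, dtype=float)
    f = np.asarray(f, dtype=float)
    f_max = f.max()
    f_range = f_max - f.min()
    # trapezoidal rule over the bin centres; carries the units of r
    f_integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r))
    return f_max, f_range, f_integral
```

If f(r) is itself dimensionless, rescaling r by dx leaves the maximum and the max-min difference unchanged, while the integral over r picks up a factor of dx.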
Thanks for your feedback @martinjanssens!
I've spent this morning working on this and here are a few thoughts:
1. The S (domain size) argument to the rdf functions. I think the way it works is that for periodic domains we have copied the objects that wrap the boundary (?), and so the size of the domain is actually smaller than the shape of the object_labels 2D array. But I'm a bit confused about the normalisation constant in pair_correlation_2d (https://github.com/cloudsci/cloudmetrics/pull/43/files#diff-9ad196211d6f5181f667aef506ed9fcf4396dc697121d952f007681da1e9123aR63). Could you maybe explain how using periodic vs non-periodic domains should affect the RDF calculation in your mind?
All of this is making me think that maybe we should push RDF to v0.3.0. I think it will take quite a bit more experimentation and refactoring to get it to a state where it's quite ready. What do you think?
I'm so slow I hadn't even realised we'd already agreed to do this :) sorry
Great, thanks for looking at this in so much detail! I've had a look at the code, and I think you're right to be confused regarding point 1. I think this was my thought:
1. pair_correlation_2d would be called inside a function (e.g. compute_rdf) where a mask with shape domain_shape and a bunch of labelled objects' positions (pos) have already been computed.
2. If periodic_domain==True, compute_rdf would first contain a similar routine to the one we have for iorg, where we i) move centroids outside the original domain back into that domain and ii) set domain_size to half the expanded shape.
3. pair_correlation_2d is then called with the new pos and domain_size; it can then be thought of as operating in the original domain again for both periodic and open BCs, and I think the normalisations make sense then.
Do you agree? :)
Using an existing package is a great idea, as is using your Poisson disc sampling as a reference case! I can take a look at where the differences with our implementation come from, but they may very well be discretisation artefacts related to binning the RDF, which I think you can theoretically correct for (and we don't).
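One way to separate normalisation or binning problems from genuine structure is to feed in a pattern whose RDF is known. The sketch below uses uniformly random points, for which g(r) should be close to 1 at every r (a simpler stand-in for the Poisson disc reference case, whose g(r) instead drops to zero below the minimum separation), on a periodic square domain. It measures separations with the minimum-image convention, a standard technique for periodic domains that differs from the wrapping approach described above and keeps every point as a reference point; the function name and arguments are hypothetical.

```python
import numpy as np


def pair_correlation_periodic(pos, domain_size, r_max, dr):
    """g(r) on a square periodic domain using minimum-image distances.

    Valid for r_max <= domain_size / 2, so each annulus fits on the torus.
    """
    if r_max > domain_size / 2:
        raise ValueError("r_max must not exceed half the domain size")
    pos = np.mod(np.asarray(pos, dtype=float), domain_size)
    n = len(pos)
    diff = pos[:, None, :] - pos[None, :, :]
    # minimum image: shift each separation component into [-L/2, L/2]
    diff -= domain_size * np.round(diff / domain_size)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(n, k=1)  # each unordered pair once
    n_bins = int(round(r_max / dr))
    edges = np.linspace(0.0, n_bins * dr, n_bins + 1)
    counts = np.histogram(dist[iu], bins=edges)[0]
    ring_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    density = (n - 1) / domain_size ** 2
    # factor 2: each unordered pair counts once for each of its two points
    g = 2 * counts / (n * density * ring_area)
    return 0.5 * (edges[1:] + edges[:-1]), g
```

Systematic departures from 1 in such a test would point at the normalisation or binning rather than at the cloud field itself.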
And finally, yes let's pick this up for the next version, I agree we'll probably be playing around with this for a while still!