vector_to_random_points: How to distribute points among geometries?

Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.

https://processes.openeo.org

Apache License 2.0

vector_to_random_points: How to distribute points among geometries? #345

Closed m-mohr closed 1 year ago

m-mohr commented 2 years ago

Origin: https://github.com/Open-EO/openeo-processes/pull/315#discussion_r823617381

@clausmichele wrote:

Another question: how would you distribute the points among the geometries if total_count = 100 and geom_count = null ?

My idea was to compute an estimate of the total area of all the geometries, compute the area per sample (total_area / total_count), and then compute how many samples each geometry gets based on its area. Too complex?
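The area-based idea above could be sketched roughly as follows. This is only an illustration, not part of the process specification: the function name `allocate_by_area` is hypothetical, the areas are assumed to be precomputed in a common unit, and the largest-remainder rounding is one possible choice for making the integer counts add up to total_count.

```python
def allocate_by_area(areas, total_count):
    """Split total_count samples across geometries in proportion to area.

    Hypothetical helper: `areas` holds one precomputed, non-negative area
    per geometry. Rounding uses the largest-remainder method (an assumption).
    """
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total area must be positive")

    # Ideal fractional share of samples for each geometry.
    quotas = [total_count * area / total_area for area in areas]
    # Round down first; quotas are non-negative, so int() acts as floor.
    counts = [int(q) for q in quotas]

    # Give the samples lost to rounding to the largest fractional remainders.
    leftover = total_count - sum(counts)
    by_remainder = sorted(
        range(len(areas)),
        key=lambda i: quotas[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


print(allocate_by_area([50.0, 30.0, 20.0], 100))  # [50, 30, 20]
```

With this rounding choice the counts always sum to total_count, which a plain per-geometry round() would not guarantee.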

@soxofaan wrote:

That question is even more general: how would you distribute the points among the geometries if geom_count > total_count / number_of_geometries or geom_count=null ?

clausmichele commented 2 years ago

Actually, I don't understand @soxofaan's question! Could you rephrase?

soxofaan commented 2 years ago

What I mean is that the problem applies more broadly than the case you mention:

if total_count = 100 and geom_count = null

And it's even broader than the case I replied with.

Whenever you don't hit the geom_count ceiling for all geometries, you have degrees of freedom in distributing the sample points, and we have to specify how we constrain this freedom:

distribute based on area
distribute approximately equally (e.g. if you have to distribute 17 samples over three geometries: some get 5, some get 6)
distribute exactly equally (e.g. if you have to distribute (max) 17 samples over three geometries: all get 5)
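The two equal-distribution options can be illustrated with the 17-over-3 example. This is a minimal sketch under assumptions: `distribute_equally` and its `exact` flag are hypothetical names, and `geom_count` is treated here as a per-geometry ceiling, as described in the comment above.

```python
def distribute_equally(total_count, n_geometries, exact=False, geom_count=None):
    """Distribute samples equally over geometries (hypothetical helper).

    exact=False: counts differ by at most one and sum to total_count.
    exact=True:  every geometry gets the same count; the remainder is dropped.
    geom_count:  optional per-geometry ceiling applied afterwards.
    """
    base, extra = divmod(total_count, n_geometries)
    if exact:
        counts = [base] * n_geometries
    else:
        # The first `extra` geometries receive one additional sample.
        counts = [base + 1 if i < extra else base for i in range(n_geometries)]
    if geom_count is not None:
        counts = [min(c, geom_count) for c in counts]
    return counts


print(distribute_equally(17, 3))               # [6, 6, 5]
print(distribute_equally(17, 3, exact=True))   # [5, 5, 5]
print(distribute_equally(17, 3, geom_count=4)) # [4, 4, 4]
```

Which geometries receive the extra samples in the approximate case is arbitrary in this sketch; a real implementation might pick them at random.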

clausmichele commented 2 years ago

Alright, yes, there are multiple ways to distribute the samples. I was thinking about the area-based distribution, since the other scenarios are similar to setting geom_count to a specific required value.

m-mohr commented 2 years ago

Do we need to define this or do we leave this intentionally open in the process description as the points are generated randomly anyway?

m-mohr commented 2 years ago

@soxofaan @clausmichele Thoughts? Otherwise, I'd probably close this.

soxofaan commented 2 years ago

I have no strong opinion here. I think it's up to more ML-minded people to decide in this case.

Area-based distribution is probably more intuitive and comparable to regular sampling.

clausmichele commented 2 years ago

I was the one proposing the area-based distribution, so I still vote for that.

m-mohr commented 1 year ago

Let's wait for implementations; maybe it's not an issue anymore.