AugustT / BioHack_iNat

Biohackaton 2021 - Team investigating recorder behaviour on iNaturalist
GNU General Public License v3.0

Determining location by km grid square based on a national grid system is not widely applicable #4

Open simonrolph opened 2 years ago

simonrolph commented 2 years ago

For each country we want to run this in, we'd need to find its national grid system. And what if we wanted to run across national boundaries?

Is there a better way of grouping records by location which can be more easily applied in other settings?

What is a site?

AugustT commented 2 years ago

I'm sure there are spatial clustering algorithms that could be used.

simonrolph commented 2 years ago

This looks like a fairly straightforward clustering approach: https://gis.stackexchange.com/questions/17638/clustering-spatial-data-in-r

Choosing the distance threshold is going to take a bit of visualising to see what seems sensible.

But one question is whether sites need to be consistent across observers. E.g. can we apply the clustering separately to each observer's observations, or do we need to apply it to the whole set?
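
To make the per-observer vs. whole-set question concrete, here is a minimal sketch in Python (not the R code from the linked answer). It assumes a "site" means a group of records chained together by distances no greater than a threshold (single linkage), which may differ from the linkage used in the link. The function names `cluster_sites` and `cluster_per_observer`, the example coordinates and the 100 m threshold are all made up for illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cluster_sites(points, threshold_m):
    """Label points so that any two within threshold_m end up in the same site."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; fine for one observer, too slow for everything.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_m(points[i], points[j]) <= threshold_m:
                parent[find(i)] = find(j)

    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


def cluster_per_observer(records, threshold_m):
    """Cluster each observer's records separately; labels are local to the observer."""
    by_observer = defaultdict(list)
    for observer, lat, lon in records:
        by_observer[observer].append((lat, lon))
    return {obs: cluster_sites(pts, threshold_m) for obs, pts in by_observer.items()}


# Hypothetical records: (observer, lat, lon)
records = [
    ("alice", 51.5000, -0.1200),
    ("alice", 51.5005, -0.1200),  # about 56 m north of the first record
    ("bob", 51.5010, -0.1200),    # about 56 m further north again
    ("bob", 51.6000, -0.1200),    # about 11 km away
]

pooled = cluster_sites([(lat, lon) for _, lat, lon in records], 100)
separate = cluster_per_observer(records, 100)
```

In this toy data the pooled run puts bob's first record in the same site as alice's two records, whereas the per-observer run gives each observer their own site numbering, so site 0 for alice and site 0 for bob cannot be assumed to be the same place. If sites need to be comparable across observers, the per-observer labels would need a further matching step.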

simonrolph commented 2 years ago

This would be good to feed into the core package

simonrolph commented 2 years ago

Tried the clustering approach linked above on all records (not just one observer) and it tried to allocate a 300 GB vector. We need a more efficient way of doing this.
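
One possible way round the memory problem, sketched in Python under some assumptions: a dense pairwise distance vector needs n(n-1)/2 values, so it grows quadratically with the number of records. Bucketing points into grid cells roughly one threshold wide means each point only has to be compared with points in its own and neighbouring cells. This sketch swaps in single-linkage threshold clustering via union-find, which is not necessarily what the linked answer does. It also assumes the threshold is small relative to the Earth and that the data are away from the poles and the antimeridian. `dense_distance_bytes` and `cluster_sites_bucketed` are hypothetical names.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def dense_distance_bytes(n, bytes_per_value=8):
    """Memory for a full pairwise distance vector of n points (n*(n-1)/2 values)."""
    return n * (n - 1) // 2 * bytes_per_value


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cluster_sites_bucketed(points, threshold_m):
    """Single-linkage threshold clustering without a full distance matrix."""
    if not points:
        return []

    # Cell size in degrees, padded by 1% so that two points within
    # threshold_m always fall in the same or an adjacent cell.
    lat_step = math.degrees(1.01 * threshold_m / EARTH_RADIUS_M)
    # Longitude degrees shrink towards the poles, so size the cells for
    # the highest latitude present in the data.
    max_abs_lat = max(abs(lat) for lat, _ in points)
    lon_step = lat_step / max(math.cos(math.radians(max_abs_lat)), 1e-6)

    cells = defaultdict(list)
    for idx, (lat, lon) in enumerate(points):
        cells[(math.floor(lat / lat_step), math.floor(lon / lon_step))].append(idx)

    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only compare points in the same cell or one of the 8 neighbours.
    for (cy, cx), members in cells.items():
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                for j in cells.get((cy + dy, cx + dx), ()):
                    for i in members:
                        if i < j and haversine_m(points[i], points[j]) <= threshold_m:
                            parent[find(i)] = find(j)

    # Renumber roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
```

Memory here scales with the number of records rather than the number of pairs. Run time still depends on how densely records pile up in individual cells, so heavily recorded hotspots would remain the slow part.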

AugustT commented 2 years ago

This sounds like a good idea. Maybe we can chat about this in more detail.