This wouldn't have the issue of the recordedBy column in the GBIF data not always matching the iNaturalist username (which loses us ~100 users).
It depends on how well dplyr filtering would cope with this many rows.
Steps could be:
Crop by extent of country (for speed)
Refine the match using the coords2country() function.
Group by observer, summarise by record count
Filter by target number of records (100?)
Sample remaining usernames
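
A rough sketch of how these steps might look in dplyr, not a tested implementation against the real data. The column names (user_login, longitude, latitude), the bounding box values, the toy data, and the number of users to sample are all assumptions made for illustration. "Target number of records" is read here as a minimum of 100. The coords2country() step is left as a comment because its arguments aren't given above.

```r
library(dplyr)

# Toy stand-in for the observation table. Column names are assumptions.
obs <- tibble(
  user_login = rep(c("alice", "bob", "carol", "dave"), times = c(150, 120, 30, 200)),
  longitude  = rep(c(-2, -1, 0, 20),                   times = c(150, 120, 30, 200)),
  latitude   = rep(c(52, 53, 54, 50),                  times = c(150, 120, 30, 200))
)

# Illustrative bounding box for the country of interest (assumed values).
bbox <- list(xmin = -11, xmax = 2, ymin = 49, ymax = 61)

target_n <- 100  # minimum records per observer (the "100?" above)
n_users  <- 1    # how many usernames to sample (assumed)

# Step 1: crop by extent of country (cheap pre-filter for speed).
in_extent <- obs %>%
  filter(
    longitude >= bbox$xmin, longitude <= bbox$xmax,
    latitude  >= bbox$ymin, latitude  <= bbox$ymax
  )

# Step 2: the finer coords2country() matching would go here.
# Omitted: its signature is not given in the notes above.

# Step 3: group by observer, summarise by record count.
observer_counts <- in_extent %>%
  group_by(user_login) %>%
  summarise(n_records = n(), .groups = "drop")

# Step 4: keep observers that reach the target number of records.
eligible <- observer_counts %>%
  filter(n_records >= target_n)

# Step 5: sample from the remaining usernames.
set.seed(42)
sampled_users <- eligible %>%
  slice_sample(n = min(n_users, nrow(eligible)))
```

Doing the bounding-box crop first means the grouping and any slower point-in-country check only run on the rows that could plausibly belong to the country, which is where most of the speed-up on a large table would come from.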