gbif / doc-sensitive-species-best-practices

This document aims to describe current best practices for dealing with primary occurrence data for sensitive species and to provide guidance on how to make data as freely available as possible and as protected as necessary.
https://doi.org/10.15468/doc-5jp4-5g10

Generalization vs Randomization #18

Open robemery opened 3 years ago

robemery commented 3 years ago

Hi,

This is the first time I've used GitHub to comment so forgive me if I've done something wrong.

The GeoReference Guides are fantastic tools. I have had to deal with so much messy data it is really useful to have guides to refer people to.

I was surprised that randomisation of sensitive locations was vehemently opposed.

Previously I have collected a lot of biosecurity reports from citizen scientists. We felt obliged to obfuscate locations because sometimes people included photographs of their homes.

The problem with generalisation by removing decimal places is that many of the dots end up on top of each other, so there is no indication of how much data has been contributed. Google Maps has that point expansion/rose effect, which works nicely, but that's no help in a static PDF or printout.
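One possible way around the overlapping-dots problem on a static map is to count how many reports fall on each generalised coordinate and scale or label the marker by that count. The following is a minimal sketch under stated assumptions, not anything the guide prescribes: the function names and sample coordinates are hypothetical, and it rounds to a fixed number of decimal places rather than truncating.

```python
from collections import Counter


def generalise(lat, lon, decimals=1):
    """Round a coordinate pair to a fixed number of decimal places."""
    return (round(lat, decimals), round(lon, decimals))


def count_per_cell(points, decimals=1):
    """Count how many original points share each generalised coordinate."""
    return Counter(generalise(lat, lon, decimals) for lat, lon in points)


# Hypothetical report locations (illustrative values only).
reports = [
    (-31.9523, 115.8613),
    (-31.9541, 115.8577),
    (-31.9602, 115.8702),
    (-32.0569, 115.7439),
]

cells = count_per_cell(reports, decimals=1)
for (lat, lon), n in sorted(cells.items()):
    # The count can drive marker size or a label on a static PDF or printout.
    print(f"{lat:.1f}, {lon:.1f}: {n} report(s)")
```

Each generalised point then carries its own count, so a printout can show how much data was contributed without revealing where each report came from.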

Anyway, thanks for the excellent work.

Regards,

Rob

ArthurChapman commented 3 years ago

Thanks Rob for your comment

There are always advantages and disadvantages to whatever method one chooses for obfuscation. After the various workshops, an online survey and open forums, it was agreed overwhelmingly that randomisation introduced far more problems than generalisation did.

An advantage of generalization (where, as you say, all the points overlap) is that it signals that the locations have been generalized and may not be the actual locations - or at least that they lie within the range of coordinate_Uncertainty and precision. You are not changing the data - just reporting it at a smaller scale. Randomisation changes the data and, unless extra information is made available, it is misleading, because the randomised location will be interpreted as the true location. Of course, it all boils down to fitness for use and what the user wants to do with the data.

Whatever the method, it is important that the original data be retained - any obfuscation should apply only to the published data, and documenting what has been done is essential.
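As an illustration of these points, the sketch below generalises a copy of a record for publication, leaves the original untouched, and documents what was done. It is a rough sketch under stated assumptions: the function name and sample record are hypothetical, the field names follow Darwin Core terms (decimalLatitude, decimalLongitude, coordinateUncertaintyInMeters, dataGeneralizations), and the added uncertainty uses a crude figure of about 111,320 metres per degree, which overestimates the longitude component away from the equator.

```python
import math

# Rough metres per degree at the equator; an assumption for illustration only.
METRES_PER_DEGREE = 111_320


def publishable_copy(record, decimals=1):
    """Return a generalised copy of a record; the original is not modified."""
    # Rounding moves a point by at most half a cell along each axis.
    half_cell_deg = 0.5 * 10 ** -decimals
    added_m = math.hypot(half_cell_deg, half_cell_deg) * METRES_PER_DEGREE

    public = dict(record)
    public["decimalLatitude"] = round(record["decimalLatitude"], decimals)
    public["decimalLongitude"] = round(record["decimalLongitude"], decimals)
    # Widen the stated uncertainty so the true location stays inside it.
    public["coordinateUncertaintyInMeters"] = math.ceil(
        record.get("coordinateUncertaintyInMeters", 0) + added_m
    )
    # Document the generalisation alongside the published data.
    public["dataGeneralizations"] = (
        f"Coordinates rounded to {decimals} decimal place(s) for publication"
    )
    return public


# Hypothetical record held in the private database.
original = {
    "decimalLatitude": -31.9523,
    "decimalLongitude": 115.8613,
    "coordinateUncertaintyInMeters": 30,
}
published = publishable_copy(original, decimals=1)
```

Because the uncertainty is widened rather than the point being moved at random, the true location still lies within the published uncertainty radius, and the dataGeneralizations note tells users how the published value was derived.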

Thanks again for your comments. They are valuable.

Arthur

robemery commented 3 years ago

Thanks for getting back to me, Arthur.

This has been an area of interest to me for many years, even before GPS, when we digitized the hand-written labels of our insect reference collection. More recently we have used smartphone apps to record research data as well as for community engagement. I attended one of Debra's workshops in Santa Barbara a few years ago, and I don't know how many times I have handed out copies of your guide to people, pleading with them to collect georeference data properly.

I thought you might be interested to see how we used the MyPestGuide reporting app to do a quick survey to demonstrate freedom from a citrus disease. https://www.google.com/maps/d/edit?mid=1cUsT5B3cv8cs6Me0ClwMUB1njUK9pE5i&usp=sharing

We hold the original coordinates in our database, but sometimes obfuscate them only as the points are sent to a map, especially if we are dealing with sensitive species or with pest insects and weeds on private properties. We try to use large placemarks with blurred edges to reinforce that the point is not accurate. Google Maps resizes the points as you zoom in, so by moving a point to protect privacy we end up putting it on someone else's property, which is even worse!

Regards,

Rob
