BiologicalRecordsCentre / ABLE

Assessing ButterfLies in Europe project repository

Hide counts #468

Closed chrisvanswaay closed 4 months ago

chrisvanswaay commented 2 years ago

@DavidRoy @kazlauskis I regularly do counts at places I don't want to be publicly known, e.g. because there are sensitive species. Now all data is openly available. Would it be possible to add a switch in the app (either in the settings for always, or at the start of the count for once) so I can hide such counts from other people? I would not mind if they are in the downloads, as the downloads are only for your own records as a recorder, or for someone who has the rights to download all data anyway (only a few people).

DavidRoy commented 2 years ago

@chrisvanswaay I agree this is useful. We have functionality for sensitive locations. Would you want the records completely hidden or the location to be blurred (if so, what scale to blur to)?

chrisvanswaay commented 2 years ago

I think it is best to make them completely hidden. In my experience blurring doesn't help a lot. It would also be good if old counts could be hidden via the website.

DavidRoy commented 2 years ago

@kazlauskis - can you add a flag to mark samples as 'sensitive' to the app surveys for timed counts (single species and multi-species)? It sits best on the 'Additional details' page. @JimBacon can you advise on warehouse setup, e.g. how Karolis posts the information? Also, can you update the sample-editing page to show this field and allow it to be edited. https://butterfly-monitoring.net/mydata/samples/edit?sample_id=19917579. I assume the reports will then blur the samples and occurrences?

kazlauskis commented 2 years ago

@JimBacon should we use the occurrence:confidential attribute?

JimBacon commented 2 years ago

There are 3 fields used by the database to manage visibility of records. The comments in the database about them are as follows:

I don't think confidential is appropriate in this context, @kazlauskis

Hiding records entirely, as @chrisvanswaay suggests, is not a straightforward option. I can see us doing that by creating a separate survey but that would be a lot of work compared to the readily available option of blurring.

sensitivity_precision is for marking occurrences (e.g. instances of a taxon which needs protecting) whereas privacy_precision is for marking samples (e.g. when the site needs protecting). Since Chris says it is the 'place' he doesn't want known and David is requesting that the sample is protected, these both point us to using the privacy_precision field. This means that occurrences of very common species recorded at the site will also be blurred.

@DavidRoy, by setting this field to a suitable value (1000, 10000, something else?), properly configured reports will then blur the outputs. The reports should be using the fields public_geom, public_entered_sref, output_sref, location_name from the cache tables. See https://indicia-docs.readthedocs.io/en/latest/developing/locality-data.html. Alternatively, if obtaining records from the ElasticSearch index, then appropriate filters should be in place. I'm not sure what those are just yet but a starting point for documentation can be found at https://indicia-docs.readthedocs.io/en/latest/site-building/iform/prebuilt-forms/dynamic-elasticsearch.html. Once we start marking samples as private, we can add some test records and review their visibility, updating reports and maps as necessary.
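
As a rough illustration of what blurring to a given precision means, here is a minimal Python sketch. It is not Indicia's implementation: the function name blur_to_grid is hypothetical, and it assumes the point is already in a projected grid system with coordinates in metres. It simply snaps a point to the south-west corner of the grid square of the chosen size, so every record inside that square reports the same public location.

```python
def blur_to_grid(easting: float, northing: float, precision_m: int) -> tuple[int, int, int]:
    """Snap a point to the grid square of side `precision_m` that contains it.

    Hypothetical helper for illustration only. Assumes projected coordinates
    in metres. Returns (sw_easting, sw_northing, square_size_m).
    """
    if precision_m <= 0:
        raise ValueError("precision_m must be a positive number of metres")
    # Floor division moves the point to the south-west corner of its square.
    sw_easting = int(easting // precision_m) * precision_m
    sw_northing = int(northing // precision_m) * precision_m
    return sw_easting, sw_northing, precision_m


# The same point blurred to the 1 km and 10 km values suggested above.
print(blur_to_grid(123456.0, 654321.0, 1000))
print(blur_to_grid(123456.0, 654321.0, 10000))
```

A larger precision value gives a larger square and therefore a vaguer public location, which is the trade-off behind choosing 1000 versus 10000.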

JimBacon commented 2 years ago

Thinking about it a bit more, we might be able to customise the reporting pages to not show records where the privacy_precision has been set. However, because that is not the generally understood meaning, they might be blurred rather than hidden if shared to another website.

chrisvanswaay commented 2 years ago

I'm not sure I completely understand all, but it looks like @JimBacon says that hiding is difficult, and blurring is much easier. That's a pity, but it is how it is. In that case we should set a large radius plus blur all the other observations of that visit. Anyway the data should be in good precision for scientific use.

JimBacon commented 1 year ago

Re-reading this today, it seems to me that while the anticipated output from Indicia is to show private and sensitive records as blurred, it should be easy enough to re-write reports to hide them instead.

chrisvanswaay commented 11 months ago

@kazlauskis @JimBacon @DavidRoy @JurrienVanDeijk I bring this up again as we have been using the species-specific 15min counts for the monitoring of a rare moth in NL. As this is a very much wanted collector's item we want to hide the counts of this species (Lemonia dumi in NL). Adding such a thing to the app would make it possible to arrange this in the field. For existing counts it would be very much appreciated if the counts of this species in NL could only be made visible to the recorder and the coordinator, not to the general public, as e.g. in https://butterfly-monitoring.net/elastic/all-records

DavidRoy commented 11 months ago

@kazlauskis @JimBacon a priority to resolve over the winter. The transect setup already has a flag for 'Sensitive' with help text "Check the Sensitive box if the landowner/manager does not wish for the site location to be made public.". Can we mirror this for 15 minute counts, with a flag added to the count if needed?

CrisSevilleja commented 11 months ago

Thanks @DavidRoy. Basically it will be to include the 'sensitive' option for the 15min counts. For transects, I understand this is done when creating a transect on the website (already present), so no need to include the sensitive option for transects on the app.

JimBacon commented 7 months ago

Over at https://github.com/BiologicalRecordsCentre/ABLE/issues/612#issuecomment-1954629772 John says:

This needs consideration of whether we just update the reports so that the existing sensitivity precision field causes records to be blocked (accepting that this behaviour isn't the default if the records are viewed elsewhere) or add a confidential flag to the locations table so the core behaviour would be to treat records as fully confidential if the location is confidential.

My thought is that, for 15-minute counts, there is not a linked location so a locations.confidential field would not work but a sample.confidential field could.

Above David tells us that for transects, EBMS solves the problem by having a location attribute which indicates the transect should be hidden. We could do the same with a sample attribute for 15-minute counts. This would not require any changes to the Indicia core. However, if we see this as a generic feature which is of wide application then it may be worth implementing in the core.

johnvanbreda commented 6 months ago

Here's a task list for the addition of a samples.confidential field which will, by default, block reporting on the confidential samples. Blurring samples can already be handled via samples.privacy_precision:

  1. Add a samples.confidential field (samples and cache_samples_functional).
  2. Add samples.confidential to Elasticsearch mappings, plus Logstash reports and config. This needs to be done for both occurrences and samples extraction.
  3. Set samples.confidential to default to false in an upgrade script (using slow script tag).
  4. There are existing trigger functions that ensure the training flag stays in sync between samples and occurrences, so it makes sense to re-use these:
     a. Alter set_occurrence_to_training_from_sample trigger function to also set occurrence to confidential if sample is confidential.
     b. Also update trigger function set_sample_occurrences_to_training to handle confidential in the same way.
  5. Update cache builder so if sample is confidential, so is the cache_occurrences_functional record (shouldn’t matter as triggers will do this, but a catch-all).
  6. Add confidential filter to standard params for samples (both for PostgreSQL and ES code).
  7. Ensure that default for both PostgreSQL and Elasticsearch reporting is that confidential samples are excluded.
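
To make the intended behaviour of the task list concrete, here is a small in-memory Python sketch. It is only a model of the idea, not the warehouse code: the real work would be done in PostgreSQL triggers, the cache builder and the report filters, and the class and function names below (Sample, Occurrence, cascade_confidential, report_samples) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    taxon: str
    confidential: bool = False


@dataclass
class Sample:
    id: int
    confidential: bool = False
    occurrences: list[Occurrence] = field(default_factory=list)


def cascade_confidential(sample: Sample) -> None:
    """Model of tasks 4 and 5: a confidential sample makes its occurrences confidential."""
    if sample.confidential:
        for occurrence in sample.occurrences:
            occurrence.confidential = True


def report_samples(samples: list[Sample], include_confidential: bool = False) -> list[Sample]:
    """Model of tasks 6 and 7: confidential samples are excluded unless asked for."""
    return [s for s in samples if include_confidential or not s.confidential]


hidden = Sample(1, confidential=True, occurrences=[Occurrence("Lemonia dumi")])
public = Sample(2, occurrences=[Occurrence("Pieris rapae")])
cascade_confidential(hidden)
cascade_confidential(public)

print([s.id for s in report_samples([hidden, public])])
print([s.id for s in report_samples([hidden, public], include_confidential=True)])
```

The point of the default in report_samples is that a report author has to opt in to see confidential samples, rather than remember to filter them out.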

I'll send a quote via email.

JimBacon commented 6 months ago

The description for occurrences.confidential says

Flag managed by the dataset administrator. The confidential flag relates to the need to control communications around a record rather than simply an indicator that a record is sensitive (which should be done via the sensitivity_precision field) so this flag prevents notifications about this record being sent to the recorder.

Will samples.confidential also be managed by the dataset administrator and block notifications being sent to the recorder?

If the aim is to just hide samples from public view would it be easier to use a special value for privacy_precision?

JimBacon commented 6 months ago

Actually, to answer my own question, if samples.confidential is cascaded to set occurrences.confidential then it would prevent notifications being sent to the recorder. "Is that a desirable side effect?" should be my question.

johnvanbreda commented 6 months ago

Good spot, thanks Jim. So an alternative approach might be to use samples.privacy_precision=0 as a special value which hides the sample (at least from reports that use the standard parameters filtering). The tasks would be as follows:

  1. Change the default filtering applied for a standard parameters report in reporting mode so that records with privacy_precision=0 are filtered out (both Elasticsearch and PostgreSQL reports) in all scenarios where the public blurred view of records is being shown. I.e. this would apply to all sharing modes (e.g. reporting, downloads) excluding verification and when viewing my records.
  2. As a catch all, change the cache builder code so that the geometry created for public viewing (using function get_output_sref) is set to 10km (or 100km?) blur if privacy_precision=0.
  3. Update the database comment on samples.privacy_precision to document the meaning of value 0.
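
The rule in tasks 1 and 2 can be sketched in Python as follows. This is an illustration of the logic only, not the standard parameters code: the names visible_in_report and public_blur_precision, the my_records argument and the mode strings are assumptions, and the 10 km fallback is just one of the two values floated in task 2.

```python
from dataclasses import dataclass
from typing import Optional

HIDDEN = 0  # the special privacy_precision value proposed above


@dataclass(frozen=True)
class Sample:
    id: int
    created_by: int
    privacy_precision: Optional[int] = None  # None means the sample is not private


def visible_in_report(sample: Sample, sharing_mode: str, viewer_id: int,
                      my_records: bool = False) -> bool:
    """Task 1: hide privacy_precision=0 samples wherever the public view is shown."""
    if sample.privacy_precision != HIDDEN:
        return True  # not hidden; ordinary blurring rules still apply
    if sharing_mode == "verification":
        return True  # verifiers are excluded from the filter
    if my_records and sample.created_by == viewer_id:
        return True  # recorders can always see their own records
    return False


def public_blur_precision(sample: Sample, fallback_m: int = 10_000) -> Optional[int]:
    """Task 2 catch-all: if a hidden sample leaks into a report, blur it heavily."""
    if sample.privacy_precision == HIDDEN:
        return fallback_m
    return sample.privacy_precision


count = Sample(id=1, created_by=7, privacy_precision=HIDDEN)
print(visible_in_report(count, "reporting", viewer_id=99))
print(visible_in_report(count, "reporting", viewer_id=7, my_records=True))
print(public_blur_precision(count))
```

Keeping the catch-all blur separate from the visibility check means a report that ignores the standard filter still never shows the precise location.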

Have I missed anything @JimBacon?

JimBacon commented 6 months ago

I can't think of anything else except to wonder if sensitivity_precision would benefit from having the same alteration at the same time, but that would be changing the scope of the issue.

kazlauskis commented 5 months ago

We have now added the Sample.privacy_precision=0 as a toggle to the app.

@JimBacon, you can close the ticket if there isn't anything to be done on the website.

johnvanbreda commented 4 months ago

Closing as now done.