OCHA-DAP / hdx-ckan

A repo for HDX's configurations and extensions to CKAN

Add option to disable preview for a resource or dataset #1716

Open davidmegginson opened 9 years ago

davidmegginson commented 9 years ago

I've been discussing a requirement that @JavierTeran has about our policy against including personal information in HDX.

Background

  1. Our policy is not to include personal data (names, email addresses, phone numbers) in HDX.
  2. We can't (and shouldn't) control the contents of externally-hosted datasets, such as the CSV files from the Interaction API.
  3. However, since users can preview those datasets on HDX resource pages (via AJAX and the DataProxy), we could be perceived as including that data in HDX itself.

Proposed action

Add a flag to HDX datasets to disable in-browser preview.

A single flag would let the data team disable preview on a case-by-case basis. That approach would still allow us to point to third-party datasets that may contain personal information, but prevent us from displaying that information ourselves.
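
A minimal sketch of how such a flag might be evaluated, assuming a hypothetical boolean field named `disable_preview` that can be set on a dataset or on an individual resource. Neither the field name nor the helper exists in CKAN or HDX; they are illustrative only.

```python
def preview_enabled(dataset: dict, resource: dict) -> bool:
    """Return True unless preview is disabled on the dataset or the resource."""
    # `disable_preview` is a hypothetical field name, not an existing CKAN field.
    # A flag on the dataset switches off preview for all of its resources.
    if dataset.get("disable_preview", False):
        return False
    # Otherwise the resource can opt out individually.
    return not resource.get("disable_preview", False)
```

With this shape, a missing flag means preview stays on, so existing datasets would keep their current behaviour.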

Alternative (rejected) options

We considered importing static copies of the live datasets via ScraperWiki, while filtering out the columns of concern. We rejected that option for four reasons:

A second alternative would be to build a filtering proxy. That would address the first problem (stale data), but the other three would still exist, and the development and ongoing maintenance effort would be much higher (since we'd be operating and hosting another web application).

cjhendrix commented 9 years ago

Some questions:

Also, one comment: while this would diminish the perception that the data comes from HDX, users would still be downloading the data from a button on our site, so the perception may still be that it comes from us.

davidmegginson commented 9 years ago

Based on my discussion with @JavierTeran, I think these minimum steps would satisfy the requirement:

As for the direct-download link, we could address that concern by adding a warning icon beside all data links that point outside our domain, with some popup text (if you click or mouseover) warning that the data is hosted elsewhere, and may be unavailable or change without notice. But as of right now, we haven't received a requirement to do so (as far as I know).
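
One way to decide which links would get such a warning icon is sketched below. It assumes a hypothetical site domain (`data.example.org`) and a hypothetical helper name; it is not taken from the HDX codebase.

```python
from urllib.parse import urlparse


def is_external(url: str, own_domain: str = "data.example.org") -> bool:
    """Return True if `url` points to a host outside `own_domain`."""
    host = urlparse(url).hostname
    if host is None:
        # Relative links (no host part) stay on our own site.
        return False
    host = host.lower()
    own = own_domain.lower()
    # Treat the domain itself and any of its subdomains as internal.
    return not (host == own or host.endswith("." + own))
```

A template could call this for each resource URL and render the icon and popup text only when it returns `True`.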

cjhendrix commented 9 years ago

Another question, @JavierTeran @davidmegginson: given the current planned implementation, the URL for the preview would still show the preview. However, the preview button would not be visible in the interface. Would that meet your requirement?

davidmegginson commented 9 years ago

I'll try to restate that as business implications for @JavierTeran, so that he can decide:

  1. For a dataset/resource with preview disabled, HDX will not show a "Preview" button.
  2. However, if a user knows the correct URL patterns (e.g. if she's a CKAN power user), she could still manually enter a URL that would cause HDX to show a preview of the data, and she could share that link with others (via email, etc.) to show that we are displaying the personal info in HDX.

I think the risk of the second point is low, but it's not quite zero. So from @cjhendrix and @amcguire62 we need an estimate of the level of effort to avoid the risk of no. 2, then @ochadataproject (Sarah) and @JavierTeran can decide whether eliminating a small risk justifies the extra effort.
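
To illustrate what avoiding the second risk would involve, here is a minimal sketch in which the check happens in the code that serves the preview URL rather than only in the template that renders the button. The `disable_preview` field and the `preview_response` function are hypothetical and do not correspond to CKAN's actual controllers.

```python
from http import HTTPStatus
from typing import Tuple


def preview_response(resource: dict) -> Tuple[int, str]:
    """Decide what a preview URL should return for a resource."""
    # `disable_preview` is a hypothetical flag. Checking it here means a
    # hand-typed or shared preview URL is refused as well.
    if resource.get("disable_preview", False):
        return HTTPStatus.NOT_FOUND.value, "Preview is not available for this resource."
    # Placeholder body; a real handler would render the actual preview.
    return HTTPStatus.OK.value, "preview of " + resource.get("url", "")
```

Hiding the button only changes what the page offers; a check like this one changes what the server is willing to return.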

danmihaila commented 8 years ago

@davidmegginson is this still an issue, and if so, what should the priority be? I think we could just set the resource format to something other than "zipped shapefile" or similar.

cjhendrix commented 8 years ago

It's still an issue. The current workaround is to change the format, but this breaks the discoverability of the data. As for priority, that's not up to me or @davidmegginson. There are lots of small usability issues like this one that fall through the gap between the bigger issues that @amcguire62 manages and the smaller issues that @danmihaila manages. It's that "gap" that we've talked about a couple of times, but haven't really closed.

davidmegginson commented 8 years ago

I'm going to pass this on to @amcguire62, because it requires managing a series of business decisions first (from Sarah and @JavierTeran, perhaps), before we can make a final technical decision.