apertium / apertium-html-tools

Web application providing a fully localised interface for text/website/document translation, analysis and generation powered by Apertium.
http://wiki.apertium.org/wiki/Apertium-html-tools
GNU General Public License v3.0

Determine how html-tools and APY might be affected by (and make them compliant with) the GDPR and other regulations #288

Open roybaer opened 6 years ago

roybaer commented 6 years ago

One thing about the new EU General Data Protection Regulation is that it classifies IP addresses as personal data. Exporting such personal data to other countries without the user's permission might therefore be problematic, especially because an IP address will already be "exported" when the web page makes use of e.g. Google APIs.

The page should also contain the typical easily accessible "privacy policy" and "legal notice" sub-pages. In my opinion, the privacy policy page should then also contain or link to the APY server's privacy policy in addition to the privacy policy of the service provider of the html-tools installation in question. The legal notice should contain whatever information the jurisdiction in question requires (service provider's postal address, E-mail address, phone number or contact form, type of organization, tax ID, person responsible for content etc.).

The interface should probably also put a box around the Apertium-related links and mark them as such, so that nobody mistakes the "about" or "contact" pages for the privacy policy or legal notice.

ftyers commented 6 years ago

Is there any reason we should be sending the user's IP to Google anyway? Can't we proxy these services? In any case, +1 for complying with the relevant legislation and for finding ways to improve our users' privacy.

sushain97 commented 6 years ago

> Is there any reason we should be sending the user's IP to Google anyway?

As far as I recall, we're not.

roybaer commented 6 years ago

Well, uBlock Origin tells me that the installation at www.apertium.org connects to these domains: (EDIT: I.e. it asks the user's browser to connect to these domains without asking the user.)

Anything beyond *.apertium.org is problematic. Is there an easy way to at least minimize this?
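To keep an eye on this, one rough option is to scan the served HTML for resources that point outside the first-party domain. The following is only a sketch: the helper names (`ExternalHostCollector`, `third_party_hosts`) are made up for illustration, the tag list is an assumption, and it only sees static `src`/`href` references, not requests that scripts make at runtime.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalHostCollector(HTMLParser):
    """Collects hosts referenced by src/href on resource-loading tags.

    Hypothetical helper for illustration; not part of html-tools.
    """

    def __init__(self, first_party_suffix):
        super().__init__()
        self.first_party_suffix = first_party_suffix
        self.external_hosts = set()

    def handle_starttag(self, tag, attrs):
        # Assumption: these are the tags that make the browser fetch something.
        if tag not in ("script", "link", "img", "iframe"):
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).hostname  # None for relative URLs
                if host and not self._is_first_party(host):
                    self.external_hosts.add(host)

    def _is_first_party(self, host):
        suffix = self.first_party_suffix
        return host == suffix or host.endswith("." + suffix)


def third_party_hosts(html, first_party_suffix="apertium.org"):
    """Return the sorted list of non-first-party hosts referenced in html."""
    collector = ExternalHostCollector(first_party_suffix)
    collector.feed(html)
    return sorted(collector.external_hosts)
```

Feeding it the contents of the deployed index page would give a list that could be copied into the privacy policy or checked in a test.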

sushain97 commented 6 years ago

Ah, those are CDNs. No, I'm not OK with removing any of those, because doing so would cause large performance degradation on the site given how slow Apertium's servers are. Even if the server were fast, using CDNs is industry standard.

sushain97 commented 6 years ago

However, the site is built to allow fallbacks for most of those links so users can feel free to disable access to them via their browser.

unhammer commented 6 years ago

The fact that we use CDNs should go in the privacy policy, along with the fact that all requests are logged.

At least we use Subresource Integrity, so the third parties hosting the scripts can't alter them without the client treating it as an error. And if the client has already cached the scripts, it shouldn't even make a request to the third party (since these are very common scripts used by many sites, it's fairly likely they are cached). But a "fresh" browser will make a lot of third-party requests. Apertium.org is not very fast at loading, so there's a trade-off here; I agree with Sushain that we should keep using the CDNs (though we should also verify with tests). (It'd be very cool if one could say something like <script href="./jquery.min.js" same-as="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js" integrity="sha-…"/> and load from our server unless it's cached from Google, but I doubt that's currently (easily) possible.)
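For reference, the `integrity` value used by Subresource Integrity is just the hash algorithm name, a dash, and the base64-encoded digest of the file. A minimal sketch of computing it, assuming the script bytes are already available locally (the function names `sri_hash` and `script_tag` are illustrative, not something html-tools defines):

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity value such as 'sha384-<base64 digest>'."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI only allows sha256, sha384 or sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a <script> tag that pins the expected content of a CDN file."""
    return (
        f'<script src="{url}" integrity="{sri_hash(content)}" '
        'crossorigin="anonymous"></script>'
    )
```

If the CDN ever serves different bytes, the browser refuses to run the script, which is the point at which a local fallback copy would be needed.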

We currently log what happens client-side with Matomo, which I believe is configured to be fairly GDPR-friendly (at least it removes the last parts of IP addresses). I don't think the contents of requests are logged in Matomo, but the server logs contents in journald; it's possible to flush these every so often (journalctl's --vacuum-time or --vacuum-size options), but I don't know if we currently do.
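To illustrate what "removing the last parts of IP addresses" amounts to, here is a small sketch that zeroes the trailing bits of an address before it is stored. It is not Matomo's code; the function name and the default of keeping 16 bits for IPv4 and 48 bits for IPv6 are assumptions chosen for the example.

```python
import ipaddress


def anonymize_ip(address: str, ipv4_keep_bits: int = 16, ipv6_keep_bits: int = 48) -> str:
    """Zero out everything after the first N bits of an IP address.

    The defaults are illustrative assumptions, not a legal recommendation.
    """
    ip = ipaddress.ip_address(address)
    keep = ipv4_keep_bits if ip.version == 4 else ipv6_keep_bits
    # strict=False lets ip_network mask the host bits instead of raising.
    network = ipaddress.ip_network(f"{ip}/{keep}", strict=False)
    return str(network.network_address)
```

The same masking could be applied to the server-side logs before they reach journald, if the request contents are to be kept at all.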

unhammer commented 6 years ago

https://madmimi.com/p/4b4ffb?fe=1&pact=3326800-144812898-6864785919-c9c86a6dbb647db50ddd59d45c601b7971cd2c41 has relevant tips for our Matomo installation, in particular https://matomo.org/blog/2018/04/how-to-make-matomo-gdpr-compliant-in-12-steps

Also, I believe we're legally required to add the following:

[image: privacy-policy]

Nutomic commented 5 years ago

> Ah, those are CDNs. No, I'm not OK with removing any of those, because doing so would cause large performance degradation on the site given how slow Apertium's servers are. Even if the server were fast, using CDNs is industry standard.

I don't think "industry standard" is a good argument to make when the standard for internet companies is to actively spy on their users and collect as much data as possible. Open source projects should avoid this behaviour as much as possible.

Server slowness is also not a very good argument imo, because it is very easy to add caching for static resources in nginx. Here is some documentation for caching, but I would be happy to help you with that.

ghost commented 5 years ago

I also think that it's a very bad idea to embed so many third parties. You're right that for-profit companies do that too, and that's a big problem, but unfortunately we can't change that. However, in an open source project we should act as a positive example and take care of the users' privacy. Many people don't want to use things from Google because it spies on its users, and those users are searching for alternatives they can trust. This project is a good alternative to Google Translate, so please make it a good alternative for those who are searching for more privacy as well.