GEWIS / gewisweb

The website for GEmeenschap van Wiskunde en Informatica Studenten.
https://gewis.nl
GNU General Public License v3.0

Feature request: watermark uploaded PDFs for the education archive #1049

Closed: Yoronex closed this issue 3 years ago

Yoronex commented 3 years ago

There has been an incident in which someone copied many documents from our education archive and published them for their own gain on websites like Studeersnel. To prevent this, and to have leverage when asking for them to be taken down, we need to "protect" our documents. The best and probably easiest way to do this is to watermark every document in our archive with the GEWIS logo. At a glance, anyone can then see that these documents are "owned" by us.

With this issue, I would like to request that all PDFs uploaded to the education archive are watermarked automatically. This seems to be possible in PHP with a library called FPDF. However, I am not knowledgeable about PHP or the GEWISweb repository. Existing documents can be watermarked manually, all at once.

Koen1999 commented 3 years ago

I suggest that watermarks are generated at the moment a user downloads a document, and that they contain the IP address from which the download was made. Note that including the membership number is not an option, since non-members are also allowed to download documents (unless this is something we would like to change?).

rinkp commented 3 years ago

If the goal is to detect who downloaded a certain file, it might also be wise to include the time at which the file was downloaded. As far as I know, the TU/e still keeps track of who had which IP address at what time. However, this is a problem for the new Strongswan VPN, as it does not give every user their own external IP address.
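To make the IP-plus-timestamp idea concrete, here is a minimal sketch of composing the per-download watermark line. It is written in Python purely for illustration (GEWISweb itself is PHP), and `build_watermark_text` is a hypothetical helper, not something in the repository. It assumes the server knows the client IP and records a timezone-aware download time; normalising to UTC is an assumption, chosen so the stamp can be compared unambiguously with external IP logs.

```python
from datetime import datetime, timezone
import ipaddress


def build_watermark_text(client_ip: str, downloaded_at: datetime) -> str:
    """Return the per-download watermark line (hypothetical helper)."""
    # Validate and normalise the address (e.g. lowercases IPv6);
    # raises ValueError on input that is not an IP address.
    ip = ipaddress.ip_address(client_ip)
    # Require an explicit timezone so the stamp is never ambiguous.
    if downloaded_at.tzinfo is None:
        raise ValueError("downloaded_at must be timezone-aware")
    # Convert to UTC so it can be matched against IP-assignment logs.
    stamp = downloaded_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"Downloaded from gewis.nl by {ip} on {stamp}"


if __name__ == "__main__":
    when = datetime(2021, 3, 1, 12, 30, tzinfo=timezone.utc)
    # 192.0.2.10 is a documentation-only example address.
    print(build_watermark_text("192.0.2.10", when))
```

As noted above, for downloads made through a shared VPN exit address the IP alone will not single out one user; the timestamp only narrows it down.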

tomudding commented 3 years ago

I'd like to point out that adding a watermark does not guarantee that the documents won't be misused by those who download them. It will probably prevent most cases of misuse; however, people determined to abuse their access to these documents will still do so.

Normal text (don't bother with translations) would of course not suffice, so you would have to flatten the pages. Even then, you can easily edit the watermark out in a program like Photoshop. Restricting editing access within the PDF by means of a random password doesn't work either: just take a screenshot and repeat the Photoshop step.

I am not saying we should not do it; we just have to accept that it will not prevent all misuse. There are plenty of examples (academic papers) that use this kind of watermarking and can still be found on the internet. The best solution, in my opinion, would be adding text at an angle across each page that says:

Downloaded from gewis.nl by [IP] on [DATE]. Property of ....

and then flatten the pages. This will decrease readability and accessibility but provide us with the most robust solution.
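Placing that angled text requires an angle and a font size for each page. Below is a rough sketch, assuming the watermark runs along the page diagonal and that an average glyph is about 0.5 em wide (a crude estimate; a real implementation should measure the string with the PDF library's own width function). The `diagonal_placement` helper is hypothetical and only computes geometry; drawing the text and flattening the page are left to whatever PDF tooling is used.

```python
import math


def diagonal_placement(page_w: float, page_h: float, text: str,
                       fill: float = 0.9, avg_char_em: float = 0.5):
    """Return (angle_deg, font_size, centre) for text along the page diagonal.

    Units are PDF points. avg_char_em is an ASSUMED average glyph width in em;
    replace it with a real string-width measurement where available.
    """
    if page_w <= 0 or page_h <= 0 or not text:
        raise ValueError("page size must be positive and text non-empty")
    # Angle of the bottom-left to top-right diagonal, measured from the x-axis.
    angle_deg = math.degrees(math.atan2(page_h, page_w))
    diagonal = math.hypot(page_w, page_h)
    # Estimated text width is len(text) * avg_char_em * font_size; pick the
    # font size so the text spans `fill` of the diagonal.
    font_size = fill * diagonal / (len(text) * avg_char_em)
    # Rotate the text about the page centre so it stays centred.
    centre = (page_w / 2, page_h / 2)
    return angle_deg, font_size, centre


if __name__ == "__main__":
    # A4 portrait is 595 x 842 points.
    line = "Downloaded from gewis.nl by 192.0.2.10 on 2021-03-01 12:30:00 UTC"
    print(diagonal_placement(595, 842, line))
```

A longer watermark line automatically gets a smaller font along the diagonal, which is one reason to keep the text short if readability of the watermark itself matters.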