TUfast-TUD / TUfast_TUD

Browser Extension for higher productivity with TU Dresden IT-Services 🚀
https://www.tu-fast.de
GNU General Public License v3.0

[NEW FEATURE]: TUfast goes Data #123

Open Noxdor opened 1 year ago

Noxdor commented 1 year ago

See TUfast-TUD/TUFastData#1 for more details.

After the mentioned API endpoints are created, they have to be used by the frontend.

OliEfr commented 1 year ago

I would find the following statistics interesting.

These are just my ideas; feel free to change them! I use the following flags for classification:
RT: send to the API in real time
weekly: send to the API only once per week, with a possible database reset in between (only if required)
ETI: easy to implement
HTI: harder to implement

High prio
Number of saved clicks | RT + weekly, HTI
Which features are enabled (OWAFetch, search engine redirect, AutoLogin) | weekly, ETI
How many active users we have | weekly, ETI

Normal prio
Which degree programs (Studiengang) are selected | weekly, ETI
Whether OPAL courses have been imported | weekly, HTI
How often the popup is opened | RT, ETI
Which banner links were clicked | RT, ETI

Low prio
How often the settings page is opened, if at all | RT, ETI
Whether the user introduction was clicked through | RT, ETI
How often the individual icons in the popup are clicked | RT, ETI
Which shortcuts are used and how often (there is no localStorage entry for this so far) | RT, ETI

Relevant objects in localStorage

Since @C0ntroller's rework I no longer know the corresponding variable names in local storage. I would base everything on the variable names after the MV3 update (i.e. the current main/dev branch). Maybe @C0ntroller can help name the variables for the settings (or knows them off the top of their head)?

@Noxdor

OliEfr commented 1 year ago

I think:

AutoLogin enabled: isEnabled
Redirect enabled: fwdEnabled
OWA mails: enabledOWAFetch
Clicks: savedClickCounter
PDFs: pdfInNewTab and pdfInInline
Degree program: studiengang

This should still be confirmed experimentally, though.

C0ntroller commented 1 year ago

That should all be correct. My local storage currently contains all of the following, which should be self-explanatory:

additionalNotificationOnNewMail
enabledOWAFetch
fwdEnabled
isEnabled
pdfInInline
pdfInNewTab
savedClickCounter
studiengang
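
As a rough illustration of how these keys could feed a weekly report, here is a minimal Python sketch. Only the key names come from the listing above; the `build_weekly_report` function, the payload shape, and the sample values are assumptions made for illustration. The real extension would read the values from browser storage, not from a dict.

```python
# Key names are taken from the local storage listing; everything else is hypothetical.
FEATURE_FLAGS = [
    "isEnabled",
    "fwdEnabled",
    "enabledOWAFetch",
    "pdfInNewTab",
    "pdfInInline",
    "additionalNotificationOnNewMail",
]


def build_weekly_report(storage: dict) -> dict:
    """Reduce a storage snapshot to the fields the statistics would need."""
    return {
        # Missing keys are treated as "feature disabled".
        "features": {key: bool(storage.get(key, False)) for key in FEATURE_FLAGS},
        "studiengang": storage.get("studiengang"),
    }


# Hypothetical snapshot; the click counter is deliberately not copied into the report.
snapshot = {
    "isEnabled": True,
    "fwdEnabled": False,
    "enabledOWAFetch": True,
    "savedClickCounter": 1234,
    "studiengang": "general",
}
report = build_weekly_report(snapshot)
print(report)
```

The absolute `savedClickCounter` is left out of the report on purpose in this sketch, since the thread discusses sending clicks only as a delta.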

I am really not looking forward to dealing with the privacy policy ^^

OliEfr commented 1 year ago

@C0ntroller The privacy policy would only need to be adjusted if user-related data were transmitted, which we will not do. The transmitted data is anonymous and cannot be attributed to anyone. Incidentally, users also do not have to be notified about such changes.

Nevertheless, there will be a notice and explanation for users about how and why we do this, for transparency and security.

C0ntroller commented 1 year ago

At the very least (!) the degree program is personal data. And one can definitely argue about whether my personal settings count as personal data.

Noxdor commented 1 year ago

@C0ntroller Personal data is data that could uniquely identify you as a person. Neither the degree program nor your settings setup maps one-to-one to a person, so no user can be identified from this data. In addition, the entries are not stored individually but only as a sum in the database, so no conclusions whatsoever about individual users can be drawn from it.

OliEfr commented 1 year ago

Here are my two cents:

In general, the GDPR applies. I got my info from europa.eu, gdpr-info.eu and gdpr.eu.

I have two thoughts:

The first is: it is indeed possible to pinpoint an individual person by knowing only the saved clicks. After all, how many people have exactly this number of saved clicks? Probably only one. So this could be seen as a personal identifier, just as first name plus surname is. (The same argument might hold for combining all 6 setting flags, because a unique combination might exist. In contrast, the Studiengang might actually be the least critical data, because it doesn't pinpoint an individual but only a large group, IMO.) Everything here is based on the assumption that the saved clicks are unique to every user. Do you think they are?

The second thought is: even though the data could theoretically identify someone, we still don't have a physical identifier such as an IP address, CPU hardware information, a name, or an ethnicity. So we cannot really match an individual! Also, the data is non-persistent and can be faked easily, so it is not reliable for identification. The saved clicks, for instance, might change daily.

EDIT: after reading this again, I feel like this might fall under GDPR. Options to proceed: a) drop this feature (although it's very nice). b) only collect selected information from TUfast users. This would need to be very carefully designed though. c) create an opt-in for the user and update the privacy policy.

EDIT2: After reading it a second time, I am pretty convinced that we cannot do it with a privacy policy+opt-in. Could someone please make an argument against it?

@julianHGER opinion?

OliEfr commented 1 year ago

I do not like option c too much. If only 20% of users opt in, we don't gain much. Maybe we could do the opt-in first, see how many users agree, and only then implement the rest?

OliEfr commented 1 year ago

Here is my analogy:

If you take all the data we collect and create a single string from it, would this string be unique (or would there only be a really small group of people with the same string)? If the answer is yes, then it is personal data!

I don't know: is this analogy correct, or am I off?
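
The string analogy can be tried out directly. The following is a minimal sketch, assuming hypothetical per-user records. It joins each record into one string and reports the size of the smallest group sharing a string. A result of 1 means at least one user is unique. The function names and the sample data are made up for illustration.

```python
from collections import Counter


def fingerprint(record: dict) -> str:
    """Join all collected values into one string, in a fixed key order."""
    return "|".join(f"{key}={record[key]}" for key in sorted(record))


def smallest_group(records: list[dict]) -> int:
    """Size of the smallest group of users that share the same fingerprint."""
    counts = Counter(fingerprint(record) for record in records)
    return min(counts.values())


# Hypothetical records: setting flags alone vs. flags plus the exact click count.
flags_only = (
    [{"isEnabled": True, "fwdEnabled": True}] * 3
    + [{"isEnabled": True, "fwdEnabled": False}] * 2
)
with_clicks = [
    dict(record, savedClickCounter=100 + i) for i, record in enumerate(flags_only)
]

print(smallest_group(flags_only))   # 2
print(smallest_group(with_clicks))  # 1
```

In this made-up data set the flags alone leave every user in a group of at least two, while adding the exact click count makes every string unique.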

Noxdor commented 1 year ago

I am going to argue that neither the number of clicks nor the individual setup of which features are used will enable us to identify a single individual. This follows directly from how we will save this data in the database. The GDPR, as far as I understand, only applies to how the data is saved, not to how it is transferred to the storage solution (database). Otherwise no anonymous collection would even be possible, because data is identifiable until it reaches the database (simply because of how TCP and HTTP work: there is no connection without connecting parties).

So why will we not be able to identify a user? Simply because we will not save each request to the database as a new row/data point. We will only increment a counter with it. As soon as 2 data points are mixed in that counter, it is impossible to identify a user. The sum can't be split back into its addends without knowing what they are (which we will simply not know, given how the database and API are set up). So the information you are talking about, the total number of clicks of a single user or the boolean array of features activated by a single user, won't exist, and therefore causes no issue with GDPR.

For example, assume the database contains the counter "total_saved_clicks" with the value 55. Is this 55 the sum of 50 and 5, 45 and 10, or 13 and 42? We can never know, and therefore can never identify a user. Transferring only the delta value makes this even harder, because we don't know that user's starting click count, so the total number of clicks the user actually has cannot be traced back. The same argument can be made for the boolean array of activated features (except for the delta value, which is not relevant here). User A uses features 1 and 3, user B uses features 2 and 4. The database then says: 1 of 2 users uses feature 1. It is impossible to know which user it is, since we don't save any identifier connecting a feature usage to a user.

That is, as long as we don't keep track of things like "most clicks by a single user" which will indeed identify a single user.
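
The counter-only storage described here can be sketched as follows. This is a minimal in-memory illustration, not the actual TUFastData API: the class name, method names, and feature labels are assumptions. Only the `total_saved_clicks` counter and the example numbers come from the comment above.

```python
class AggregateStats:
    """Stores only running totals, never one row per request."""

    def __init__(self) -> None:
        self.total_saved_clicks = 0
        self.feature_users: dict[str, int] = {}
        self.reports = 0

    def add_click_delta(self, delta: int) -> None:
        # The client sends only the change since its last report.
        if delta < 0:
            raise ValueError("delta must be non-negative")
        self.total_saved_clicks += delta

    def add_feature_report(self, enabled_features: list[str]) -> None:
        # Increment one counter per enabled feature; no user identifier is kept.
        self.reports += 1
        for name in enabled_features:
            self.feature_users[name] = self.feature_users.get(name, 0) + 1


stats = AggregateStats()
stats.add_click_delta(50)
stats.add_click_delta(5)
stats.add_feature_report(["feature1", "feature3"])  # user A
stats.add_feature_report(["feature2", "feature4"])  # user B

print(stats.total_saved_clicks)  # 55
print(stats.feature_users["feature1"], "of", stats.reports)  # 1 of 2
```

After both reports the object holds 55 and "1 of 2" but no trace of which request contributed what. That matches the caveat that a value such as "most clicks by a single user" must never be stored alongside.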

OliEfr commented 1 year ago

Yes, I agree with you. The argument you make holds. Intuitively I also don't view the data we intend to use as personal.

However, I am not sure about the first premise:

The GDPR, as far as I understand, only applies to how the data is saved, not to how it is transferred to the storage solution (database). Otherwise no anonymous collection would even be possible, because data is identifiable until it reaches the database (simply because of how TCP and HTTP work: there is no connection without connecting parties).

My intuition was the following: it doesn't matter how the data is saved. The only thing that matters is which data is transferred and whether this data can theoretically be used to identify a user, independent of how you actually save and use it. Because in the end, the user has no control over the data once it has been transferred to a remote entity. And yes, you would be correct in that case: all HTTP traffic would be personal.

I see that this intuition makes only limited sense. I couldn't resolve this question with a quick Google search for now.

OliEfr commented 1 year ago

We could ask the data protection officer (DPO) at TU Dresden. https://tu-dresden.de/tu-dresden/organisation/gremien-und-beauftragte/beauftragte/datenschutzbeauftragter

C0ntroller commented 1 year ago

GDPR states that any personal data that is transferred to our backend and processed in any way must be disclosed.

If we just talk about the technicalities of connection, it should be safe, as the IP address is not processed by any program. But for example, if the web server logs accesses - we must disclose that the IP address is stored, even if we never use it.

Also, GDPR clearly states:

"personal data" means any information relating to an identified or identifiable natural person ("data subject"); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person; …

(I didn't find the equivalent English section; the source is the German text on German Wikipedia, translated here.)

The degree program is part of a social identity. One could argue that merely using TUfast, and having that usage logged, is enough to prove that this person is a student of TU Dresden (rough location data plus the social identity of being a student or employee at TUD). Currently, the stats we have are collected by Google, and you have to agree that Google is allowed to track you.

GDPR also states that personal data is any data that could potentially make someone identifiable. That includes the case where we can tell that it is the same anonymous user: we have then identified him, even if we don't know his name etc. So, when tracking the saved clicks, for example, there could be one guy with >500k saved clicks. We would know every week who this guy is, as he's the only one with that score, and we know he's a power user, so we know some attributes of him.
As this could potentially happen, it raises the need for disclosure.
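
The outlier scenario can be made concrete with a small sketch. It assumes, hypothetically, that each weekly report carries the user's total saved clicks and that reports are kept individually. Under that assumption a report can be linked to the previous week's report without any explicit identifier. The function name, the growth threshold, and the numbers are made up for illustration.

```python
def link_reports(
    last_week: list[int], this_week: list[int], max_growth: int = 5000
) -> dict[int, int]:
    """Link this week's totals to last week's totals they could have grown from.

    Only unambiguous matches are returned: this-week value -> last-week value.
    """
    links = {}
    for current in this_week:
        candidates = [
            prev for prev in last_week if 0 <= current - prev <= max_growth
        ]
        if len(candidates) == 1:
            links[current] = candidates[0]
    return links


# Hypothetical totals: three ordinary users and one power user.
last_week = [120, 340, 350, 500_000]
this_week = [360, 355, 400, 501_200]

links = link_reports(last_week, this_week)
print(links)
```

With these numbers only the power user's reports can be linked across weeks. The ordinary users stay ambiguous because several earlier totals could have produced each of theirs.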

Privacy law, and making something compliant with it, is hell. That's one of the reasons most websites have their cookie banners made by a third party. It's also why you should always have a privacy policy (Datenschutzerklärung) even if you think it's not necessary (you never want to fight over this in court).

Unrelated, in my opinion, you should always give the user the choice anyway (Opt-in).

OliEfr commented 1 year ago

I agree - GDPR + opt-in is required.

Option A) Abort this feature.
Option B) Ask for an opt-in first, see how many users agree, then implement the tracking.
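
For Option B, a consent gate could look roughly like the sketch below. It is only an illustration under stated assumptions: `maybe_send`, `opt_in_rate`, and the report shape are hypothetical names, and how the opt-in decisions themselves would be counted is left open.

```python
from typing import Callable


def maybe_send(report: dict, opted_in: bool, send: Callable[[dict], None]) -> bool:
    """Call send(report) only with explicit consent; return whether it was sent."""
    if not opted_in:
        return False
    send(report)
    return True


def opt_in_rate(decisions: list[bool]) -> float:
    """Share of users who agreed, to judge whether the feature is worth building."""
    return sum(decisions) / len(decisions) if decisions else 0.0


sent: list[dict] = []
maybe_send({"delta": 5}, opted_in=False, send=sent.append)
maybe_send({"delta": 7}, opted_in=True, send=sent.append)

print(sent)  # [{'delta': 7}]
print(opt_in_rate([True, False, False, False, True]))  # 0.4
```

Nothing leaves the client without consent in this sketch, which is the property the opt-in is meant to guarantee.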