Wingysam / Christmas-Community

Christmas lists for families
GNU Affero General Public License v3.0

[RFC] Feedback: Implement Usage Metrics (Telemetry) #121

Open Wingysam opened 9 months ago

Wingysam commented 9 months ago

I propose that Christmas Community automatically report the following metrics to a server under my control:

Knowing this information would allow me to remove features that aren't really used by anyone. For example, I suspect that nobody uses Bulmaswatch. Removing it would allow me to switch to a newer version of Bulma. I'm not sure that everything needs to be duplicated to maintain the table/non-table wishlist formats. Knowing what features are actually unused would allow me to remove unnecessary complexity to make the project easier to maintain.

I would place all of the telemetry code in a single module that the server tries to import on startup, but the server would carry on normally if that module fails to load. The opt-out would be rm built/telemetry.js.
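A minimal sketch of how such an optional import could look in Node.js. The helper name loadOptionalModule and the start() hook are assumptions made for illustration; the thread does not describe the actual module interface.

```javascript
const path = require("path");

// Hypothetical helper: try to load a module and return null when the file
// is missing, so deleting built/telemetry.js acts as the opt-out.
function loadOptionalModule(modulePath) {
  try {
    return require(modulePath);
  } catch (err) {
    // Only swallow "file not there"; real bugs in the module still surface.
    if (err.code === "MODULE_NOT_FOUND") {
      return null;
    }
    throw err;
  }
}

// Resolved against the current working directory (an assumption about layout).
const telemetry = loadOptionalModule(path.resolve("built", "telemetry.js"));
if (telemetry && typeof telemetry.start === "function") {
  telemetry.start(); // start() is an assumed entry point, not a known API
}
```

One caveat with this approach: MODULE_NOT_FOUND is also raised when telemetry.js exists but one of its own dependencies is missing, so that case would be treated as an opt-out too.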

Do any of the users of Christmas Community have privacy concerns with this?

jskiddie commented 9 months ago

Just today I thought you should gather statistics to figure out whether or not people actually use themes, so that we'd know whether we can get rid of this 14MB dependency and maybe switch to a different system where you'd have to upload a theme, or otherwise implement it so that only the theme you actually use resides on the system, since this change would definitely break the current THEME ENV. I would suggest an ENV (that determines whether or not telemetry.js is loaded) in addition to the ability to simply delete it, because this way one wouldn't have to build their own container to disable telemetry. Furthermore, just to be sure, I'd like to hint at using a privacy-respecting software/service for gathering the data and not gathering any personal/identifying data in the first place (e.g. IP addresses), but from the nature of your RFC I assume this won't be an issue. I'm happy to help in any way possible. And btw, nice of you to open this RFC in the first place, thx :) All of the above is just my personal opinion and might not be the opinion of the majority of users.

Wingysam commented 9 months ago

Adding an env variable is a good idea, together with a comment warning that opting out of telemetry means your use case won't be considered when the removal of a mostly unused feature is being discussed.
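As a rough sketch, the env-variable opt-out could be a single check before the telemetry module is loaded. The variable name TELEMETRY and the accepted values are assumptions; the thread does not settle on either.

```javascript
// TELEMETRY is a hypothetical variable name. Telemetry stays on by default
// (opt-out), and only an explicit "off"-style value disables it.
// Note: opting out means your use case can't be counted when feature
// removal is discussed.
function telemetryEnabled(env = process.env) {
  const raw = (env.TELEMETRY || "").trim().toLowerCase();
  return !["false", "0", "off", "no"].includes(raw);
}

if (!telemetryEnabled()) {
  console.log("Telemetry disabled via environment variable.");
}
```

Treating an unset variable as "enabled" matches the opt-out model proposed here; an opt-in model would simply invert the default.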

There is an advantage of not sending your IP address to my server, but there are a few tradeoffs with that:

  1. The proxy service would get your IP, and personally I trust myself more than a relay service.
  2. Having IPs means that I can block any trolls trying to flood my stats with useless data.
  3. Honestly it'd be really cool to see what percentage of users are in what countries. This doesn't really have much legitimate use but it'd be motivating to see it.

I'd write a privacy policy for the telemetry service binding me to never sell or otherwise receive value from handing the data to anyone. I think that + the opt-out + the actual data being collected being clearly stated makes the telemetry responsible/ethical.

If anyone disagrees, please let me know before January 1.

jskiddie commented 9 months ago

Thank you for your consideration :) First of all, I'd like to differentiate between sending and storing the IP address, as the IP address is by design part of almost every request made to any server. Furthermore, all of the points below depend heavily on how gathering and storing the telemetry data is implemented, so they can merely hint at a reasonable compromise.

There is an advantage of not sending your IP address to my server, but there are a few tradeoffs with that:

1. The proxy service would get your IP, and personally I trust myself more than a relay service.

Yes, sure thing, a proxy wouldn't really increase privacy.

2. Having IPs means that I can block any trolls trying to flood my stats with useless data.

Yes, again this is a valid problem that might arise, but I'd argue that you could just wait and see whether it ever does. If it does turn out to be a problem, maybe only store the IPs temporarily and flush them afterwards; this should be good enough for flood protection. If you have loads of time to spare, see firefox-general, principles, telemetry and maybe glean/general. As a side note, depending on the implementation of the telemetry protocol, one might be able to simply spoof the IP.

3. Honestly it'd be really cool to see what percentage of users are in what countries. This doesn't really have much legitimate use but it'd be motivating to see it.

You could resolve the IP to a country locally, or resolve it on the server and just store the number of users that accessed from each country.
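The "store the IP only temporarily" idea for flood protection could be sketched roughly like this. Everything here (createFloodGuard, the window length, the per-window limit) is a hypothetical illustration, not something the telemetry server is known to do.

```javascript
// Hypothetical in-memory flood guard: IPs are kept only for one time window
// and are dropped by flush() once that window has passed.
function createFloodGuard(windowMs, maxReports) {
  const seen = new Map(); // ip -> { count, firstSeen }
  return {
    // Returns true if a report from this IP should be accepted.
    allow(ip, now = Date.now()) {
      const entry = seen.get(ip);
      if (!entry || now - entry.firstSeen >= windowMs) {
        seen.set(ip, { count: 1, firstSeen: now });
        return true;
      }
      entry.count += 1;
      return entry.count <= maxReports;
    },
    // Forget every IP whose window has expired.
    flush(now = Date.now()) {
      for (const [ip, entry] of seen) {
        if (now - entry.firstSeen >= windowMs) seen.delete(ip);
      }
    },
    size() {
      return seen.size;
    },
  };
}

const guard = createFloodGuard(60 * 60 * 1000, 3); // assumed: 3 reports per hour
console.log(guard.allow("203.0.113.7")); // example address from a documentation range
```

Because nothing is written to disk and flush() removes expired entries, the IP never outlives the window it was needed for.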

In general, I'd suggest storing datapoints separately from each other and without timestamps, e.g. only how many people had a given ENV set, increasing the count every time a new dataset comes in. Maybe let the clients send data at a set interval, or all at roughly the same time of day.
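A small sketch of what "separate datapoints without timestamps" might mean on the receiving side: each incoming dataset only bumps independent counters and is then discarded. The function names and the TABLE option are made up for the example; THEME is the only option named in this thread.

```javascript
// Hypothetical aggregator: one counter per "option=value" pair, with no
// timestamp, no install ID, and no link between options of the same report.
function createAggregator() {
  const counters = new Map();
  return {
    record(report) {
      for (const [option, value] of Object.entries(report)) {
        const key = `${option}=${value}`;
        counters.set(key, (counters.get(key) || 0) + 1);
      }
      // The report object itself is not kept anywhere.
    },
    count(option, value) {
      return counters.get(`${option}=${value}`) || 0;
    },
  };
}

const stats = createAggregator();
stats.record({ THEME: "darkly", TABLE: "true" }); // TABLE is an invented option
stats.record({ THEME: "darkly" });
console.log(stats.count("THEME", "darkly"));
```

Since the counters are independent, it is no longer possible to tell which instance had which combination of settings, which is the privacy property being suggested.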

Wingysam commented 9 months ago

So I could avoid the problem of needing an individual trackable ID per install by picking a period of the day to send the stats during, then count all of the requests during that period? So I could use for example 3-4am UTC, then combine all of the stats as they come in?
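To make the fixed-window idea concrete, here is a sketch of how an instance might pick a send time inside a 03:00-04:00 UTC window and how the server might check it. The helper names and the random spread within the hour are assumptions, not part of the proposal.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Server side: true when the request arrives during the one-hour window.
function isWithinWindow(date, startHourUtc = 3) {
  return date.getUTCHours() === startHourUtc;
}

// Client side: milliseconds to wait until a random moment in the next window.
// Spreading sends across the hour is an assumption to avoid a request spike.
function msUntilNextSend(now, startHourUtc = 3, random = Math.random) {
  const windowStart = new Date(now.getTime());
  windowStart.setUTCHours(startHourUtc, 0, 0, 0);
  // If today's window has already started, aim for tomorrow's.
  if (windowStart.getTime() <= now.getTime()) {
    windowStart.setUTCDate(windowStart.getUTCDate() + 1);
  }
  const offset = Math.floor(random() * HOUR_MS);
  return windowStart.getTime() + offset - now.getTime();
}

console.log(msUntilNextSend(new Date()), "ms until the next report");
```

With one report per instance per window, the number of requests received in the window approximates the number of active installs without any per-install ID.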

My concern with resolving the country locally on the instance is that it'd have to make a request to something to get its IP, at which point I might as well do the country resolution there. I can however calculate the country when the request comes in then throw away the IP, although everyone will have to trust me that I'm doing as I say. I can at the very least open-source the telemetry server even if I can't make my infrastructure publicly auditable.
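The "calculate the country, then throw away the IP" step could look something like the sketch below. The lookupCountry function is injected and purely hypothetical; no particular GeoIP library or database is implied.

```javascript
// Hypothetical handler: resolve the country, bump its counter, and let the
// IP go out of scope without storing or logging it.
function countReportByCountry(ip, lookupCountry, countryCounts) {
  const country = lookupCountry(ip) || "unknown";
  countryCounts.set(country, (countryCounts.get(country) || 0) + 1);
  return country; // only the country code leaves this function
}

// Stand-in lookup for demonstration; a real server would query a GeoIP source.
const fakeLookup = (ip) => (ip.startsWith("203.0.113.") ? "DE" : null);

const countryCounts = new Map();
countReportByCountry("203.0.113.7", fakeLookup, countryCounts);
countReportByCountry("198.51.100.9", fakeLookup, countryCounts);
console.log([...countryCounts]);
```

Keeping the lookup and the counter update in one small function would also make the open-sourced telemetry server easier to audit for exactly this property.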

cj13579 commented 9 months ago

I don't mind this so long as I have the option to opt out. As @jskiddie has suggested, it makes sense for this to be configured via an environment variable, since that's easy to document (you've got precedent for basically all other options in the README) and would work with both bare-metal and container installations.