Homebrew / brew

🍺 The missing package manager for macOS (or Linux)
https://brew.sh
BSD 2-Clause "Simplified" License

Require analytics opt-in rather than make it a blind default #142

Closed bcardarella closed 8 years ago

bcardarella commented 8 years ago

I'm not sure if this was discussed but there are going to be companies and government agencies that have a problem with data being sent out without explicit permission. I understand and appreciate the desire to collect the information but this introduces a problem for some people. Ideally this would have been an opt-in on upgrade and an opt-in on install.

dunn commented 8 years ago

It was discussed, we made it opt-in for about a month during testing then sent out notifications to the mailing list and twitter a week before switching to opt-out.

albus522 commented 8 years ago

I would say the vast majority of your users are neither on your mailing list nor pay any attention to your Twitter account. So you have notified a very small percentage of your users that you will be collecting their info. If someone else hadn't pointed out this change, I would never have noticed, and I would wager that is the majority case. Add my vote for reverting this change.

kroofy commented 8 years ago

Regardless of how many users have been made aware of this, the user should still have the option to actively make a decision during an install/update of brew.

This is too sneaky and opaque for the user.

davesque commented 8 years ago

It kinda blows my mind that this isn't opt-in.

erikj commented 8 years ago

Seems like it would be best if the interface queried the user and remembered their preference.

DavidCWGA commented 8 years ago

At the very least, prompt the user before first sending data to Google.

bfontaine commented 8 years ago

@davesque The problem with opt-in is that you don’t get representative data.

DavidCWGA commented 8 years ago

The advantage of opt-in is that you respect your users' choices and privacy.

Ianleeclark commented 8 years ago

@kroofy Incredibly opaque, how dare they link to the two files which directly implement sending off analytics. And sneaky! They've actively managed to keep this information to only two of the largest technical communities: reddit and hackernews--plus their thorough documentation of exactly how it works is not yet thorough enough.

Truly, what could we do to fix these roguish individuals and their wanton disregard for us: the users of a free product who have never contributed to the Homebrew project (with the exception of dunn and bfontaine, every user above me has never made a contribution, including me). Y'all are ridiculous; calm down and just set the envvar.

jeroenh commented 8 years ago

@bfontaine the problem with distributing information is that you can never take it back.

jasonroelofs commented 8 years ago

@GrappigPanda Insulting people for being concerned doesn't help anyone. Very few users of Homebrew are even going to see this, much less know what's happening or that they are now sending statistics to some random Google Analytics account about what they do on their machine.

There are numerous companies and environments employing tens to hundreds of thousands of people in which security and secrecy is utmost such that any service that sends statistics outside of the walls of the company or group is strictly forbidden. With this kind of change, many people will suddenly start breaking these restrictions with no knowledge of what or why or when.

There's nothing wrong with opt-out, but this should be a message that shows up on brew update that's very explicit, very clear, and with instructions on what Homebrew now wants to do and how to opt out. As it stands right now, this change doesn't respect the privacy and needs of users of this tool.

codingcampbell commented 8 years ago

When things like these come up, I wish projects would stop defaulting to Google Analytics in particular. If you need some anonymous usage info, fine, but does Google really need another vector into my life?

bfontaine commented 8 years ago

@codingcampbell If you have a good solution that doesn’t need more work on our side (i.e. we don’t want to manage another server), we’re all ears.

Ianleeclark commented 8 years ago

@jasonroelofs I'm having a genuinely difficult time trying to digest what you've written.

If someone is breaching restrictions on sharing information by sending in an anonymous UUID with no other distinct markings to the data besides an OS version, then going to google.com is genuinely perilous. Your average blog is tracking much more information than homebrew is going to be, so heaven forbid if you ever need to consult the internet to figure out why a MySQL migration tool isn't working, or if you need to figure out how to install bcrypt on OS X, or any other reasonable internet search.

It's such a disingenuous thing to say.

albus522 commented 8 years ago

@GrappigPanda Your package manager is not your web browser. Also you can set your web browser to hide a lot of information.

lacombar commented 8 years ago

At worst, you should ASK during installation whether or not the user wants analytics collected, with the default set to "NO". Then I would consider enabling it. No matter how transparent you are about this default opt-in, this is unacceptable.

eknkc commented 8 years ago

While I think it would be better to have a prompt, defaulting to "yes" (because no one would opt in otherwise) to enable this, why is this a huge deal?

I mean, if you have strict policy about this kind of tracking, you should just firewall google analytics (heck, a hosts line would do it) globally. Other software that you've been using might be tracking you just as well, without announcing / documenting anything. At least we know what homebrew does.

It confuses me because if this is unacceptable, it means you trust every single binary you've been running. Teach me how, please.

lacombar commented 8 years ago

@eknkc because when I set up a new machine, I don't want to have yet another environment variable to set up. We all think about it today, but in a year, I'm gonna have forgotten all about this, and analytics are gonna get enabled because of the default opt-in... behind my back.

This is just a really pervasive and sneaky way to behave.

thecosas commented 8 years ago

@eknkc Prompt on install/upgrade with a default to yes and instructions on how to opt out would probably be the best solution for all parties. Transparency, choice, and useful data for the volunteer developers which is representative.

albus522 commented 8 years ago

@eknkc I don't go around installing software I don't trust, especially software whose sole responsibility is installing other software. I also have software that has asked me for usage info and my answer is always the same. I trusted homebrew when I installed it. This action is a definite breach in trust.

If homebrew wanted every user to know what was changing it would have been part of the update and/or install process. That would have been the trustworthy way, and no one would be here having this conversation.

NickCraver commented 8 years ago

This is how you turn a trusted project into an untrusted project. The fact that Homebrew is being open about implementation is great, but it's not being that open. When I updated earlier today (for dotnet CLI) I had no idea this was enabled until some outbound firewall logs alerted me. Then I found the Hacker News thread.

If we're saying Homebrew's being honest about it then let's actually be honest: you have very likely informed well under 1% of your userbase of this change. Users installing fresh are not being informed or afforded the chance to opt-out during install. Many users, even those comfortable running brew commands they find on websites, don't know how to opt out; they don't know how to set an environmental variable. Users upgrading are experiencing broken trust. Something they previously reviewed (as in my case) has, without so much as a console message, started exporting their information.

I understand your need for analytics, but opting out should be trivial and the collection itself should be advertised to the user so they can do so.

MikeMcQuaid commented 8 years ago

Hi all I'm the maintainer who added this. I would have responded sooner but I've been on a plane for the last 10 hours. I'll post more ASAP when I get my laptop onto some wifi; on my phone now.

DomT4 commented 8 years ago

Can we please chill on the overzealous emoji use on every single comment here. We appreciate you have strong opinions and we're happy to discuss this further, but adding thumbs down emojis to people like Mike, whose comment says nothing more than "I'll post later when I'm online, please bear with me", is a bit counterproductive.

Let's keep things civil and as calm as possible. Please bear in mind that Homebrew has a Code of Conduct that applies to how everyone talks to everyone else, whether that's us to you, you to us or you to each other.

If you'll give me two minutes before thumbing this post down to death like I've wandered onto Reddit by accident I'll leave a more personal opinion. Just want to try and keep the tone cool here so it can be discussed. Thanks.

damieng commented 8 years ago

I don't speak for the brew team but metrics/telemetry are essential for making informed decisions about where to take your tools or product. Being off by default is useless - hardly anyone changes the default.

Something like an occasional (and at first use/upgrade) "brew sends anonymous metrics to help improve this software. If you wish to opt-out type brew metrics off" at the end of a brew console operation would seem to be a fair trade?

Not everyone can give back code/docs to a project but refusing to even allow telemetry be on by default seems like a very one-sided deal.

NickCraver commented 8 years ago

@damieng I honestly don't think the metrics themselves are a point of contention. I don't think I've seen anyone object to collecting metrics. The objections center around collecting metrics without consent. brew metrics off would be a great addition - that's much saner than expecting all users to know how to set an environmental variable.

I hope everyone is onboard with showing notices (both in the installer and when upgrading). If someone objects to notifying the user this is happening at all, then we should have a very different discussion.

gmcmillan commented 8 years ago

@DomT4 In what way is there overzealous emoji usage? People are using those emojis as a form of communication because it is an established feature in github and it is a quick way to show you are for or against a certain idea. It also prevents clutter with dozens/hundreds of people saying the exact same thing in written text. So I don't understand the hostility to the emojis in this case -- people are clearly communicating they don't agree with how this was implemented.

I have no problem with this feature as long as it's communicated to the user correctly and I don't believe it has been in this case, which is why you guys are getting so much flak. Most people don't seem to have an issue with the collection, but with how it's communicated and enabled for users. I use homebrew a lot, but I never check homebrew twitter or the mailing list (and I highly doubt most users do) so I would have never known I was opted in to this.

Why wasn't this presented as an option at the command line for when the user next uses homebrew? That is the most obvious place I see where something like this should be presented to the user.

Homebrew collects anonymized statistics about your usage to better prioritize features and bugfixes. Staying opted-in would help us a ton in understanding your usage so we can make a better product for you, but we can understand if you want to opt-out.

Would you like to opt-in to the Homebrew analytics platform (powered by Google)? [yes/no]

NickCraver commented 8 years ago

Adding links to the original issue and PR as references/background: Issue: https://github.com/Homebrew/legacy-homebrew/issues/34101 PR: https://github.com/Homebrew/legacy-homebrew/pull/50462

MikeMcQuaid commented 8 years ago

Hi all, I've managed to get to some wifi and power so I can comment here, thanks for your patience; I would normally have responded to this thread ASAP but I was literally on an ✈️ with no internet connectivity available.

Firstly, I figured actions would speak louder than words so I've opened and merged https://github.com/Homebrew/install/issues/42 and https://github.com/Homebrew/brew/issues/143. Every Homebrew user will now be told on first install or first brew update that we have enabled analytics and be pointed to the documentation that explains why and how to opt out.

Secondly, I apologise for the way that this has been communicated poorly. At every stage of the process I've tried to ensure that anonymity can be entirely preserved while using analytics and been careful in the data we gather so that it's not identifiable or private. If you haven't already I strongly encourage you to read through https://github.com/Homebrew/brew/blob/master/share/doc/homebrew/Analytics.md and raise any specific concerns you have with the implementation so that we can close any unintentional anonymity gaps.

Thirdly, the reason I made this opt-out rather than opt-in is so we can gather a representative understanding of the way people use Homebrew. If this was opt-in we'd gather only a sampling of the type of people who opt-in.

Finally, I apologise again. I hear the discomfort here and I'm sorry that I was not able to move more rapidly on your concerns. Thanks for being part of the Homebrew community.

NickCraver commented 8 years ago

@mikemcquaid The message is appreciated, but that shouldn't be "the fix". Opting out shouldn't require all of the effort still involved here. For most users, the steps will be:

  1. Opening a browser
  2. Go to that link
  3. Read a 5KB file (to the very bottom)
  4. (likely) Learn how to set an environmental variable
  5. Create a .bash_profile
  6. Set the variable

...and hope the process worked (there's no confirmation).

The bar is still far, far too high. There should be something akin to brew metrics off as pitched above. Only adding a URL is the equivalent of saying "go figure it out" to most users. And let's be honest, most will give up before getting through those steps.

bcardarella commented 8 years ago

@mikemcquaid I appreciate the response but I don't think this solves the problem. My original concern is that software that automatically opts you into analytics can prevent companies and government agencies from allowing the use of homebrew.

MikeMcQuaid commented 8 years ago

@bcardarella I appreciate that but it's worth mentioning that the majority of software products now use analytics in some form. No analytics are now sent before the messages are communicated and you can opt-out before that. It would be very simple for those organisations to write an install script for Homebrew that ensured that analytics are disabled before installation takes place.
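
A minimal sketch of such a wrapper, in Python. It assumes the opt-out variable is HOMEBREW_NO_ANALYTICS (the name used in Homebrew's analytics documentation; this thread does not spell it out). The helper names are hypothetical, and the command to run is left to the caller rather than guessed here.

```python
import os
import subprocess


def env_with_analytics_disabled(base_env=None):
    """Return a copy of the environment with the analytics opt-out set.

    HOMEBREW_NO_ANALYTICS is assumed to be the opt-out variable; the
    original mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HOMEBREW_NO_ANALYTICS"] = "1"
    return env


def run_with_analytics_disabled(command):
    """Run `command` (a list of arguments) with the opt-out already in place.

    Returns the command's exit status so a provisioning script can check it.
    """
    result = subprocess.run(command, env=env_with_analytics_disabled(), check=False)
    return result.returncode
```

An organisation would pass its own install or `brew` invocation as `command`, so the variable is set before anything from Homebrew runs.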

bcardarella commented 8 years ago

@mikemcquaid you're missing the point: certain organizations prevent the usage of any software that opts-in to analytics, even if there is a workaround to opt-out.

DavidCWGA commented 8 years ago

The fact that "everyone is doing it" shouldn't make it OK. Homebrew targets developers who should know better. The sheer amount of noise occurring here should tell you that users don't want this.

NickCraver commented 8 years ago

> I appreciate that but it's worth mentioning that the majority of software products now use analytics in some form.

I'm going to have to ask for a citation here, because I don't believe this to be accurate. Especially of projects on GitHub, how many are reporting analytics? Analytics are still fairly rare for most open source projects. In my experience, they're still very rare (as a percentage) for paid products as well.

davebarkerxyz commented 8 years ago

@bcardarella It's not even just certain organisations. I choose to use a tracker blocking extension in my browser because I know that environment is hostile towards user privacy. I don't expect to have to do this with mainstream terminal-based software like Homebrew.

@mikemcquaid I carefully vet software before installing on my machines (both personal and business). I do my research.

To introduce analytics with very little communication and with no affirmative agreement from end users is incredibly hostile.

It's very telling that you feel that opt-in wouldn't be as useful because many users wouldn't opt-in or agree for usage to be tracked. This is a sign that analytics isn't something that many end users feel comfortable with.

MikeMcQuaid commented 8 years ago

> To introduce analytics with very little communication and with no affirmative agreement from end users is incredibly hostile.

I've admitted my mistake and merged a PR that will communicate this to all users (https://github.com/Homebrew/install/pull/42).

MikeMcQuaid commented 8 years ago

> The bar is still far, far too high. There should be something akin to brew metrics off as pitched above. Only adding a URL is the equivalent of saying "go figure it out" to most users. And let's be honest, most will give up before getting through those steps.

@NickCraver I've added a single command that you can run to the documentation in https://github.com/Homebrew/brew/pull/146 and made it so if you set the opt-out variable once then it will set the git config such that analytics will remain disabled. I'm not going to be able to do much more than this tonight as I've been up for >18 hours after taking a transcontinental flight.

davebarkerxyz commented 8 years ago

> I've admitted my mistake and merged a PR that will communicate this to all users (Homebrew/install#42).

I see that, but my point (and I believe that of a number of others here) is that you're still not requiring affirmative agreement. It's still an opt-out (enabled by default, requiring the user to take action to disable rather than agree to enable it).

I think that many here, myself included, would rather see it turned off by default.

Debian seem to have got this right with their popcon prompt.

MikeMcQuaid commented 8 years ago

I've merged https://github.com/Homebrew/install/pull/42, https://github.com/Homebrew/brew/pull/143 and https://github.com/Homebrew/brew/pull/146, which is all I'm going to be able to do tonight. I really need to go get something to eat and then 💤 (and 😭) but will check this thread again tomorrow. Thanks to most people in this thread for keeping it civil and thanks for using Homebrew.

davebarkerxyz commented 8 years ago

> I'm not going to be able to do much more than this tonight as I've been up for >18 hours after taking a transcontinental flight.

I appreciate that you're trying to triage this. Most people are just upset that it came out of nowhere (for those of us who don't follow the repo or Twitter feeds closely, which most of us have agreed wasn't a reasonable communications channel for this).

Something as integral as a package manager can't, by its very nature, be easily sandboxed. And to become (and remain) the dominant package manager on a platform, Homebrew needs to have the complete trust of its users.

With the current climate of privacy-hostile applications, platforms, operating systems and governments, default-on telemetry gathering of any sort isn't really acceptable.

As a developer, I completely understand the drive to gather metrics. After all, they help us make our products better. They help us figure out more easily what users need. However I don't think it's unreasonable to be pushing a privacy-first ethos in our industry.

Metrics are valuable, but not at the expense of user trust. Debian Popcon seems to strike a good balance (default highlighted option on the install prompt is "off", IIRC).

achikin commented 8 years ago

@GrappigPanda hi there! I have contributed, I don't read twitter, hackernews and reddit. And I have something I can't share legally.

zmwangx commented 8 years ago

I find it kind of funny that there are still people arguing about opt-in and whether analytics should be done. Do you realize that when you clone and/or fetch brew.git and/or homebrew-core.git, GitHub collects analytics? And when you download bottles, Bintray collects analytics? You can argue that GitHub is the first-party (I doubt it), but Bintray is definitely a third party. These are analytics that are already in place. If analytics turns you away, you should be gone by now.

To answer a few specific questions:

> Especially of projects on GitHub, how many are reporting analytics?

How many projects on GitHub require internet connectivity beyond initial install? And how do you avoid analytics when you are connecting to any kind of server?

> certain organizations prevent the usage of any software that opts-in to analytics. Even if there is a work around to opt-out.

Why would your organization's ridiculous policy prevent Homebrew from being improved?

Just my two cents.

NickCraver commented 8 years ago

> How many projects on GitHub require internet connectivity beyond initial install?

Most do not. Most have no reason to. Just go find any major project list and go down it. Here's the currently trending for example: https://github.com/trending

> Why would your organization's ridiculous policy prevent Homebrew from being improved?

This is an absolutely terrible way to have a conversation. It's not a ridiculous policy and many high-security environments (I've been involved in quite a few) have such a policy. Calling it "ridiculous" creates a needlessly bad and adversarial environment that helps no one.

davebarkerxyz commented 8 years ago

> Do you realize that when you clone and/or fetch brew.git and homebrew-core.git, GitHub collects analytics?

There's a difference here. I understand how the Git protocol works. I know exactly what data it transmits. I looked into this before I started using it. My Git client wasn't updated to include transmission of extra metrics without my knowledge. I had the information required to make an informed choice about using Git.

Homebrew, however, introduced metrics collection and transmission without affirmative consent. When I first started using it, this wasn't the case. I did my research and cleared it for use on my personal and work machines. This changed, and extra data was transmitted in a default-on manner, with no warning.

Essentially, Homebrew began behaving contrary to established conventions.

I'm very privacy conscious. But I do understand that many people don't care one way or another about privacy, analytics or usage tracking. The issue here is that the software broke the trust I had with it and took that choice away from me (by not advertising the changed behaviour and making the opt-out obvious).

It all comes down to communication, user choice, and pro-privacy, conservative defaults.

freels commented 8 years ago

FWIW, Apple asks users to opt-in to the submission of anonymous usage data. If Apple can sacrifice potential inaccuracy in usage data in order to respect user privacy, I don't see why homebrew cannot as well.

zmwangx commented 8 years ago

@NickCraver

> > How many projects on GitHub require internet connectivity beyond initial install?

> Most do not. Most have no reason to.

Exactly. But Homebrew does. And with every project legitimately requiring internet connectivity, you can't avoid analytics. apt-get, pacman, PyPI, npm, etc. are all capable of collecting analytics info (I don't know if they use it or not) when you connect to their servers. (Not for all commands, I know, but granularity is not the point of debate here.) The reason Homebrew does this differently through Google Analytics is because it does not have its own central server for distribution.

@davb5

> I understand how the Git protocol works.

And you probably also understand how HTTP and curl work. You read analytics.sh and analytics.rb, you see what is being sent by curl and what is not, done.

> Homebrew, however, introduced metrics collection and transmission without affirmative consent.

If you read my comment carefully you'll realize that I didn't say consent is not needed. I'm all for the idea of a yes/no prompt upon initial install or update after shipping the analytics feature. What I found ridiculous is that there are people who want to overturn the decision of collecting analytics, or want to make it opt-in (which will render it completely useless, because most people wouldn't realize there's such an option). Read my first sentence again:

> I find it kind of funny that there are still people arguing about opt-in and whether analytics should be done.

ntpd commented 8 years ago

> What I found ridiculous is that there are people who want to overturn the decision of collecting analytics, or want to make it opt-in (which will render it completely useless, because most people wouldn't realize there's such an option).

debian

Debian's popularity-contest is opt-in (defaulting to "no"), yet still collects a large amount of useful data.

zmwangx commented 8 years ago

@ntpd Homebrew installation process is often (if not mostly) automated. If you make it opt-in, then installing Homebrew non-interactively — either through the install script without connecting stdin to a tty or through git clone — will totally skip the opt-in prompt.
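
A rough sketch of what that looks like in code, in Python for illustration only (Homebrew's installer does not work this way, and the function name and prompt wording are hypothetical). With no tty attached there is nobody to ask, so an opt-in prompt silently resolves to "no":

```python
import sys


def ask_analytics_consent(stream_in=sys.stdin, stream_out=sys.stdout):
    """Return True only if an interactive user explicitly answers yes."""
    # Piped install scripts, CI jobs and plain git clones have no tty on
    # stdin, so the prompt is skipped and the answer defaults to "no".
    if not stream_in.isatty():
        return False
    stream_out.write("Send anonymous usage analytics? [y/N] ")
    stream_out.flush()
    answer = stream_in.readline().strip().lower()
    # Anything other than an explicit yes, including just Enter, means no.
    return answer in ("y", "yes")
```

Every automated install would take the first branch, which is why a prompt-based opt-in would record nothing for those machines.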

davebarkerxyz commented 8 years ago

@NickCraver Let's keep it civil. I've read your comments thoroughly. I think we're arguing semantics here. A yes/no prompt on installation is opt-in. If you say that you're in favour of that, then you're in favour of opt-in. Opt-in doesn't mean a silent default-off and an expectation that the user will dig around to find out how to enable analytics. Opt-in could mean a prompt on install (or first interactive run for existing users), with "no" selected by default, as per Debian's Popcon.

I do agree with the reservations people have with Google Analytics. As valuable as it is, Google do have a unique position where they could very easily correlate analytics events across app/device boundaries (using IP addresses, for instance). This means users have to put a lot of trust in Google that they'll obey their own privacy settings.

@ntpd has it right. Popcon does the analytics thing in the most respectful (default-off) and informative way. For the most part, I don't object to popcon on personal machines (restrictions in various enterprise environments are different). Popcon's hosted by Debian themselves, and the data is public. I get as much from Popcon as I put in, and it respects my decision to leave it off (by default).

zmwangx commented 8 years ago

@ntpd Also, your opt-in data is definitely skewed. Those just represent package popularity among popularity-contest users. "Large amount" doesn't imply accuracy.
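
To make the skew concrete, here is a small worked example in Python. Every figure in it is made up purely for illustration (two hypothetical user groups with different opt-in rates); none of it comes from popularity-contest or Homebrew data.

```python
# All figures below are hypothetical, chosen only to illustrate the argument.
GROUPS = {
    # name: (users, opt-in rate, share of the group that installs package X)
    "enthusiasts": (200, 0.50, 0.80),
    "casual": (800, 0.05, 0.10),
}


def true_share(groups):
    """Share of all users who install package X."""
    total = sum(users for users, _, _ in groups.values())
    installs = sum(users * uses_x for users, _, uses_x in groups.values())
    return installs / total


def opt_in_share(groups):
    """Share of package X installs among the users who opted in."""
    reporting = sum(users * rate for users, rate, _ in groups.values())
    installs = sum(users * rate * uses_x for users, rate, uses_x in groups.values())
    return installs / reporting


print(f"true share:   {true_share(GROUPS):.2f}")
print(f"opt-in share: {opt_in_share(GROUPS):.2f}")
```

With these invented numbers, 24% of all users install package X, but among the 140 users who opted in the figure is 60%, because the group more likely to opt in is also the group more likely to use it.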