dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License

.NET core should not SPY on users by default #6145

Closed ghost closed 4 years ago

ghost commented 8 years ago

@blackdwarf @piotrMSFT I am very disappointed to discover that .NET core comes with a hidden and enabled spy utility that reports on its users. (Lakshanf/issue2066/telemetry dotnet/cli#2145). Apparently, MS has learned nothing from the backlash against Windows 10 spying on users. I suspect many will not want to install .NET core for this reason, which is a shame because .NET core is otherwise cool.

richlander commented 8 years ago

Our recent blog post discusses the addition of telemetry to the .NET Core tools. See: https://blogs.msdn.microsoft.com/dotnet/2016/05/16/announcing-net-core-rc2/#telemetry

The folks on my team and I are motivated to provide a great product. As you can also see from the blog post, we've made some pretty dramatic changes in RC2. We believe that they are the right ones, but we need both feedback and usage data in order to help us find all of the rough edges. Usage data tends to be more objective in the aggregate and user feedback more insightful, so we do a better job when we have both available.

The data we collect does not identify individual users. We're only interested in aggregate data that we can use to identify trends. The telemetry feature is configurable, so you can turn it on/off at any time. It is also scoped, only applying to tools usage, not the rest of the product. We think that this is a good trade-off and recognize that not everyone will like it. We do know, however, that many people will like the product improvements that will come from this insight.

We intend to share the data. The presence of it will do a lot to define the scope of data. It will also give the community access to the same insight we have. We very much feel that improving .NET Core is a shared need and task. As an example, we would welcome a PR from the community that added another telemetry data point given a strong improvement reason and no loss in anonymity.

We are separately considering opt-in runtime telemetry to learn more about crashes, GC pauses and startup time. There is no way we can get enough insight about the product without that kind of information. We are very focussed on constant improvement and will transparently do what it takes to ensure the product is compelling and competitive.

As an aside, it's been a busy week with shipping RC2 and answering questions. I haven't actually looked at this data yet and I'm one of the primary consumers. I'll be doing that today or tomorrow. I'm looking forward to sharing my insights.

guardrex commented 8 years ago

@richlander

we need both feedback and usage data

Does the telemetry still include arguments provided to the dotnet command? In server hosting scenarios, some may have sensitive arguments passed to the command (for portable apps) that they wouldn't want leaked to MS.

and will transparently do what it takes

The program is not "transparent" IMO.

terrajobst commented 8 years ago

Does the telemetry still include arguments provided to the dotnet command?

My understanding is that this was recently discussed with our privacy team and we concluded that collecting the arguments themselves (hashed or not) is not acceptable per our privacy policies. Not sure whether the code already reflects that, but it's being worked on.

terrajobst commented 8 years ago

The program is not "transparent" IMO.

What would you accept as sufficiently transparent? Not trying to say that we already are sufficiently transparent; I'm trying to understand your concern and what we could do to make it better. The product is worked on by various teams who all contribute to the same open source code base on GitHub. Clearly you think that's not sufficient, so I'd like to understand what process would address that.

guardrex commented 8 years ago

@terrajobst Ah! Thanks. I'm glad the arguments are safe on the server.

WRT transparency: There is no indication at the time of install that the dotnet cli is automatically opted-into data sharing. There's no checkbox that will set the opt-out env var. There's no note or link to the GH issue or a Docs page that describes the program and how to opt-out. The privacy policy merely links to the generic MS privacy policy, where there is no mention of the program.

You really have to have heard about this through the GH issues or via chat at JabbR or Slack ... or Wireshark your server I guess. In my mind, that hardly constitutes "transparency."

IMO there is a great risk here for negative PR if the mainstream media gets hold of this issue. It's only a matter of time before some enterprising journalist looking for a scoop picks up on this. The headlines here are not good: "Microsoft caught with sneaky program to spy on companies" ... I know ... I know ... barely accurate given what the data is, how it's shared, and its use by the teams. You know that doesn't matter one bit when you're trying to sell a newspaper. I was a college newspaper editor. Trust me ... it will not be good if the current disclosures about this program hold to RTM.

richlander commented 8 years ago

@GuardRex This is good feedback. We do have a bit more to do to make sure that everything to do with telemetry is obvious. We'll make sure that gets into the next release.

ghost commented 8 years ago

GuardRex is exactly right about the lack of transparency and danger you are in for a shitstorm, so it is a good idea to include a checkbox in the installer to make it visible!

Also, you should keep in mind the problem is both privacy AND security. As for security, I think that MS forgets that a power user/developer may have hundreds of pieces of software installed. If all these pieces of software (in a stealthy way) report usage back to various servers on the internet, then the security attack surface becomes so large that it is impossible to secure the computer. Hence, many companies will ban your software (especially on Linux servers).

This particular "feature" may undo all the good things that MS is doing with .NET core. Even if I might personally be persuaded to risk my computer and privacy, some of my customers won't. Hence, I will be reluctant to base my development on .NET core because of the customer reaction to spying.

vcsjones commented 8 years ago

I would probably agree that people will be a little miffed by this. Homebrew for OS X recently went through this even though they were well intentioned, did it anonymously, and provided a way to opt out.

I think simply asking people on first use if they'd like to submit telemetry is a good start.

Consider what Yeoman does on first use:

[Screenshot: Yeoman's first-use prompt, 2016-05-19]

I think people are generally happy to give feedback when asked.

mmc41 commented 8 years ago

@blackdwarf @piotrpMSFT @richlander In related news, VS2015 just got into big trouble because spy code was discovered: https://www.reddit.com/r/cpp/comments/4ibauu/visual_studio_adding_telemetry_function_calls_to/d30dmvu

You should consider learning from such mistakes!

vcsjones commented 8 years ago

Looks like issue dotnet/cli#3404 is tracking implementing notification of telemetry.

h3smith commented 8 years ago

@richlander - as someone looking to deploy projects built with this in healthcare and classified environments, this creates significant challenges. An environment variable is a decent starting point, but build time and local options should also be given to ensure that this data is not collected. I appreciate the desire of you guys, but it introduces security concerns.

akoeplinger commented 8 years ago

@GuardRex https://blogs.msdn.microsoft.com/dotnet/2016/06/27/announcing-net-core-1-0/#user-content-net-core-tools-telemetry shows the data points that are collected and the following statement which should make it pretty clear that telemetry only applies to the tools/CLI (i.e. dotnet):

The feature will not collect any personal data, such as usernames or emails. It will not scan your code and not extract any project-level data that can be considered sensitive, such as name, repo or author (if you set those in your project.json). We want to know how the tools are used, not what you are using the tools to build. If you find sensitive data being collected, that’s a bug. Please file an issue and it will be fixed.

guardrex commented 8 years ago

only applies to the tools/CLI (i.e. dotnet)

If you mean it only applies to executing a portable app using dotnet (dotnet .\myapp.dll) and not a self-contained app using corehost (myapp.exe) ... I don't think the language states that clearly. One has to know that you don't consider corehost to be a "tool," and that's not an assumption that I would make.

There is an on-going problem in assuming too much prior knowledge in communication with people (outside of the ASP.NET docs, where a major effort has been made to address this problem). I think writing docs with greater attention to explicit and comprehensive explanations, as annoying and time-consuming as that may be, clears up a great deal of confusion.

Setting this minor confusion aside, I greatly appreciate the effort that has been made to inform everyone about the telemetry program. I still wish that production servers weren't automatically opted-into the program, mostly because (just like @blackdwarf commented in a recent video interview I saw) I hate having to set and maintain env vars on servers ... a total PITA IMO.

dlebedynskyi commented 8 years ago

Guys, this issue is really important. A lot of projects ask for telemetry and it is OK. In fact, for a bunch of those, like yo, bower and so on, devs like me willingly opt in.
But not asking the user whether he even wants it, and referring to some EULA that really no one will read, smells. It is horrible negative PR.
Make it an option the user opts in to. Explain in detail what you are going to collect and what not. Do not do it by default.
Otherwise we really will have to block the feature or not use dotnet entirely. I really don't think that a paranoid security team will even allow devs to deploy this now.

kspeakman commented 8 years ago

Discovering this telemetry has put the plans I had in using .NET Core back on the drawing board. You are essentially refusing to accept an arms-length relationship by including telemetry. Data leakage is a risk even if it isn't user specific. It also creates attack opportunities since attackers now have this plentiful and predictable avenue of communication to go after. Not to mention that once marketing gets wind (if they didn't help drive it in the first place), the data collection will be expanded. Save yourself work by creating more admin/security work for your users (to opt out or block telemetry). Just because it's an industry trend doesn't mean it's a good thing to do. </3

guardrex commented 8 years ago

@kspeakman On the bright side, it is well controlled by the env var ...

https://github.com/dotnet/cli/blob/rel/1.0.0/src/dotnet/Telemetry.cs#L39-L44

... so at least if you add that via web.config, PowerShell, manually, or whatever ... it disables telemetry effectively. However, if you were more generally concerned about Microsoft.ApplicationInsights being on the server at all, then they have said that corehost doesn't have telemetry built-in, so you could go the self-contained app direction (no shared framework on the server) and avoid this entire issue. The only catch is that you need to pull the ASP.NET Core Module out and install that manually ... they don't have a standalone installer for the module yet (AFAIK), nor has it been spun off into OSS yet (but they are planning to do that).
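As a rough illustration of the env var approach described above, here is a minimal Python sketch that sets the opt-out variable only for a child dotnet process instead of machine-wide. The helper names `with_telemetry_opt_out` and `run_dotnet` are hypothetical, and the sketch assumes that a value of `1` is accepted as an opt-out value.

```python
import os
import subprocess


def with_telemetry_opt_out(base_env=None):
    """Return a copy of the environment with the CLI telemetry opt-out set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: "1" is accepted as an opt-out value by the dotnet CLI.
    env["DOTNET_CLI_TELEMETRY_OPTOUT"] = "1"
    return env


def run_dotnet(args):
    """Run `dotnet <args>` with telemetry opted out for this child process only."""
    return subprocess.run(["dotnet", *args], env=with_telemetry_opt_out(), check=False)
```

Calling `run_dotnet(["--version"])` would launch the CLI with the variable set, without touching the environment of the calling process or of other users on the machine.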

RomanShumikhin commented 8 years ago

Is there any chance that this telemetry "feature" will be removed from the next version of the tools? If not, I totally agree with the original poster, this should be opt-in, not opt-out.

mschlechter commented 8 years ago

At the very least, the dotnet program should ask on first run whether the user wants this or not.

First Windows 10 and now this.

I don't want telemetry. At all. It's fine when people are beta testing a product in a special testing environment, but not in production.

linkdata commented 7 years ago

Making this opt-out instead of opt-in seems like really poor judgement. I understand and respect the need for you to collect some usage to help guide the .NET Core platform, but printing a few lines of text once before starting to send unspecified data over the 'net to some server is just disrespectful.

Please make it opt-in or remove it entirely.

ghost commented 7 years ago

I vote to remove it fully from .Net Core source code. It must be an external option, user should have ability to download some package to start statistics collection.

ghost commented 7 years ago

Since there hasn't been any post on this topic in a couple of months, I will share some insights having just come across this as a fresh (potential) adopter of Core CLR.

Just downloaded latest build of .NET core and just by luck noticed the unremarkable disclaimer after running one of the dotnet shell commands.

This is altogether ridiculous and comes on the heels of already-ridiculous telemetry collection in their other products. I feel like Microsoft is saying publicly that they're not tone deaf to the community, then they keep doing things like this.

I was starting to get excited about the implications of Core CLR and what that could mean for the expansion of C# (the language itself is really fantastic).

This automatic telemetry nonsense is a big reason why I shy away from the Windows platform entirely. It's not even about my own personal feelings or beliefs on privacy concerns and whatnot. It's about selling this platform to my company and my contracts. In an enterprise environment, getting people to trust Microsoft is already an uphill battle with many of my fellow developers and higher-ups. Making the case for using C# + Core CLR on Linux is MUCH easier than making the case for switching entirely to Windows.

However, this telemetry nonsense is simply a nonstarter. Imagine trying to sell this to someone already averse to monoliths and vendor lock-in (synonymous in our field with MS, for better or worse) then immediately having to defend telemetry collections (and the disablement thereof). We run production workloads in production data centers with enough infosec headaches already. Things like this are simply nonstarters for many executives. Sure, we can add an environment flag, but when has someone EVER forgotten to do that?

Alas, I am beginning to feel that Core CLR will go the way of Windows 10: admittedly great technology crippled by corporate nonsense that makes many developers just go look for some alternative when choosing a tech stack that doesn't come laden with such nonsense.

TL;DR; turn this crap off. You're pissing off the people you claim to be building tools for.

CodesInChaos commented 7 years ago

How about checking HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\DataCollection\AllowTelemetry and disabling telemetry when it exists and is 0 in addition to your product specific opt-out? Users who don't want Windows telemetry almost certainly don't want .NET telemetry either.
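The suggestion above could be sketched roughly as follows in Python. This is only an illustration of the proposed logic, not how the dotnet CLI behaves: the function names are hypothetical, and the rule "a value that exists and is 0 disables telemetry" is taken directly from the comment.

```python
import sys


def read_windows_allow_telemetry():
    """Return the AllowTelemetry policy value, or None if absent or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    key_path = r"SOFTWARE\Policies\Microsoft\Windows\DataCollection"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "AllowTelemetry")
            return value
    except OSError:
        # Key or value is missing, or access was denied.
        return None


def telemetry_permitted(product_opted_out, windows_policy_value):
    """Combine the product-specific opt-out with the Windows policy value."""
    if product_opted_out:
        return False
    # Per the suggestion: a policy value that exists and is 0 disables telemetry.
    if windows_policy_value == 0:
        return False
    return True
```

On Linux and macOS `read_windows_allow_telemetry()` returns `None`, so only the product-specific opt-out applies there.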

ghost commented 7 years ago

@CodesInChaos, .Net Core can run on Linux... And I personally think some flag somewhere is a bad idea... as are any assumptions... The user should be able to control it all in an obvious way.

vcsjones commented 7 years ago

@hardhub

in addition to your product specific opt-out

CodesInChaos is suggesting that if the platform is Windows and they opted out of Windows telemetry, then just assume "no" telemetry. Otherwise, allow the user to make a selection.

ghost commented 7 years ago

@vcsjones

I see... And as I mentioned, what should Linux guys do? And no Windows Enterprise 10 users? I think some registry key somewhere is not a good idea for user privacy... not obvious enough. It should be very easy to find and disabled by default... I personally suggested enabling telemetry as a package. Google, for example, does not force us to use GA....

OpinionatedGeek commented 7 years ago

In the interests of transparency, please let us all know which hostnames/servers to which the data is sent.

This will also allow people to block this traffic once, at the network level, instead of having to update every RC file for every shell for every user for every machine.

vcsjones commented 7 years ago

@OpinionatedGeek

In the interests of transparency, please let us all know which hostnames/servers to which the data is sent.

Telemetry is collected using Application Insights, to my knowledge. The documentation for their endpoints and IPs is here: https://docs.microsoft.com/en-us/azure/application-insights/app-insights-ip-addresses

OpinionatedGeek commented 7 years ago

@vcsjones Many thanks for that link and those hostnames. It's a very handy reference!

Can anyone from Microsoft confirm or deny that this is the full, correct list? I note that blocking all the listed hostnames would mean blocking access to hosts like login.windows.net and packages.nuget.org - hosts Microsoft probably doesn't want blocked.

Many thanks.

vcsjones commented 7 years ago

The ones specifically for telemetry are dc.services.visualstudio.com and dc.applicationinsights.microsoft.com. The rest are for ApplicationInsights, but aren't categorized as Telemetry.

Keep in mind this would affect any application that uses Application Insights, not just the SDK.
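For anyone who wants to block at the host level, a small Python sketch that generates hosts-file entries for the two telemetry hostnames named above. The list comes from this comment and has not been confirmed as complete or official, and `0.0.0.0` as the sink address is an assumption; as noted, blocking these hosts would also affect any other application that uses Application Insights.

```python
# Hostnames as listed in the comment above; not an officially confirmed list.
TELEMETRY_HOSTS = (
    "dc.services.visualstudio.com",
    "dc.applicationinsights.microsoft.com",
)


def hosts_file_entries(hostnames=TELEMETRY_HOSTS, sink="0.0.0.0"):
    """Build hosts-file lines that point each hostname at a sink address."""
    return [f"{sink} {name}" for name in hostnames]


if __name__ == "__main__":
    # Print the lines so they can be reviewed before being added to a hosts file.
    print("\n".join(hosts_file_entries()))
```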

OpinionatedGeek commented 7 years ago

I'd still like to hear from Microsoft the official list of servers to which the telemetry is sent. Can someone from Microsoft please reply with these details?

The telemetry hostnames aren't in the dotnet CLI code base as far as I can tell, and I don't want to just assume that the hosts are the same as the 'Azure Application Insights' hosts, or that they remain the same across all OSs.

slawo commented 7 years ago

I also agree, telemetry should be an option activated only after the user's explicit authorisation or through: export DOTNET_CLI_TELEMETRY_OPTIN=1.
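A minimal Python sketch of the opt-in model proposed above, where telemetry stays off unless the user explicitly turns it on. `DOTNET_CLI_TELEMETRY_OPTIN` is the hypothetical variable from this proposal, not an existing CLI setting, and the function name is also made up for illustration.

```python
import os


def telemetry_enabled_opt_in(env=None):
    """Opt-in model: telemetry stays off unless the user explicitly enables it."""
    env = os.environ if env is None else env
    # Hypothetical variable from the proposal above; not a real CLI setting.
    return env.get("DOTNET_CLI_TELEMETRY_OPTIN", "").strip() == "1"
```

The point of the design is the default: with the variable unset, nothing is sent, so a forgotten configuration step fails towards privacy.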

chrisjsmith commented 6 years ago

So it's been months. Nothing from MSFT. Clearly avoiding the discussion...

ghost commented 6 years ago

@chrisjsmith

Each company pursues its selfish interests ...

Hey MS! Tons of bugs is not what open source world means... It means to be open and ready to collaborate with people.

vcsjones commented 6 years ago

I thought to better frame this discussion it might be worth while to explore what telemetry is collected. I did my best to capture this, and have probably gotten some of this wrong.

  1. The name of the command that is run, like dotnet *foo*. https://github.com/dotnet/cli/blob/58c580dbcc212846d38820d015e73f2d7e80214e/src/dotnet/Program.cs#L148. It does not appear to collect the arguments passed to the command.

  2. When installation completes successfully. It appears to also collect the name of the exe files that did the installation. https://github.com/dotnet/cli/blob/5a37290f24aba5d35f3f958300aa20329e5ccaa7/src/dotnet/commands/dotnet-internal-reportinstallsuccess/InternalReportinstallsuccessCommand.cs#L33

  3. Any telemetry logged by MSBuild. This is done by creating a logger and handling only telemetry events from MSBuild. This is a rather complicated subject in MSBuild and wasn't very easy for me to follow, so if there are concerns or interest in MSBuild telemetry, perhaps the issue is best raised there.

  4. Detailed information about dotnet new. However, none of this appears to contain personal information. It is mostly the template and language that is used in dotnet new, or if you used the help parameter. https://github.com/dotnet/templating/blob/rel/vs2017/3-Preview3/src/Microsoft.TemplateEngine.Cli/New3Command.cs#L916

ghost commented 6 years ago

@vcsjones

Thanks for the information... But it does not matter... exactly which info is collected. It matters only to a user who accepts it! We just said that anybody should have a CHOICE to enable it or never enable it. Let's say it is something like GA added to your app.. You know that Google can collect some big data... But you accept this... it is your choice.. you ADDED it yourself because it seems OK for you to share this info with Google to get good stats. But with this telemetry we have to find a way to avoid imposed "features". It is abnormal even if MS wants to do its best.

guardrex commented 6 years ago

Folks ... good peeps ... plz ... just set the env var ...

DOTNET_CLI_TELEMETRY_OPTOUT

... and take a look at ...

https://github.com/dotnet/cli/blob/5a37290f24aba5d35f3f958300aa20329e5ccaa7/src/dotnet/Telemetry.cs#L42

You're good after that. The problem early on was simply that it wasn't clear if/when/what was being collected and exactly how to get out of telemetry. Now, they make it super clear that if you just set the var you're NOT tracked. It's also spelled out now what is being collected: https://docs.microsoft.com/dotnet/core/tools/telemetry

chrisjsmith commented 6 years ago

That's misspelled. It should be OPTIN.

Rationale for this being a bad idea:

  1. This defines the set of data currently collected. If this changes, it may collect more data. That means to ensure that it doesn't leak data, you have to audit every release to ensure telemetry hasn't changed.
  2. There's no test case.
  3. It, as most MSFT products these days, shows absolutely no respect for the end user's preferences.
  4. EU Article 29 Working Party are still miffed about the opt-out Windows 10 telemetry. Adding it to your development tool chain is just throwing gas on the fire.

There's enough of a case to change this to opt in. This is met with silence, which is unacceptable. It's like the MS Connect issue I opened against IE9 after they broke ClickOnce when the new download bar turned up. Suck it up and accept it and when you provide evidence, silence.

Go doesn't call home.

Python doesn't call home.

Vim doesn't call home.

Hang on I think I just solved the problem...

guardrex commented 6 years ago

It's not misspelled. One must opt-out by setting a value of true or 1 ("yes" used to work, too) ...

https://github.com/dotnet/cli/blob/5a37290f24aba5d35f3f958300aa20329e5ccaa7/src/dotnet/Telemetry.cs#L26

[edit] I think what ur referring to was an earlier post where the comment was that it should be opt-in.

chrisjsmith commented 6 years ago

So much for being facetious.

"yes used to work too" is an indicator of how broken it is. Does it work now? Where's that test case?
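A sketch of what such a test case could look like, in Python. The parsing function is an illustration only and is not the CLI's actual implementation: the accepted values `1` and `true` come from the comment above, while case-insensitive matching and whitespace trimming are assumptions.

```python
import unittest

# Assumption based on the comment above: "1" and "true" opt out.
ACCEPTED_OPT_OUT_VALUES = {"1", "true"}


def is_opted_out(raw_value):
    """Return True when the opt-out variable holds an accepted value."""
    if raw_value is None:
        return False
    # Assumption: matching ignores case and surrounding whitespace.
    return raw_value.strip().lower() in ACCEPTED_OPT_OUT_VALUES


class OptOutParsingTests(unittest.TestCase):
    def test_accepted_values_opt_out(self):
        for value in ("1", "true", "TRUE"):
            self.assertTrue(is_opted_out(value))

    def test_unset_or_other_values_do_not_opt_out(self):
        for value in (None, "", "0", "false"):
            self.assertFalse(is_opted_out(value))
```

Pinning the accepted values in tests like these would make a change such as "yes" no longer working show up as a failing test instead of a silent behavior change.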

guardrex commented 6 years ago

Thx @benaadams ... we need to update the topic. I'll take care of it.

benaadams commented 6 years ago

The telemetry data is being shared now; and you can also download it:

What we’ve learned from .NET Core SDK Telemetry

We are releasing .NET Core SDK usage data that has been collected by the .NET Core CLI. We have been using this data to determine the most common CLI scenarios, the distribution of operating systems and to answer other questions we’ve had, as described below.

As an open source application platform that collects usage data via an SDK, it is important that all developers that work on the project have access to usage data in order to fully participate in and understand design choices and propose product changes. This is now the case with .NET Core.

.NET Core telemetry was first announced in the .NET Core 1.0 RC2 and .NET Core 1.0 RTW blog announcements. It is also documented in .NET Core telemetry docs.

We will release new data on a quarterly schedule going forward. The data is licensed with the Open Data Commons Attribution License.

...

And much more info at the blog post https://blogs.msdn.microsoft.com/dotnet/2017/07/21/what-weve-learned-from-net-core-sdk-telemetry/

chrisjsmith commented 6 years ago

I think MSFT is missing the point here. Opt out data collection is not welcome regardless of the outcome or data being collected. It may even be illegal in some jurisdictions. EU are about to explain this to you by the looks. Choice is what matters and respect for the end users which is not being displayed here.

This is indicated by the fact that every time telemetry is mentioned with any product that MSFT ships a very slightly different question is being answered. For example this thread clearly states that people want it removed and we're being responded to with "here's what we collected, it's not so bad". Same with windows 10 telemetry where the silence is deafening. Is it the party line inside MSFT to not answer the same questions and drown them out with marketing and blog posts?

On top of this it's quite difficult getting this past security controls in some corporates. If your product calls home by default you're off the tender list immediately. There is not even any discussion on the matter.

Transparency is not showing us what you collect, it is explicitly asking to collect it. You're running the "hey we're collecting data, jump through these hoops in every workstation, deployment environment and target to turn it off" model which isn't something most people will want. It also sets a new precedent which I at least am not entertaining.

OpinionatedGeek commented 6 years ago

FWIW @chrisjsmith I agree with you.

Given the utter lack of response to questions in this Issue, it might be worthwhile putting thoughtful remarks like these in a comment to the blog post. I commented and at least I got a response there.

guardrex commented 6 years ago

@OpinionatedGeek @chrisjsmith I'm working on the topic updates as you can see on https://github.com/dotnet/docs/pull/2706. Unfortunately, the review site, which would show a built version of the topic after the CI runs, doesn't work publicly. However, if you look at the diff, you can see the proposed updates that seek to address the transparency concerns. I'm working through these updates with the following in mind: who, what, when, where, how, and why. If I haven't hit the sweet spot, I think I'm getting pretty darn close.

I think the 1st draft is just about done now. I'm going to let that build this afternoon (takes 4-5 hours) and then take a final look. If you want to wait a little longer for my final checks, then just wait until I pull the WIP off of it, which I plan to do by the end of the day (Saturday, 7/22).

I invite you and everyone to take a look and provide feedback. Of course, the docs issue and PR only pertain to the coverage of the feature ... nothing about the presence of the feature itself should be discussed over there. This issue is the best place (or the blog post comments as you say) to discuss the presence of the feature or how it works.

To discuss language used in the PR, use the PR comments or attach comments directly to the lines of the diff. To discuss the coverage of the telemetry feature generally, please use the attached issue at: https://github.com/dotnet/docs/issues/2705.

Keep in mind that the updates are only a 1st draft subject to heavy revision and approval. What comes out of this process may even go as far as only making superficial changes to the current topic or a complete rejection of the PR itself. Just keep that in mind.

ghost commented 6 years ago

Given the utter lack of response to questions in this Issue, it might be worthwhile putting thoughtful remarks like these in a comment to the blog post. I commented and at least I got a response there.

Maybe we have to put url to this topic there? ))

h3smith commented 6 years ago

While I agree with the sentiment here, if you look at Windows 10 logging, even when turned off by group policy it is sending data back to Microsoft. They clearly want to be gathering data in troves at a corporate level. I don't see this "feature" going away.

OpinionatedGeek commented 6 years ago

Maybe we have to put url to this topic there? ))

I already did @hardhub - here's the paragraph where I mention it:

Despite someone’s Github issue – https://github.com/dotnet/cli/issues/3093 – (over a year old and still running), despite someone else’s Pull Request – https://github.com/dotnet/cli/pull/7096 – switching telemetry off by default, we are in the situation where Microsoft now seems intent on making the tool’s spying even worse, all while talking about community engagement.

By all means add it again, if you believe it'll help.

rafaelrpinto commented 6 years ago

After more than 1 year, I guess this discussion is running in circles when there are only two obvious decisions to be made:

1 - Yes, we agree with the concerns raised here and will change the Telemetry to be OFF by default as proposed on dotnet/cli#7098.

2 - No, we want the data and will not ask permission before enabling telemetry. This software is free and if you want to use it read the instructions and disable the telemetry yourself. Or don't use it at all.

Any other discussion on sharing what's collected, better explanations and showing the benefits are just PR damage control for deciding option 2, which we all know some people won't like.

Just give us a plain NO and we won't bother raising concerns like this again.

shravan2x commented 6 years ago

Just throwing this in here: A trending HN article from a few hours ago https://news.ycombinator.com/item?id=14836737 .

chrisjsmith commented 6 years ago

I was just about to add that :)