golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.33k stars 17.7k forks source link

x/telemetry/cmd/gotelemetry: bring back 'local' vs 'off' options #63832

Closed hyangah closed 1 year ago

hyangah commented 1 year ago

In https://github.com/golang/go/issues/63143 we removed the original 'off' option and changed gotelemetry off means "turning off upload". The motivation was to simplify the command interface. Unfortunately, this leads to confusion - users expect to turn off statistics logging completely by doing so. Moreover, we learned that there are certain cases where some users want to avoid extra disk writes if they are not critical for gopls' core functionalities.

We propose to restore the original 'local' and 'off' distinction.

The default remains 'local'. The submission of telemetry data is still opt-in. The collection of telemetry data is opt-out like go toolchain's metadata. Data collected before gotelemetry on is not included in submission.

Related - gotelemetry clean shouldn't delete the mode file, but can delete all other files.

findleyr commented 1 year ago

Thanks for filing this, @hyangah.

To summarize: in #63143 we consolidated the API to off|on, with local collection always on, because we didn't think there was any reason to avoid recording telemetry data locally -- in most ways telemetry data is similar to other metadata we write to the file system. We've since heard from multiple users that the mere existence of telemetry data (even if it will never be uploaded) can set off policy alerts in some organizations. Furthermore, the fact that gotelemetry off doesn't turn off this local data collection can be confusing.

While in most circumstances we don't think there should be any concerns about local data collection, users shouldn't have to e.g. fight for an exemption in environments where, as a matter of policy, such data collection is prohibited.

Therefore, it seems simplest to have gotelemetry off disable everything, even local collection, and reintroduce the concept of a local mode, which can be selected with gotelemetry local. The default should be local, as we still believe that local collection should not be a problem in most environments.

doggedOwl commented 1 year ago

I'm very sad to see that still opt-out is being considered as an acceptable approach even for just local data collection, even without sending them. This still forces us to tightly control go installations and it is no small burden.

The main issue with opt-out local collection is that it still creates the potential for problems when/if someone then enables gotelemetry even if approved because now we are not sure that only data from the moment of approval is being sent on. potentially that will send telemetry data collected before the approval.

Opt-out is such a burden and even ethical issue that you are forcing on all European Companies. (I would say it's a ethical issue everywhere but it seems that at least in the US everyone is resigned and Google and others are taking this desperate resignation as a signal that is ok to collect their data without consent.)

hyangah commented 1 year ago

@doggedOwl I updated the proposal to clarify - data collected before gotelemetry on is never included in the submission.

The data on disk helps users make informed decision before considering to opting-in. Users who decide not to opt-in for submission can continue to use the data to inspect and debug toolchain performance/correctness issues they experience, like other metadata.

doggedOwl commented 1 year ago

That clarification goes a long way. Still an opt-in aproach would be preferable even with this guarantee. gotelementry off would be better as default. Then the flow becomes: We enable gotelemetry local to verify what data is being collected and then at a later stage gotelemetry on.
There is no downside to a full optin aproach.

rsc commented 1 year ago

@doggedOwl, I appreciate your point of view, but there are good reasons to have counters and crashes written to disk locally. Among them:

I will also reemphasize once again that even when 'gotelemetry local' is set and the user changes it to 'gotelemetry on', we never upload the data that was collected in 'local' mode. There is no harm to this data being on local disk, and at least two real benefits.

Also, the go command already writes small amounts of metadata to places like the Go build cache that it uses for things like cleaning up the cache, not to mention the build cache itself as well as the module cache and module cache indexes. The local statistics are simply additional files maintained while running the go command, local to the machine like all the other files.

We understand that not everyone will agree with the default being "write counters and crashes to local disk with no uploading". Some people want "write nothing" and some people think it should be "upload by default". There is nothing we can do that will please everyone. We believe a default of "gotelemetry local" strikes the best balance.

sbromberger commented 1 year ago

There is no harm to this data being on local disk

There is when policy dictates that things stored on the filesystem are subject to remanence and disclosure policies, and that as a result only data that needs to be stored should be. There is a real cost associated with this decision.

Also, the go command already writes small amounts of metadata to places like the Go build cache that it uses for things like cleaning up the cache, not to mention the build cache itself as well as the module cache and module cache indexes

This data is necessary for the operation of the tool. It is also stored in a directory that is designed to hold ephemeral data, so it can be removed without issue and excluded by name from backups. In contrast the telemetry data is not (currently!) necessary for the correct and intended operation of the tool; it serves a few purposes but statistics-gathering for the benefit of an outside organization may not be a valid one as far as some of the more strict systems security auditors are concerned.

That said, I think that local is a reasonable default. Those of us who don't want / can't have this telemetry data stored are in the distinct minority and as long as there is an option that allows us to use the tool with full functionality without storing the data at issue, an opt-in approach doesn't seem to me to be too unreasonable.

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

Currently there are two gotelemetry settings:

The proposal is to rename “off” to “local”, keeping it the default, and then introduce a new “off” that means don’t write counters and crashes at all.

hyangah commented 1 year ago

I want to clarify one more detail. We will still need to store metadata in the telemetry directory for gotelemetry off - to record user's choice, etc. It's not a bad idea to include them in backup.

sbromberger commented 1 year ago

We will still need to store metadata in the telemetry directory for gotelemetry off - to record user's choice, etc. It's not a bad idea to include them in backup.

This is a good thing - it will allow folks to scan that file to confirm the setting is appropriate for the environment.

findleyr commented 1 year ago

@rsc @adonovan -- any update on this from the proposal committee yesterday?

gopherbot commented 1 year ago

Change https://go.dev/cl/541376 mentions this issue: all: revert to telemetry mode values (on|off|local)

adonovan commented 1 year ago

This proposal was accepted during the review meeting, and the decision is recorded in the minutes. I'm not sure why the automatic updates haven't gone out yet--perhaps Russ's travel plans have delayed things.

rsc commented 1 year ago

The minutes have now posted (and would have commented here except the state was already changed). Thanks!

gopherbot commented 1 year ago

Change https://go.dev/cl/542055 mentions this issue: internal/counter: don't write counters to disk if mode=off

gopherbot commented 1 year ago

Change https://go.dev/cl/542317 mentions this issue: gopls: upgrade x/telemetry and account for new mode logic

gopherbot commented 1 year ago

Change https://go.dev/cl/542156 mentions this issue: gopls: upgrade x/telemetry and account for new mode logic

findleyr commented 1 year ago

This behavior has now been released in gopls@v0.14.2, so this is done.