golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.92k stars 17.65k forks source link

proposal: cmd/go: consider making GOTELEMETRY=off settable #68960

Closed findleyr closed 2 months ago

findleyr commented 2 months ago

As discussed in #68928, while go env GOTELEMETRY reports the current go telemetry mode value, GOTELEMETRY=off has no effect as GOTELEMETRY is not a settable environment variable. Apart from improving the documentation, we should consider supporting GOTELEMETRY=off as a settable value.

We can never support GOTELEMETRY=on being settable, because we want users to explicitly run go telemetry on to enable telemetry, and because the telemetry mode is a global property that controls whether telemetry data is uploaded. By default, telemetry data is stored in the local file system, and we don't distinguish in that data whether it was recorded while the telemetry mode was local or on. When the telemetry mode is set to on, we don't want to upload data that was recorded while the telemetry mode was local.

However, apart from the asymmetry I don't think there's a reason we couldn't support GOTELEMETRY=off.

It might be convenient to support disabling telemetry with GOTELEMETRY=off. We should consider it.

EDIT (2024-08-28): updated to explain a bit more why GOTELEMETRY=on can't be settable with the current data model.

gabyhelp commented 2 months ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

chrisnovakovic commented 2 months ago

I'm happy to work on this, since it would benefit the Please build system, as discussed in #68946.

Ideally, where would you want it implemented? I can see at least two possibilities in internal/telemetry/dir.go, one near the end of init and one near the start of Mode.

hyangah commented 2 months ago

If the bug #68946 is fixed, what is the remaining use case & motivation of GOTELEMETRY=off?

chrisnovakovic commented 2 months ago

If the bug #68946 is fixed, what is the remaining use case & motivation of GOTELEMETRY=off?

Collecting telemetry data has no use if the intention is for it never to be uploaded. Most of the build rules in Please's go-rules plugin invoke go in a network-isolated ephemeral sandbox, so the data only exists for the life of the build action and couldn't be uploaded anyway. It would be nice to avoid hitting the unnecessary telemetry-collecting code paths altogether in that scenario.

peterebden commented 2 months ago

If the bug #68946 is fixed, what is the remaining use case & motivation of GOTELEMETRY=off?

If that bug is fixed by using -modcacherw (or similar), the created files will be deletable, but there are still a number of unnecessary downloads of the Go telemetry module that will happen along the way.

I'd say that GOTELEMETRY=off is the most elegant solution to it, since it avoids the download & implicitly never creates read-only directories either.

findleyr commented 2 months ago

@peterebden I think if go telemetry env is local (the default), we can also avoid the download.

findleyr commented 2 months ago

I think the decision in this issue comes down to the following question: is the convenience of being able to disable telemetry with an environment variable worth the potential confusion caused by (1) there being two ways to disable telemetry, and (2) the asymmetry of GOTELEMETRY=on having no effect. The implementation is straightforward.

@peterebden or @chrisnovakovic, could you elaborate on why setting GOTELEMETRY=off is preferable to running go telemetry off in your environment?

chrisnovakovic commented 2 months ago

@peterebden or @chrisnovakovic, could you elaborate on why setting GOTELEMETRY=off is preferable to running go telemetry off in your environment?

Mostly because it's easier for us to control the ephemeral build sandbox via its environment than via the commands that run inside it. We'd have to make sure that any build action that runs go would first run go telemetry off, which is hard (perhaps impossible) to guarantee in a build system that allows its users to define arbitrary build targets that run arbitrary commands. We can't even necessarily guarantee that go is present in the system path (the go binary may be provided by a go_toolchain rule in Please's go-rules plugin, for example). On the other hand, we could guarantee that GOTELEMETRY=off in the sandbox, and, provided that go respects its value, it would always end up doing the right thing.

stapelberg commented 2 months ago

My use-case (running an integration test on GitHub Actions) similarly is much easier to configure via a GOTELEMETRY=off environment variable (which is inherited by processes) than to arrange for commands to be run in the various sub-processes that are spawned as part of the test.

ianlancetaylor commented 2 months ago

Just a note that if we permit GOTELEMETRY=off, we should report it in go env -changed.

hyangah commented 2 months ago

If the bug #68946 is fixed, what is the remaining use case & motivation of GOTELEMETRY=off?

If that bug is fixed by using -modcacherw (or similar), the created files will be deletable, but there are still a number of unnecessary downloads of the Go telemetry module that will happen along the way.

The fix we hope to implement is "not to download" at all unless the telemetry is explicitly on. So, no more unnecessary downloads of the telemetry module.

My use-case (running an integration test on GitHub Actions) similarly is much easier to configure via a GOTELEMETRY=off environment variable (which is inherited by processes) than to arrange for commands to be run in the various sub-processes that are spawned as part of the test.

go telemetry off is sticky. Run it once, then all the subprocesses spawned as part of the test and all future invocation of go in the system will no longer record telemetry.

And a note:

Making GOTELEMETRY=off settable will offer two ways to permanently disable telemetry. go env -w GOTELEMETRY=off and go telemetry off.

willfaught commented 2 months ago

(@ianlancetaylor suggested I repost here. Happy to move this to a new issue if needed.)

go env doesn't seem to work well with GOTELEMETRY:

Despite showing up an an env var, it's not treated consistently as one:

❯ go telemetry off
# nothing printed
# as expected

❯ go env | grep GOTELEMETRY=
GOTELEMETRY='off'
# as expected

❯ go env GOTELEMETRY
off
# as expected

❯ go env -changed
# nothing printed
# as expected

❯ go env -u GOTELEMETRY
go: unknown go command variable GOTELEMETRY
# expected nothing printed, success

❯ go env -w GOTELEMETRY=on
go: unknown go command variable GOTELEMETRY
# expected nothing printed, success

❯ go telemetry on
# snip
# as expected

❯ go env | grep GOTELEMETRY=
GOTELEMETRY='on'
# as expected

❯ go env GOTELEMETRY
on
# as expected

❯ go env -changed
# nothing printed
# expected to be printed: GOTELEMETRY='on'

❯ go env -u GOTELEMETRY
go: unknown go command variable GOTELEMETRY
# expected nothing printed, success

❯ go env -w GOTELEMETRY=off
go: unknown go command variable GOTELEMETRY
# expected nothing printed, success

It seems incorrect for go env GOTELEMETRY to work, yet go env -u GOTELEMETRY prints unknown go command variable GOTELEMETRY? Shouldn't it insead print something like go command variable GOTELEMETRY cannot be unset with go env?

It seems to me that go telemetry should be dropped, and GOTELEMETRY should be a normal env var. go env -w GOTELEMETRY=on is just as clear and explicit as go telemetry on. The go telemetry on output could be added to the go help telemetry output. It looks like the original motivation for making it a command was to include env and clean subcommands, but those were omitted, and users are told to use gotelemetry for that instead, so there's no point in keeping the go telemetry subcommand around.

peterebden commented 2 months ago

go telemetry off is sticky. Run it once, then all the subprocesses spawned as part of the test and all future invocation of go in the system will no longer record telemetry.

It's only sticky within one current 'user session'. We have ephemeral working directories for each invocation, go telemetry off writes a file into $HOME which will get cleaned up as soon as it's done.

The same is presumably likely to be true for any other usage of remote build execution (REAPI is a fairly widely known example), where presumably any sensible executor implementation would at least attempt to avoid allowing actions to write global state.

Making GOTELEMETRY=off settable will offer two ways to permanently disable telemetry. go env -w GOTELEMETRY=off and go telemetry off.

Well not quite, setting GOTELEMETRY=off in the environment of a process would disable telemetry for that process, future invocations of go would be unaffected. Running go telemetry off writes a file which permanently disables it (or as permanently as it can, anyway).

We've mentioned a bit before, but I also think it's relevant how easy it is to make this happen:

cmd.Env = append(os.Environ(), "GOTELEMETRY=off")

How do I do the same for go telemetry off? As Chris noted above, the command we're running might not actually be the go executable (in fact, it never is - it's always a shell invocation that might invoke go internally). Even if it is, and hence I know how to invoke it, I have to run that as a separate subprocess, and I think critically here: it only works on go 1.23. If go happens to be 1.22, then it will fail with "unknown command"; now my tool is stuck trying to work out what's gone wrong and whether that is "expected" or not.

One might also consider it bad form for certain tools to muck with a user's global config state, if they only want to make sure that their invocations of go don't have telemetry enabled.

findleyr commented 2 months ago

@willfaught I don't think we can eliminate go telemetry on -- my understanding is that we need the current language to be printed immediately when the user enables telemetry. Additionally, I think it is likely that we will add additional subcommands to the go telemetry command in the future. However, I agree that the asymmetry is awkward.

zigo101 commented 2 months ago

@ianlancetaylor

Just a note that if we permit GOTELEMETRY=off, we should report it in go env -changed.

I think GOTELEMETRY=off/on should be always reported in go env -changed, regardless of whether or not we permit GOTELEMETRY=off, as long as go env reports it and GOTELEMETRY is not default,

willfaught commented 2 months ago

I don't think we can eliminate go telemetry on -- my understanding is that we need the current language to be printed immediately when the user enables telemetry.

@findleyr It could also be printed when GOTELEMETRY=on is set with go env -w.

IMO, it doesn't need to be printed. People aren't going to accidentally set GOTELEMETRY=on (or at least it's no more likely than accidentally setting GODEBUG=execerrdot=0), and go help telemetry is there to explain what it does. If it's unclear from the CLI help what GOTELEMETRY does, then that suggests to me that the CLI help should be improved instead.

I understand wanting to be careful after the opt-out telemetry backlash, but this seems to go too far in the other direction, to the point of clunkiness.

Additionally, I think it is likely that we will add additional subcommands to the go telemetry command in the future.

I think we should have crossed that bridge only when we came to it, not before. We might not ever need to. Even if we do, that doesn't preclude treating GOTELEMETRY as a normal env var. go telemetry {clean,dump,blah} can still exist without go telemetry {off,on} existing.

findleyr commented 2 months ago

It could also be printed when GOTELEMETRY=on is set with go env -w. I understand wanting to be careful after the opt-out telemetry backlash, but this seems to go too far in the other direction, to the point of clunkiness

CC @rsc @matloob

Those are fair points, but #67111 was accepted and implemented and shipped, and it is very unlikely that we'd remove a subcommand, especially for superficial reasons. If the handling of the go telemetry mode could be more elegant, that's perhaps unfortunate, but there are other areas of the language and toolchain that could be more elegant.

For now, I think we should focus on making telemetry as unobtrusive as possible, and this issue is about investigating whether supporting an environment variable would be helpful. I suspect that the major problem caused by the current telemetry default is #68946, and perhaps if we fix this then there is no need for supporting GOTELEMETRY=off. We should fix #68946 for go1.23.1, and then see if there is still a need for additional changes.

findleyr commented 2 months ago

@hyangah points out a technical consideration that may make GOTELEMETRY=on impossible: currently, when the user runs go telemetry on, we record the timestamp of when that event occurred, and ensure that we never upload local data from a counter file that overlaps with a period before the go telemetry on event occurred. Being able to set GOTELEMETRY=on in a process environment would make this type of guard impossible, and I don't think we want to give up that guarantee.

willfaught commented 2 months ago

but https://github.com/golang/go/issues/67111 was accepted and implemented and shipped, and it is very unlikely that we'd remove a subcommand, especially for superficial reasons. If the handling of the go telemetry mode could be more elegant, that's perhaps unfortunate, but there are other areas of the language and toolchain that could be more elegant.

@findleyr go telemetry on/off could do go env -w GOTELEMETRY=on/off under the hood. go telemetry on/off would then be redundant and pointless, but it would preserve compatibility, while allowing GOTELEMETRY to act like a normal env var. If the redundancy is a concern, then I would point to your words in the quotation above, in which you argue that unfortunate inelegance isn't a priority.

currently, when the user runs go telemetry on, we record the timestamp of when that event occurred, and ensure that we never upload local data from a counter file that overlaps with a period before the go telemetry on event occurred. Being able to set GOTELEMETRY=on in a process environment would make this type of guard impossible, and I don't think we want to give up that guarantee.

Sorry for my ignorance, but I don't follow why a stored timestamp is required. Aren't all telemetry data guaranteed to be safe for upload, since it was only collected when GOTELEMETRY=on? That property can be preserved by putting GOTELEMETRY=local data in a separate place, which is never uploaded.

Assuming that a stored timestamp is required for some reason, ISTM that there could be a separate GOTELEMETRYSTART env var to mark the cut-off time. There's already a second telemetry env var, GOTELEMETRYDIR, so why not a third? When you do go env -w GOTELEMETRY=on, it does go env -w GOTELEMETRYSTART={now} under the hood. If GOTELEMETRY=on, but GOTELEMETRYSTART is unset or in the future, then telemetry is disabled (or, alternatively, there is no cut-off time unless GOTELEMETRYSTART is set).

especially for superficial reasons

My underlying concern is the go tool moving away from the common, simple config model of env vars that can be set per process or in a config file. There's a handy env subcommand to edit the config file. That's it. It's easy to understand, and to use.

Now, there's a writeable "env var" that's listed with the rest of the env vars, but it can't be set with the env subcommand or command-line env vars, and it's not listed in the env var changes. IIRC, this is unprecedented in the go tool.

findleyr commented 2 months ago

Sorry for my ignorance, but I don't follow why a stored timestamp is required. Aren't all telemetry data guaranteed to be safe for upload, since it was only collected when GOTELEMETRY=on?

Telemetry data is written to the local file system by default, when the go telemetry mode is "local". Such data is only for local debugging. When the user enables telemetry, we don't want to upload any data that was collected before the mode was "on". I think this is crucial to explaining why GOTELEMETRY=on can't be settable, because "on" means that telemetry data is uploadable, and we don't distinguish in the stored data whether that data was recorded when GOTELEMETRY=local or GOTELEMETRY=on.

If we were to consider supporting GOTELEMETRY=on as a settable environment variable, I think we'd also need to redesign the telemetry data model to separate counters that were recorded with GOTELEMETRY=local from GOTELEMETRY=on. That's an interesting idea, but beyond the scope of this issue. I also am not sure it would be worth the engineering effort.

ISTM that there could be a separate GOTELEMETRYSTART env var to mark the cut-off time

If we were to support go env -w GOTELEMETRY=on, then we'd need to also need too support setting GOTELEMETRY=on in the process environment, so there wouldn't be a well defined cutoff time.

Now, there's a writeable "env var" that's listed with the rest of the env vars, but it can't be set with the env subcommand or command-line env vars, and it's not listed in the env var changes. IIRC, this is unprecedented in the go tool.

I'm not sure what you mean when you say that GOTELEMETRY is writable. It is currently a non-settable env var that is derived from the telemetry mode file. There are other env vars, such as GOMOD, that aren't settable.

The fact that GOTELEMETRY isn't well documented ~or listed with -changed~ is a bug that should be fixed. (EDIT: we'd only list with -changed if supporting GOTELEMETRY=off).

This discussion raises many good points about how GOTELEMETRY is confusing. I worry that making GOTELEMETRY=off settable would only add to that confusion. As I said in https://github.com/golang/go/issues/68960#issuecomment-2310348522, I think we should first fix #68946 and then see if there is a strong need for GOTELEMETRY=off to be settable.

gopherbot commented 2 months ago

Change https://go.dev/cl/609115 mentions this issue: cmd/go: print the proper error message for setting telemetry vars

willfaught commented 2 months ago

That's an interesting idea, but beyond the scope of this issue.

I'll file a separate issue. Thanks.

If we were to support go env -w GOTELEMETRY=on, then we'd need to also need too support setting GOTELEMETRY=on in the process environment, so there wouldn't be a well defined cutoff time.

It would be defined with the GOTELEMETRYSTART env var:

$ export GOTELEMETRY=""
$ export GOTELEMETRYSTART=""
$ env GOTELEMETRY=on GOTELEMETRYSTART=$date go build # telemetry enabled
$ env GOTELEMETRY=on go build # telemetry disabled
$ env GOTELEMETRY=off GOTELEMETRYSTART=$date go build # telemetry disabled

I'm not sure what you mean when you say that GOTELEMETRY is writable. It is currently a non-settable env var that is derived from the telemetry mode file.

I mean that it's changeable:

$ go env GOTELEMETRY
off

$ go telemetry on
# snip

$ go env GOTELEMETRY
on

There are other env vars, such as GOMOD, that aren't settable.

Are they changeable through the CLI?

findleyr commented 2 months ago

I'll file a separate issue. Thanks.

Thanks for filing that issue. We can continue discussion there.

It would be defined with the GOTELEMETRYSTART env var:

Hmm. I think I'd prefer a model where the "local" and "on" data is separate, as it seems a bit hard to wrap one's head around GOTELEMETRYSTART as an environment variable. I'll say this on the other issue.

Are they changeable through the CLI?

I guess that depends on what you mean by changeable. GOVERSION is perhaps the most complicated, affected by

I think your point is that go telemetry is completely redundant with go env GOTELEMETRY, and if we were to allow GOTELEMETRY=off, it would be (mostly) redundant with go telemetry off, though as discussed above the nature of a per-process environment variable and a global configuration file is different.

Would you feel the same way if we didn't expose GOTELEMETRY as an environment variable, and just exposed GOTELEMETRYDIR?

willfaught commented 2 months ago

Hmm. I think I'd prefer a model where the "local" and "on" data is separate, as it seems a bit hard to wrap one's head around GOTELEMETRYSTART as an environment variable.

That would be my preference too. Having a cut-off timestamp means that GOTELEMETRY=on data is lost if you set GOTELEMETRY=on, then later GOTELEMETRY=local, then later GOTELEMETRY=on again. By placing the two data types separately, no data is lost when you switch modes.

I think your point is that go telemetry is completely redundant with go env GOTELEMETRY, and if we were to allow GOTELEMETRY=off, it would be (mostly) redundant with go telemetry off

Part of my point, yes, agreed.

Would you feel the same way if we didn't expose GOTELEMETRY as an environment variable, and just exposed GOTELEMETRYDIR?

I agree that that would remove the rough edges in go env, go env -w, go env -u, and go env -changed.

However, it wouldn't address the inconsistency of how telemetry is configured (go telemetry on/off) with how the rest of the go tool is configured (go env -w).

rsc commented 2 months ago

Beyond the confusion about an explicit setting and a separate environment variable setting, the other problem with go env -w GOTELEMETRY=... is that every program with telemetry now has to query the go env somehow, which means either they all exec the go command, or they all contain copies of the env-computing code that can go stale.

rsc commented 2 months ago

This proposal has been declined as infeasible. — rsc for the proposal review group

codespotx commented 1 month ago

it seems like a strange coincidence that the anti-feature telemetry is the only aspect that cannot be configured like any other path or aspect (now I need a little tool that creates a little file in a directory that cannot be changed?). this fact could easily be interpreted as creating extra obstacles.

ianlancetaylor commented 1 month ago

@codespotx Perhaps it is a comfort that since the default is for telemetry reporting to be off, the extra obstacle you mention is making it harder for people to turn telemetry on.

codespotx commented 1 month ago

if the default would be "off" there would be no discussion. the default is "local", why did this happen? again room for interpretation.

ianlancetaylor commented 1 month ago

The default is "local" so that people can have a clear idea of exactly what kinds of data will be uploaded if they turn on telemetry. This avoids the two step process of "turn on local mode", then run a bunch of stuff for a while, and then, if it looks OK, turn telemetry on.