mschwaig opened 1 month ago
I think that for the most part, any sort of data collection MUST be opt-in, unless we are speaking of aggregate numbers such as "how many people download from the binary cache" or similar bulk metrics. I think most issues opened in Nix projects are served well by the information collected by nix-info -m and by any error output the user chooses to disclose, at their own pace and with their consent. I think that if we did collect crash reports, they would likely just be a lot of noise that no one would use.
I think what's sought from crash reporting is actually handled much better through Hydra and NixOS VM tests that replicate users' setups, so we can catch errors preemptively. So, in general, I'm strongly against automated analytics. I don't think we gain anything from it, and we lose a lot of user trust. I think this is an enterprise-ism that doesn't fit any of the projects under Nix.
Do you think this could be beneficial?
Yes
If so, what data could you imagine being helpful to the project?
Broad usage statistics for experimental features, crashes resulting from nondeterministic or hard-to-reproduce bugs, and channel/branch usage in Nixpkgs are the first that come to mind. I think these would help us make better development decisions with regard to things like experimental feature stabilization, additions, and removals, as well as help with UX issues (as described in #109) and help us get rid of some of the more niche Nix bugs as they happen.
Do you think it could be implemented in a way that is broadly supported in the community?
I appreciate the use of the word "broadly" here, since if it were "all of the community" I would have to say no; but yes, I think we definitely could. It would take a good amount of communication and transparency, and there would be many debates over opt-in versus opt-out, but in the end I think there is a good middle ground for the vast majority of users as well as developers.
Negative. ;)
Analytics enabled by default on the client would most probably alienate a significant portion of users, so in my opinion it would need to be opt-in.
But by virtue of being opt-in, it would lose much of whatever value it might have had in being somewhat representative of things like feature usage, if it had any such value in the first place! I'd like to ask proponents of this idea for more specific examples: what data would be used, by whom, to arrive at what conclusions?
For things like debugging and bug reproduction, I think more work on Nix's logging capabilities, the recent PRs adding tracing capabilities, and more use of VM tests could all yield better returns in the short term than client-side analytics.
I don't think the cost/benefit ratio even makes sense, besides the obvious privacy concerns and the enterprise-ism.
Collecting metrics like this can be very useful, especially for diagnosing issues that only appear at the population level. I would want any reporting tools to be strictly opt-in, first out of concern for privacy and second because ingesting large volumes of logs from all over the world quickly becomes a nontrivial engineering problem.
Crash analytics on less-supported platforms (e.g. Darwin) are a major place where this data would be useful, and I can imagine others, but I suspect that most of the really important data we could be gathering would be better collected via surveys (for example, asking commercial users which modules they use in their production deployments and which pain points they've encountered).
I do believe that if there were sufficient justification, limited scope, and full transparency (e.g. the source of the metrics agent is stored in the NixOS GitHub org, available through Nixpkgs, built by Hydra, etc.), we could achieve broad consensus on gathering some kinds of data.
I think that analytics can be useful; however, I don't think they would fit the Nix ecosystem very well. Anything that is privacy-respecting must be opt-in, but the very nature of being opt-in makes the data less representative, as it excludes everyone who isn't dedicated enough to turn it on or doesn't wish to contribute. With something as complex as Nix, and with the kind of user base that Nix has, I don't think we would get very useful data out of it. Personally, I think we could get similar mileage out of community-wide surveys akin to the Stack Overflow Developer Survey and the Python Developers Survey (run by JetBrains). For these reasons, I don't think we should add analytics, but I would not be staunchly against them as long as they are privacy-respecting.
Crash reporting, as implemented by Firefox, Thunderbird, and others, feels unhelpful to me, as it seems to gather a large amount of data without much helpful detail. That said, I think a useful feature to add would be tooling that makes it easy to create GitHub issues when Nix crashes. I'm not sure how it should be implemented (ask every time with a permanent opt-out, opt in to being asked every time, or ask on install), but a good template filled in with the collected data could make the process easier.
The closest comparison I can think of for a case where implementing analytics went very wrong with the community is when Quilt Loader added a Monthly Active Users beacon, which was later removed following community concerns.
I think it could be beneficial, and I think that in theory it could be done well. Looking at other candidates' answers, however, I do not think it would be broadly supported by the community. We should probably focus on understanding usage and issues in other ways; for example, we could amend the community survey.
I think that @roberth, as a member of the Nix team, saying that this is not worth it carries a lot of weight.
Question
Question https://github.com/NixOS/SC-election-2024/issues/109 by @iFreilicht brought up this topic, and I thought it was worth discussing on its own because it is quite nuanced. On one hand, it has a lot of privacy implications; on the other hand, some specific data might be valuable for improving Nix.
Do you think this could be beneficial? If so, what data could you imagine being helpful to the project? Do you think it could be implemented in a way that is broadly supported in the community?
Candidates I'd like to get an answer from
No response
Reminder of the Q&A rules
Please adhere to the Q&A guidelines and rules