bitcoin-core / gui

Bitcoin Core GUI staging repository
https://github.com/bitcoin/bitcoin
MIT License

How to collect usage statistics to make educated design decisions #224

Open maflcko opened 3 years ago

maflcko commented 3 years ago

Currently the gui is designed (or kept as-is) based on guesses about how it is used. I'd say that developers and reviewers mostly use their own experience as a proxy for how everyone else is using the gui.

It would be good if the gui had a way to record usage statistics, which users who want to opt in to sharing could enable. Those users could then upload the resulting file to share their report with developers.

I am not sure what Qt offers to achieve this, but even a plain click counter for each button would be better than nothing.
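To make the click-counter idea concrete, here is a minimal sketch in plain Python rather than Qt. The class `ClickCounter` and the button names are hypothetical and not part of the GUI code; in the real GUI the `record` call would presumably be connected to each button's click signal.

```python
import json
from collections import Counter


class ClickCounter:
    """Counts button clicks, but only after the user has opted in."""

    def __init__(self, enabled=False):
        self.enabled = enabled  # off by default: strictly opt-in
        self.counts = Counter()

    def record(self, button_name):
        # Store only the widget name, never amounts, addresses or labels.
        if self.enabled:
            self.counts[button_name] += 1

    def export(self):
        # A human-readable report the user can inspect before sharing.
        return json.dumps(dict(self.counts), indent=2, sort_keys=True)


# Hypothetical button names, used only for illustration.
counter = ClickCounter(enabled=True)
for name in ["sendButton", "receiveButton", "sendButton"]:
    counter.record(name)
report = counter.export()
```

The exported report is plain JSON, so a user could read it in any text editor before deciding whether to upload it.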

hebasto commented 3 years ago

Could an opt-in log of events / signals be acceptable from the point of view of users' privacy? What events / signals must be redacted from such a log?

GBKS commented 3 years ago

Instead of capturing "all" events, this could also be driven by hypotheses we want to clarify. For example:

I just totally made these up and they might be unrealistic, so ignore the details. My point is that by starting with hypotheses, it's easier to ensure that only relevant data is gathered, the resulting data set is more likely to be useful, and it's much easier for users to understand why this is done and be comfortable sharing.

I'd be more comfortable sharing a data set that includes (totally made up to make my point) lines like "You reuse addresses 3 times on average" and "You typically increase the default fee by 10%" than endless lists of logs.
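A rough sketch of how raw local records might be condensed into sentences of that kind. The function `summarize`, its inputs, and the sample data are all made up for illustration; nothing here reflects an existing wallet API.

```python
from collections import Counter
from statistics import mean, median


def summarize(address_uses, fee_increases_pct):
    """Condense raw local records into a few readable sentences."""
    # How many times each receiving address was used.
    uses_per_address = Counter(address_uses)
    avg_uses = mean(uses_per_address.values())
    # The median is less sensitive to a single unusual fee bump.
    typical_fee = median(fee_increases_pct)
    return [
        f"You use each receiving address {avg_uses:.1f} times on average",
        f"You typically increase the default fee by {typical_fee:.0f}%",
    ]


# Made-up sample data; opaque placeholders stand in for addresses so the
# raw values never need to leave the machine.
lines = summarize(["a", "a", "a", "b"], [0, 10, 10, 20])
```

Only the resulting sentences would be shared, which keeps the report short enough for a user to review in full.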

The privacy aspect is super critical to this. Ideally users can upload the data completely anonymously.

maflcko commented 3 years ago

If the collected statistics are condensed into a short sentence, they could be collected as part of the next survey. Last one: https://achow101.com/2021/01/bitcoin-core-survey

johnsBeharry commented 3 years ago

If this is seriously being considered for inclusion, then I think it’s very important to establish the constraints that govern data collection upfront. Generally, nothing should be collected that can be used to correlate balances, identity, or location.

Off the top of my head, the kinds of sensitive information would be:

Usage patterns can be useful just by themselves, but we will also need to know the environment and configuration of the operator, as these can influence the patterns (for example, features that are removed or renamed). So some additional data points would also have to be collected so that the usage data can be segmented. This can also help facilitate debugging. For example:

  1. Software Version
  2. Operating System
  3. Machine Specs
  4. UI Configs (exclude language)
  5. …?
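As an illustration of the data points listed above, a small sketch that gathers coarse environment information with Python's standard library. The function `environment_report`, the field names, and the sample UI settings are assumptions made for this example only.

```python
import os
import platform


def environment_report(software_version, ui_config):
    """Collect coarse environment data for segmenting usage statistics."""
    # Drop the language setting, as the list above suggests excluding it.
    ui = {k: v for k, v in ui_config.items() if k != "language"}
    return {
        "software_version": software_version,
        "operating_system": platform.system(),
        # Only coarse machine specs: architecture and CPU count.
        "machine": {"arch": platform.machine(), "cpus": os.cpu_count()},
        "ui_config": ui,
    }


# Hypothetical version string and UI settings.
report = environment_report("0.21.0", {"language": "de", "display_unit": "BTC"})
```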

Log files are not the easiest to read, so making this data easily consumable by the operator via a UI would allow them to assess whether they are comfortable sharing it, and it could possibly be useful to them depending on what is being collected. Perhaps we learn more about possible monitoring requirements, and roll those in pending the results of the survey.

Question: What would the storage requirements be for this kind of logging?

jarolrod commented 3 years ago

Assuming we abstract away how we will handle consent and protect privacy, I think a simple solution is to log upon events, then parse the log.
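A minimal sketch of that log-then-parse approach, assuming one JSON object per log line. The helpers `log_event` and `parse_log`, and the widget and action names, are hypothetical.

```python
import json
from collections import Counter


def log_event(log_lines, widget, action):
    # One JSON object per line keeps the log trivially parseable.
    log_lines.append(json.dumps({"widget": widget, "action": action}))


def parse_log(log_lines):
    """Count how often each (widget, action) pair occurs in the log."""
    counts = Counter()
    for line in log_lines:
        event = json.loads(line)
        counts[(event["widget"], event["action"])] += 1
    return counts


# An in-memory list stands in for the log file in this sketch.
log = []
log_event(log, "sendButton", "clicked")
log_event(log, "feeSlider", "moved")
log_event(log, "sendButton", "clicked")
stats = parse_log(log)
```

Keeping the raw log separate from the parsed counts means the parsing step can change later without touching the code that records events.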

Bosch-0 commented 3 years ago

I agree with GBKS's approach of having various hypotheses worth testing. We could decide on hypotheses that focus on collecting data on new features added with that version, for example how users use descriptors in the 0.21.0 release. These could be mixed in with some more general meta hypotheses, such as what kind of wallets users use / how users construct tx's etc. The results of achow's survey will be helpful in deciding which hypotheses we test. All testing should be opt-in and as anonymous as possible.

The opt-in should be presented via on-boarding when using the GUI for the first time, or via an overlay on first launch when updating to a newer GUI version. The info in the GUI should give a rough overview of what is being collected, with a link to more in-depth documentation detailing what / why / when data is being collected (hosted on GitHub / bitcoincore.org?).

It would also be handy to have an opt-in option in the settings in case the user changes their mind and wants to contribute data later on.

michaelfolkson commented 3 years ago

I know this has been said before, but if this is enabled it must always be opt-in. It should be documented clearly that this must never be made opt-out. I am sure some people will argue against this for the slippery slope reason: that if we introduce this, there is a risk of it being made opt-out (either intentionally or unintentionally) at some point in the future.