backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 40 forks source link

Add a Telemetry module to (anonymously) collect useful data #285

Closed klonos closed 3 years ago

klonos commented 10 years ago

See new Issue Summary here: https://github.com/backdrop-ops/backdropcms.org/wiki/Telemetry-Initiative


Telemetry: (anonymously) collect useful data so that we can make better-informed decisions about what should go into (or be removed from) backdrop core.

I remember the endless debates of whether a certain setting/module/feature should be on or off by default leading to 300+ long issues in d.o. Here are some related d.o issues:

Metrics collected in the initial implementation:

Other related d.o issues:

Recent d.org Telemetry initiative: https://www.drupal.org/project/ideas/issues/2940737

Gathering some key data from end-users about how Drupal is used can give the community and the Drupal Association insights that will help us improve the product roadmap, community programs and outreach efforts by the association, and more. Right now, the only data we receive is a very limited amount of data from the sites that call home to Drupal.org for updates information.

Telemetry initiative: Gathering data about Drupal usage

The goal is to gather data about who uses Drupal, what modules they use, what modules they don't use, maybe basic traffic/load information, what php/db versions are in common use, etc... All of this information could be tremendously helpful in setting direction for the project.

We would want to build a modular telemetry system so that we can gather different kinds of data with each major release, if we want to focus on certain areas of the project for improvement.


Use cases we are targeting to improve:

  • Providing aggregate, anonymiZed data to core developers to help them understand real-world usage patterns with Drupal.
  • Providing the Drupal Association with data about the scope of the Drupal community outside of Drupal.org - how many sites are there really? Who are they?
  • Making all of this telemetry sending opt-out.
  • Making sure that headless sites can be identified as Drupal sites underneath by crawlers, and by the d.o metrics system

PR by @docwilmot (based on @quicksketch's work): https://github.com/backdrop/backdrop/pull/3704

jenlampton commented 4 years ago

I like this! I read through the summary, and one thing stood out to me as a great idea: hook_telemetry_data() This was mentioned in reference to contrib integration, but I see this as being the way to implement this feature in core and contrib.

In an effort to control scope, I recommend that we not worry about an API for contrib until we're sure we have a system that will work for our own needs. There aren't any real-world use cases for contrib using telemetry data yet, and until we decide how everything will work, we probably shouldn't be exposing any APIs (they are likely to be changing as we sort everything out).

I'm also a little concerned about the security implications of allowing contrib to harvest data that ends up stored on our own servers... but we can worry about all the legal business if/when we decide to make telemetry available to the rest of contrib :)

edit: I just read the summary, and it does state that contrib use is not in scope (yet):

Can this be used for contrib as well No, we do not expect to make it possible for contrib projects to collect data using this process. At least, not initially. This is something that can be considered in the future.

jenlampton commented 4 years ago

@stpaultim your write-up is fantastic! It's exactly what I had hoped for: a reasonable MVP with clearly stated goals. You also nicely outlined all the various moving parts, with recommendations for each :)

quicksketch commented 4 years ago

Yep @BWPanda we're on exactly the same page.

I put some time in this weekend to create a proof-of-concept module.

There are two parts to it: server and client. I came to the realization that the "server" portion should probably be part of Project module, if conceptually Contrib might collect usage data at some point. Even without considering contrib, it makes sense to include project_telemetry code as part of Project module for consistency with usage statistics (which is in the project_usage sub-module).

So here's an initial stab at things:

Core Telemetry Module: https://github.com/backdrop/backdrop/pull/3379 Contrib Project Telemetry Module: https://github.com/backdrop-contrib/project/pull/50

If you want to test both the server and client, you'll currently have to set it up locally. You can use a single site for both purposes, but here's how to set it up from scratch:

  1. Either check-out or get a Zip of each repository: https://github.com/quicksketch/backdrop/archive/285-telemetry.zip (backdrop core) and https://github.com/quicksketch/project/archive/285-project-telemetry.zip (project module)
  2. Put the entire project module in the modules directory.
  3. Install Backdrop normally.
  4. Enable "Telemetry", "Project", and "Project Telemetry" modules.
  5. You should immediately be able to check what is logged by Telemetry by visiting /admin/reports/telemetry. Right now that's just PHP version and OS. image
  6. Create a sample project to receive the Telemetry information. Visit /node/add/project and create a project with the name backdrop (imitating the project node at https://backdropcms.org/project/backdrop).
  7. Note there is a new "Telemetry" tab on the project page. This is where received, aggregated information would be displayed on backdropcms.org image
  8. To report Telemetry information to the "server" on the same site, find the telemetry.settings.json config file in the config/active directory. Change the config option for destination to your local site like this:
    {
    "_config_name": "telemetry.settings",
    "destination": "http://backdrop.lndo.site/modules/project/project_telemetry/project-telemetry-post.php"
    }

    Using a config override would also work.

  9. Run cron using the admin toolbar. This sends the collected information to the same site as though it were an external server.
  10. Now some Telemetry data is collected in the project_telemetry_raw table, and can be viewed under the Telemetry tab of the project node that was created. image
jenlampton commented 4 years ago

I put some time in this weekend to create a proof-of-concept module.

Hah, so much for not wasting valuable developer resources :)

Okay, so now that this is done how to we prevent contrib from using telemetry_telemetry_info and telemetry_telemetry_data() ? I'll start another issue with the SFC to get the legal stuff going... but they don't usually move very quickly.

ghost commented 4 years ago

Okay, so now that this is done how to we prevent contrib from using telemetry_telemetry_info and telemetry_telemetry_data() ?

How about if we temporarily add a whitelist to the Project module part of this to only accept/store data from core modules?

jenlampton commented 4 years ago

I think you mean an allowlist ;) https://github.com/backdrop/backdrop-issues/issues/4441

I just had a quick chat with @quicksketch and he recommended that we add a checkbox on projects to "Enable telemetry data" (that would also show the tab). Unless someone specifically requests that their contrib module use it -- and a b.org admin turn it on -- we could discard data we received immediately.

This would effectively create an allowlist.

It would be much safer if we could prevent these hooks from even firing for any of contrib, but then that wouldn't really make them hooks... They would need to be translated into something similar that only the core telemetry module could use to send its data.

klonos commented 4 years ago

I like where this is going!! Thanks everyone for your time/efforts so far 🙏

stpaultim commented 4 years ago

I updated the WIKI with some of the thoughts from the last 24 hours. I look forward to discussing this at next DEV meeting.

stpaultim commented 4 years ago

Just getting ready for DEV meeting. The update for this week is that I spent some time getting a local version of BackdropCMS.org working for the purpose of testing this PR. I have the local version working. I'll try to spend more time this weekend testing this issue and updating the next steps.

stpaultim commented 3 years ago

I finally got around to testing the work that @quicksketch did back in October. As a proof of concept it seems to be working. But, in a discussion during our SPRINT today, I think we've decided that there isn't any real chance of making this ready for 1.18.

Here are some screenshots of what we looked at together during the sprint today.

Screenshot from 2021-01-02 15-51-32

screenshot of telemetry data

The significance of the screenshots if that the telemetry modules seems to be sending the data we want, but that the project modules is not yet displaying it properly. Note, that the data includes at least 2 different versions of PHP from 3 different sites, but only PHP 7.2.34 is showing up in the display of the statistics.

stpaultim commented 3 years ago

@klonos - Suggested (during live sprint) that maybe it would be worth trying to get the core portion into 1.18.0 even if it was not enabled by default, for the purposes of testing.

But, there is no consensus that this would really add much value. @quicksketch suggested that for the purposes of testing, it's more important to get a version of the project module working on backdropcms.org or a dev site. Having the module in core may not be worth the effort.

quicksketch commented 3 years ago

Hey Folks, an update after Backdrop LIVE. Thanks for the great feedback and notes that were taken during the session (Notes here on Google Drive).

Reposting some questions from the notes here:

Is the config setting for destination a security risk?

I don't believe so. Update status has also always had a configurable destination. If a malicious user could modify configuration, there are a lot of other much more dangerous values they could change. We have to work from the basis that configuration is accessible only to trusted users.

How long is the data stored (specific to each site)?

Currently: 2 weeks before data is deleted from the "project_telemetry_raw" table.

Do we have any way of preventing collection of data from dev sites?

I've set the Telemetry Unique Identifier to be the same per site key, which is stored in the site "state" table. So it should be shared across all environments, making all environments count as one site. The most recent reporting environment will take precedence in the statistics.

Should we adopt a data module similar to usage stats?

Usage stats are quite complicated in their data model to provide historical data. In this MVP, we're only storing the raw data for now and aggregating it directly from that table. This could be optimized to have permanent aggregated tables, or build per-week/month stats, but that may not be needed in the first version.

I've pushed up changes to both the Telemetry PR and the Project Telemetry PR that includes:

Telemetry core module:

Project Telemetry module (for backdropcms.org):

Next Steps

herbdool commented 3 years ago

@quicksketch is there a PR available to view? I didn't see one linked above.

ghost commented 3 years ago

@herbdool I believe the PRs are:

docwilmot commented 3 years ago

Been testing this, and I think its good for inclusion in core. Any chance of this getting into 1.20?

stpaultim commented 3 years ago

The general opinion (as I understood it) at the weekly DEV meeting last week was that this COULD (in theory) get into 1.20. However, the telemetry module might be disabled by default, but available to users willing to help test the system by enabling it. This would make it easier for us to work on the project module on Backdropcms.org and get it working.

It's not clear to me what still needs to happen to make this possible. I will probably schedule some time this weekend to SPRINT on 1.20 and maybe I can focus on this issue (with help).

herbdool commented 3 years ago

@quicksketch I've reviewed the module https://github.com/backdrop/backdrop/pull/3379 and other than a few small tweaks, looks great.

I've also reviewed https://github.com/backdrop-contrib/project/pull/50 and it looks fine to me. I didn't use a fine-toothed comb but since it's a contrib module (likely used on only one site) it can be iterated on much faster than core.

herbdool commented 3 years ago

@docwilmot I added the "works for me" label based on your feedback.

docwilmot commented 3 years ago

Edit: I was thinking about Project module, not this. So RTBC yes.

herbdool commented 3 years ago

@docwilmot oh. I just added "pr - needs work" though I guess that's probably better since there are few small tweaks.

docwilmot commented 3 years ago

Hehe, youre still right. You PR-ing?

herbdool commented 3 years ago

Not I, just the review.

docwilmot commented 3 years ago

New PR up for review: https://github.com/backdrop/backdrop/pull/3704

klonos commented 3 years ago

Getting this on the radar for 1.20.

domaingood commented 3 years ago

https://www.youtube.com/watch?v=Xja-GbcYqDw Try to look Blesta installer also MODX Revolution. I love to add see in backdrop too add Diretory permission and file permission seeting before installing.

https://docs.modx.com/current/en/getting-started/installation/standard

stpaultim commented 3 years ago

This came up briefly during the meeting today. https://youtu.be/CoXHmSUGsXg?t=3165

quicksketch commented 3 years ago

Thanks folks! I'm super excited that @herbdool and @docwilmot both reviewed the implementation and approved it (with some minor changes)! I'll work on getting the project module portions deployed to BackdropCMS.org and then I think we might be able to merge in this initial iteration!

During the weekly call I suggested that we can use the 2-week period between code freeze and release to determine how/if to expose an option to enable telemetry in the installation process.

I'm marking @docwilmot's PR at https://github.com/backdrop/backdrop/pull/3704 as RTBC.

quicksketch commented 3 years ago

I made the corrections pointed out in @docwilmot's feedback in https://github.com/backdrop-contrib/project/pull/50 and merged that into project module.

klonos commented 3 years ago

Can be a follow up issue, but I'd like the report page to look more like so (emphasize the metric name/description/value instead of which module collects it):

image

...the description column should have the priority-low class, so that on mobile it is hidden:

stpaultim commented 3 years ago

This has been a long time coming...

Nice work everyone.

quicksketch commented 3 years ago

I merged the core PR at https://github.com/backdrop/backdrop/pull/3704 into 1.x for 1.20.0!

After a few minor hiccups with Project module, I also released Project module 2.2.2, which includes the new project_telemetry sub-module. It is now deployed and visible on BackdropCMS.org here: https://backdropcms.org/project/backdrop/telemetry

@klonos I went ahead and made the minor changes you suggested above as well. Here's the Telemetry Reports page as it exists now: image

klonos commented 3 years ago

Thanks everyone for their efforts on getting this implemented! 🎉

Here's a list of follow-up tasks that I could think of that we should ideally get done before the final 1.20 release:

klonos commented 3 years ago

Should we rename the "MySQL version" metric to "Database version" or "Database information" instead? It seems odd to say "MySQL" when the db is MariaDB (although it derives from MySQL, it is a different product), and we should perhaps consider https://www.silkscreencms.org which adds support for other databases.

jenlampton commented 3 years ago

@klonos Adding something new doesn't usually get a change record. I think a blog post might be more suitable :)