klonos closed this issue 3 years ago
This would help us make more educated decisions in issues like #278, #279 etc. instead of having to estimate 80/20 cases (and have people disputing over the percentages).
What is the status of this initiative? Has @quicksketch done any work to track configuration statistics for core yet?
I'd like to track the "Users may log in" setting that was introduced in issue #277. It seems like 95%+ of sites would be fine with users logging in with either their username or their email address, and have no need to restrict it to one or the other. Since it may create some confusion (see #1994), we may want to remove that setting in a future version if no one is using it.
@mikemccaffrey that use case and the need to help make educated decisions (instead of doing guesswork) are precisely why this issue was filed. It was a thing that greatly bothered me in d.org where decisions were made based on what a group of people thought "most people need/use".
I think that the first thing that we need to determine is what we are going to call this thing that we are building. It seems like when we are describing this functionality, you could use any combination of "statistics", "feedback", "analytics", "logging", or "reporting".
Maybe it would help if we thought about how we are going to present the feature to the end users. What should we ask next to the checkbox to turn it on and off? "Would you like to send anonymous data to backdropcms.org to help inform future product development?"
What do others think? Is there anything in the project module already that does reporting? Should we look there to see what it is called?
Well, if we get technical about it, then we are not "logging" anything. Not on the actual system where the data gathering is to be performed anyways. The logging part will be made on the b.org side, and even then it's not logging, but rather data storing.
Also, "feedback" to me implies user interaction and not something that is done automatically in the background.
The term "heuristics" was suggested over Gitter. (Ancient Greek: εὑρίσκω, "find" or "discover")
...any approach to problem solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals. Where finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution. Heuristics can be mental shortcuts that ease the cognitive load of making a decision. Examples of this method include using a rule of thumb, an educated guess, an intuitive judgment, stereotyping, profiling, or common sense.
...although it makes perfect sense etymologically, I'm not sure if most people are familiar with the word or what it means.
"statistics", "analytics" and "reporting" make more sense to me, but these words alone do not provide enough context. Something like "Feature Analytics" perhaps?
"Would you like to send anonymous data to backdropcms.org to help inform future product development?"
This sounds really good 👍. Perhaps lose the word "data" because people will start wondering what sorts of data. Better to say "statistics" instead I think.
"future product development" is very accurate, but people care more about "features" rather than the general product development, so how about adding that word into play in order to make it more "luring" to keep that checkbox ticked.
Also, change the order of the purpose and what we are asking, because when people reach half-way through that sentence and all they have read is "send data", they might skip reading the rest of it.
Something like this perhaps:
"Would you like to help us make better-informed decisions when adding new product features to Backdrop by sending anonymous statistics to backdropcms.org?"
Note that we are not telling them that we will also be using that information to remove certain features 😈 [evil laugh]
...would also be a great idea to have a "more about this" link that explains what data is being transmitted, the fact that we do not share this information with 3rd parties and more importantly our privacy policy that ensures that the information collected is anonymous and cannot be traced back to the person/site that provides them.
not telling them that we will also (...) removing certain features
I guess, that's really a problem if we suggest it's (only) about "adding new product features".
"Would you like to send anonymous data to backdropcms.org to help inform future product development?"
I love this language. Product development doesn't limit us to adding features, but could include removing some, too.
Can we add a link from this issue to the one where we itemized the things we want to be tracking? (Maybe that one was in Project module?)
We're two weeks away from code freeze for 1.8, and with no code here yet to review or revise it's not likely this feature will get done in time. Bumping to 1.9.
This is something I noticed (in the recent CMS installation comparison video) that Joomla does. Not being at all familiar with Joomla, here's some information I've found that may help in deciding if/how we do this in Backdrop:
My personal opinion is that this would be a good idea, as long as it's done anonymously, and with the user's consent (maybe disabled by default?). I also support the idea of linking to a page on BDcms.org specifically discussing this, why we do it, why you can trust us, etc. Maybe even link to the code on Github showing what data we collect?
There's the potential to collect lots of useful information - not just PHP version, Backdrop version, etc., but things like if content revisions are enabled, the site timezone, how often cron runs, etc. (or is that getting too personal?). Also, I like how Joomla provides an API for developers to use that information, giving it back to the community as it were.
Here's what Joomla does:
Stats Collection in Joomla
Since version 3.5.0
Since Joomla! 3.5 a statistics plugin will submit anonymous data to the Joomla Project. This will only submit the Joomla version, PHP version, database engine and version, and server operating system.
This data is collected to ensure that future versions of Joomla can take advantage of the latest database and PHP features without affecting significant numbers of users. The need for this became clear when a minimum of PHP 5.3.10 was required when Joomla! 3.3 implemented the more secure Bcrypt passwords.
In the interest of full transparency and to help developers this data is publicly available. An API and graphs will show the Joomla version, PHP versions and database engines in use.
If you do not wish to provide the Joomla Project with this information you can disable the plugin called System - Joomla Statistics.
...and here's what their publicly available page with the collected stats looks like:
This sounds like a job for @Gormartsen
Before collecting statistics and sending them back to Backdrop, admins or site owners must be asked whether they want to share their data.
@dyrer yep, that is the point of #3168
As it is now, during installation, we ask people if they want to be checking for available updates. If they say yes, we also collect data. We should not be doing that.
The current proposal is to:
WordPress applies security updates without asking, in all 4.9.x versions. So I agree: always check for updates, without collecting data. Administrators should have the option to change their mind after installation. So, in my opinion, you can offer the option during installation but also in the options. This option could be located with the update options.
I have added a link to the d.org Telemetry initiative in the issues summary: https://www.drupal.org/project/ideas/issues/2940737
...from https://forums.classicpress.net/t/classicpress-1-0-0-aurora-release-notes/910
Admin dashboard. WordPress-specific features like community events and featured plugins have been removed and/or replaced with ClassicPress equivalents. For example, we’ve added a “Featured Petitions” widget to encourage community participation in our development process.
Include an anonymous site identifier when communicating with the ClassicPress updates API (details). ClassicPress can use this to count active sites, but not to identify them individually.
There's no PR here yet, should we bump this issue to the next milestone, or is this something we can get done in less than one month?
I think we may be able to move the current info into a separate module (without changing any functionality) in the next month. But we may need to set aside the addition of any significant new features for 1.14.
The problem here is that we haven't decided how this data will be fetched, and I suspect we'll need the PMC to chime in here. This will need to be a specific service on B.org that either fetches this info or is sent this info from sites. Currently Update module just fetches a known feed; that won't work for this noble proposal here.
Once this has been decided then someone can start building the core mechanism to collect and package and send this data to the mothership.
we haven't decided how this data will be fetched
I don't think there's any decision to be made here. The data will be collected in the same way project module collects usage statistics now: each site will send information to a service at backdropcms.org.
Today in the meeting we discussed keeping the telemetry data separate from the project data on b.org, because project module is already quite complex, and also because it's unlikely that the telemetry data would be useful as a contrib project.
IIRC Update module doesn't send any data anywhere; it fetches an XML file from B.org for each project it wants to check for updates.
Project module on B.org simply counts how many times sites are fetching.
This project may end up using Project/Update to do this work, but the point of my last post is that we will need new (complex) code in multiple locations, contrib and custom on B.org (Project and Borg), along with core code changes, and we'll also need to decide and design how we do this efficiently. I suppose a single dev could build the Project code and the core code and test them talking to each other, but I think it's more reasonable for this to be a joint discussion led by the senior programming leads.
If this is the case then there's definitely too much work to do here for 1.13. Bumping milestone. Also related: https://github.com/backdrop/backdrop-issues/issues/3168
Oh, and I checked, we do have an open PMC issue about https://github.com/backdrop/backdrop-issues/issues/3168, I will add to that a discussion of this issue as well.
Removing the 1.14 milestone, and adding the milestone candidate label. If this issue gets an advocate who wants to push it through the 1.14 release, it can get the milestone back :)
@docwilmot according to https://www.drupal.org/project/drupal/issues/1036780#comment-4970352 it also parses the URL for projects and more recently sub-modules, themes. So if we added things to that URL they would at least be in the logs. Then need to add more parsing at b.org end.
I'm interested in an MVP that adds a couple of key items to the URL, such as PHP version and web server. But I do agree that longer term it should be a separate module and not stuff a URL with all the data.
@herbdool you're right, I didn't notice that, it's been in core for years it seems. But the code to actually parse the URL isn't in Project.module though?
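To make the MVP idea concrete, here is a minimal standalone sketch in plain PHP of appending a couple of values to an update-check URL. The function name `example_update_url()`, the parameter names, and the `updates.example.org` URL are all hypothetical; this is not how Update module builds its request, only an illustration of the approach.

```php
<?php
/**
 * Hypothetical helper: append telemetry values to an update-check URL.
 */
function example_update_url(string $base, array $telemetry): string {
  // Use '&' if the URL already has a query string, '?' otherwise.
  $separator = (strpos($base, '?') === false) ? '?' : '&';
  return $base . $separator . http_build_query($telemetry);
}

$url = example_update_url(
  // Placeholder URL, not the real update server.
  'https://updates.example.org/release-history/backdrop/1.x',
  array(
    'php_version' => PHP_MAJOR_VERSION . '.' . PHP_MINOR_VERSION,
    // Not set on the command line, so fall back to a placeholder.
    'web_server' => $_SERVER['SERVER_SOFTWARE'] ?? 'unknown',
  )
);
```

The server side would then only need to parse these parameters out of its access logs, which is why this works as a stopgap but scales poorly as the list of metrics grows.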
@docwilmot not clear if d.org is using this patch https://www.drupal.org/project/project/issues/1274766 or if they haven't published it. Perhaps still useful as starting point.
If they are, it's a private patch, or maybe a custom module. That code isn't in Project.
I agree w/ @docwilmot the main thing needed to move this issue forward is a detailed architecture and a plan.
I'll put one possible architecture out there to get the ball rolling.
To me this whole thing screams elasticsearch; https://www.elastic.co/products/elasticsearch?ultron=[B]-Elastic-US+CA-Exact&blade=bing-s&Device=c&thor=elasticsearch&msclkid=5325aa35318615d0f4aef72de0066aba
so one possible implementation could be;
telemetry module for Backdrop core that users can opt into
This approach would also take the datastore and data analysis off the plate of b.org and allow b.org to keep being a good Backdrop site w/out overburdening it.
I'm/we're interested in other architectures, but until an architecture is decided on, planning is almost futile. Once we have the tech stack we can create tasks and assign them to interested developers (I count myself amongst that group).
@klonos just shared this interesting link/chart in gitter. https://wordpress.org/about/stats/
...and because the internet never stays the same, here's what that looks like at the moment:
In my non-Backdrop work, I've spent the last month setting up a system whereby Behat tests can be run automatically when code is pushed to a repo (similar to how our PRs are tested automatically here on GitHub).
To do this, I essentially:
I'm no expert, but wouldn't this kind of thing work here too?
Or is this just a simplified version of what @serundeputy already suggested with ElasticSearch? (I'm not familiar with ES, but the name makes me think of Apache Solr, which makes me wonder how it's related to collecting telemetry data...)
which makes me wonder how it's related to collecting telemetry data
I believe it has to do more with displaying the data and allowing people to search it.
Just received this via email:
Dear GitLab users and customers,
On October 23, we sent an email entitled “Important Updates to our Terms of Service and Telemetry Services” announcing upcoming changes. Based on considerable feedback from our customers, users, and the broader community, we reversed course the next day and removed those changes before they went into effect. Further, GitLab will commit to not implementing telemetry in our products that sends usage data to a third-party product analytics service. This clearly struck a nerve with our community and I apologize for this mistake.
So, what happened? In an effort to improve our user experience, we decided to implement user behavior tracking with both first and third-party technology. Clearly, our evaluation and communication processes for rolling out a change like this were lacking and we need to improve those processes. But that’s not the main thing we did wrong.
Our main mistake was that we did not live up to our own core value of collaboration by including our users, contributors, and customers in the strategy discussion and, for that, I am truly sorry. It shouldn’t have surprised us that you have strong feelings about opt-in/opt-out decisions, first versus third-party tracking, data protection, security, deployment flexibility and many other topics, and we should have listened first.
So, where do we go from here? The first step is a retrospective that is happening on October 29 to document what went wrong. We are reaching out to customers who expressed concerns and collecting feedback from users and the wider community. We will put together a new proposal for improving the user experience and share it for feedback. We made a mistake by not collaborating, so now we will take as much time as needed to make sure we get this right. You can be part of the collaboration by posting comments in this issue: https://gitlab.com/gitlab-com/www-gitlab-com/issues/5672. If you are a customer, you may also reach out to your GitLab representative if you have additional feedback.
I am glad you hold GitLab to a higher standard. If we are going to be transparent and collaborative, we need to do it consistently and learn from our mistakes.
I would like us to have this in mind and not repeat any such mistakes with our implementation.
@serundeputy and I recently talked about putting some focus on this issue. This could be an initiative or I might advocate for this issue, if that would be helpful. But, I need help in figuring out what this means and how to approach it. I am thinking about setting up a special zoom meeting with anyone that wants to talk about this and how to move it forward.
I propose we break this down into the required parts to start, and make some sub-issues:
On backdropCMS:
(would Elastic Search be an option for 3 and 4 above?)
In Backdrop core:
Lots of decisions so far. The code would be secondary I suspect.
Adding this here as an example/idea of how things could look/work in the UI:
The "Read more" link goes to https://code.visualstudio.com/docs/supporting/faq#_how-to-disable-telemetry-reporting
PS: I like how they have a separate "Crash reporting": https://code.visualstudio.com/docs/supporting/faq#_how-to-disable-crash-reporting ...which in our case could be "PHP error and WSOD reporting" or something like that.
I've got a start on a new telemetry module for Backdrop core, collecting:
$data = [
  'site_key' => backdrop_hmac_base64($base_url, backdrop_get_private_key()),
  'php' => VERSION,
  'mysql_type' => 'MariaDB|MySQL',
  'mysql_version' => VERSION,
];
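For readers wondering how such a site key can identify a site without revealing it, here is a minimal standalone sketch in plain PHP. It assumes the key is a keyed SHA-256 hash of the site URL in URL-safe base64; the function name `example_site_key()` and the sample URL and private key are hypothetical, and a real module would use Backdrop's own helpers as in the array above.

```php
<?php
/**
 * Hypothetical helper: derive a stable, anonymous site identifier.
 */
function example_site_key(string $base_url, string $private_key): string {
  // Keyed hash: without the site's private key, nobody can confirm a
  // guessed URL by hashing it and comparing the result.
  $raw = hash_hmac('sha256', $base_url, $private_key, true);
  // URL-safe base64 with the padding removed.
  return strtr(base64_encode($raw), array('+' => '-', '/' => '_', '=' => ''));
}

// Sample values only; neither is a real site or key.
$key = example_site_key('https://example.com', 'not-a-real-private-key');
```

The same site always produces the same key, so the receiving service can count active sites, while two sites with different private keys produce unrelated keys even for the same URL.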
update module techniques
@serundeputy please see my comments https://github.com/backdrop/backdrop-issues/issues/285#issuecomment-591096635 and advise if this is realistic or necessary and how the rest of us can participate. I imagine an initiative like this would need multiple moving parts to work together, so we'd need a plan for the rest of the stuff that you're not personally working on. How are we approaching getting all the parts working here?
P.S. I assumed you were the initiative lead for this. I don't recall who is.
@serundeputy can we get an update on Telemetry for the weekly meetings?
I'm really interested in helping with this. Do let us know what you need help with @serundeputy 🙂
Thanks everyone!
I've not had any tangible progress on this since the initial PoC module. We need an:
@serundeputy can you explain what an ES server is, and why we need one? (Not everyone reading / contributing to these issues understands the acronyms.)
We also need some direction as to how the rest of us can contribute. Would it be helpful to have people start writing gathering code for all the issues linked in the top post, for example?
It's my understanding from emails with @serundeputy that @serundeputy may not have time to work on this in the near future and I have volunteered to assume some responsibility for moving this initiative forward. Still waiting to confirm this with @serundeputy.
We had a discussion about this with @quicksketch, @klonos, and myself at Backdrop LIVE. I have a Google Doc with a bunch of thoughts in it, and my goal is to add this to the issue queue quickly. I am thinking about starting a new meta issue with a clean history and very complete summary of what has been discussed so far.
Here is a link to Google Doc.
I'm thinking about creating a new META issue in the BackdropCMS.org issue queue, since this initiative is broader than just changing core code. It also involves policy decisions and code on BackdropCMS.org or other locations. But for today, we'll stick with this issue.
But (for now) I started by creating a summary of this issue in the BackdropCMS.org repo WIKI. This is a DRAFT summary of this issue based upon my understanding of where things are at based upon the meeting during Backdrop LIVE on Sept 17 and after reviewing this issue in detail. https://github.com/backdrop-ops/backdropcms.org/wiki/Telemetry-Initiative
My summary assumes that we are NOT using anything like ElasticSearch at this time. We can move in that direction in the future, but during the Backdrop LIVE discussion we decided to start simple and keep the data in BackdropCMS.org database for now.
This is something that may need additional discussion, but I'm working with that assumption for now.
Please, review my summary and ask questions, provide clarifications, and anything else.
Hopefully, we can talk about this at one of the next two dev meetings.
After bringing this up at a DEV meeting, I would like to plan another meeting to just work on this initiative. Please, let me know if you have time and interest to participate in this initiative.
I like this! I read through the summary, and one thing stood out to me as a great idea: hook_telemetry_data()
This was mentioned in reference to contrib integration, but I see this as being the way to implement this feature in core and contrib. Here're some thoughts:
Create a hook_telemetry_data_types() hook.
This will allow a module (core or contrib) to define the types of data it's going to collect. For example:
function system_telemetry_data_types() {
  return array(
    'php_version' => array(
      'title' => t('PHP version'),
      'description' => t("The version of PHP used on this site. E.g. '7.2'."),
    ),
    'mysql_version' => array(
      'title' => t('MySQL version'),
      'description' => t("The version of MySQL/MariaDB used on this site. E.g. '5.7'."),
    ),
  );
}
Create a hook_telemetry_data() hook.
This is what will be called on cron/update (or whenever data is collected) and will return the data appropriately. For example:
function system_telemetry_data($data_type) {
  $data = NULL;
  switch ($data_type) {
    case 'php_version':
      $data = phpversion();
      break;
    case 'mysql_version':
      $data = Database::getConnection()->version();
      break;
  }
  return $data;
}
Create a report page that lists all the data that's collected, and its current value. This'll give people an idea of exactly what's being shared, and helps with transparency. For example:

| Data | Description | Value |
|---|---|---|
| PHP version | The version of PHP used on this site. E.g. '7.2'. | 7.2.34 |
| MySQL version | The version of MySQL/MariaDB used on this site. E.g. '5.7'. | 5.5.5-10.3.22-MariaDB |
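As a rough illustration of how the two proposed hooks could feed both the outgoing data and the report page, here is a standalone sketch in plain PHP. It does not use Backdrop's hook system; the collector `example_telemetry_collect()` is hypothetical, the module list is passed in by hand, and the `system_*` functions are simplified stand-ins for the examples above (without `t()` or the database call).

```php
<?php
// Simplified stand-ins for a module's implementations of the proposed hooks.
function system_telemetry_data_types() {
  return array(
    'php_version' => array(
      'title' => 'PHP version',
      'description' => 'The version of PHP used on this site.',
    ),
  );
}

function system_telemetry_data($data_type) {
  return ($data_type === 'php_version') ? phpversion() : NULL;
}

/**
 * Hypothetical collector: builds one report row per declared data type.
 */
function example_telemetry_collect(array $modules): array {
  $rows = array();
  foreach ($modules as $module) {
    $types_hook = $module . '_telemetry_data_types';
    $data_hook = $module . '_telemetry_data';
    // Skip modules that do not implement both hooks.
    if (!function_exists($types_hook) || !function_exists($data_hook)) {
      continue;
    }
    foreach ($types_hook() as $type => $info) {
      $rows[$type] = array(
        'title' => $info['title'],
        'description' => $info['description'],
        'value' => $data_hook($type),
      );
    }
  }
  return $rows;
}

// 'not_a_module' implements neither hook and is silently skipped.
$report = example_telemetry_collect(array('system', 'not_a_module'));
```

Because the same rows drive the report page and the data that is sent, what the site owner sees is exactly what is shared, which is the transparency point made above.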
Other ideas:
See new Issue Summary here: https://github.com/backdrop-ops/backdropcms.org/wiki/Telemetry-Initiative
Telemetry: (anonymously) collect useful data so that we can make better-informed decisions about what should go into (or be removed from) Backdrop core.
I remember the endless debates of whether a certain setting/module/feature should be on or off by default leading to 300+ long issues in d.o. Here are some related d.o issues:
Metrics collected in the initial implementation:
Other related d.o issues:
Recent d.org Telemetry initiative: https://www.drupal.org/project/ideas/issues/2940737
PR by @docwilmot (based on @quicksketch's work): https://github.com/backdrop/backdrop/pull/3704