backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 39 forks source link

Fix mysql/mariadb version for Telemetry and the Status Report #5447

Closed indigoxela closed 2 weeks ago

indigoxela commented 2 years ago

Description of the need

We capture the database type and version, but the outcome doesn't seem correct. It's at least questionable, as I'd expect MariaDB to be used more often.

wrong-db-stats

Proposed solution

Seems like we'd need to evaluate the version string more accuratly.

Additional information

Some examples from actual strings on the status page, by OS:

All MariaDB versions start with 5.5.5, so I think, all those get counted as MySQL 5.5. My suspicion: the majority in our stats are actually MariaDB.

I didn't find an existing issue for that, please point me to it if there is.

Currently the Telemetry module does a super-simple check. We could either fix it there by enhancing the code, or do better evaluation server-side.


Later Note: Even Mariadb versions starting with 10 are sometimes reported as 5.5 (as on Pantheon). See https://github.com/backdrop-ops/backdrop-pantheon/issues/68#issuecomment-1433751043

klonos commented 2 years ago

On a fresh installation of Backdrop 1.x, I see the full version of MariaDB under admin/reports/telemetry:

image

I'm assuming that the problem is with the way we trim/aggregate that data on b.org, after it's been sent from each site.

olafgrabienski commented 2 years ago

I get the two same values on admin/reports/telemetry and admin/reports/status. Example:

5.5.5-10.2.36-MariaDB-10.2.36+maria~jessie

indigoxela commented 2 years ago

I'm assuming that the problem is with the way we trim/aggregate that data on b.org

That's also what I'm assuming. The disadvantage with that is that only @quicksketch can fix it, as that code isn't available open source.

I get the two same values on admin/reports/telemetry and admin/reports/status

In both places the same simple version check is done. For the status page that seems fine, but for telemetry we'd probably need a slightly smarter check. Or maybe it should be smarter in both places?

klonos commented 2 years ago

I did some research and there's no reliable way to extract just the mariadb portion of the version out of the version string that is reported by MySQL/MariaDB.

Some people try to do some kind of "feature detection" in order to tell if it is MariaDB and which version, but all the solutions I found around the internet seem hacky to me. It would be awesome if mariadb provided that out of the box somehow, but that is not the case.

indigoxela commented 2 years ago

there's no reliable way to extract just the mariadb portion of the version out of the version string that is reported by MySQL/MariaDB.

The string may depend on how the software has been compiled (?). It might still be possible to extract the info. The examples I collected seem to indicate, there's some consensus across Linux distributions, how to handle it.

Here's a little standalone php script to play with:


<?php

$foo = array(
  '5.5.5-10.3.31-MariaDB-0+deb10u1',
  '5.7.36-1',
  '5.5.5-10.5.12-MariaDB-0+deb11u1',
  '5.5.5-10.3.28-MariaDB',
  '5.5.5-10.5.11-MariaDB-1:10.5.11+maria~focal-log',
);

foreach ($foo as $string) {
  $type = $version = '';

  if (strpos($string, '-MariaDB') !== FALSE) {
    $type = 'MariaDB';
    preg_match('/(\d+\.\d+)\.\d+-Maria/', $string, $matches);
    $version = isset($matches[1]) ? $matches[1] : 'unknown';
  }
  else {
    $type = 'MySQL';
    preg_match('/^(\d+\.\d+)\.\d+/', $string, $matches);
    $version = isset($matches[1]) ? $matches[1] : 'unknown';
  }

  echo $version . '-' . $type . "\n";
}

?>

If we do something like that, and where we do it (module, server-side) is somebody else's decision. :wink: Whatever it will be, I hope we can do a better job re the stats - nothing's as worthless as wrong data.

klonos commented 2 years ago

I agree about doing a better job to be able to extract a more accurate db version 👍🏼

I don't think that the data is worthless though. The first set of the 3 digits in every version (regardless of whether it is MySQL or MariaDB) indicate the level of parity/compatibility with the "upstream". Both 5.5.5-10.3.31 and 5.5.5-10.5.12 for example indicate two different versions of MariaDB, which are both drop-in replacements for MySQL v5.5.5. Saying that they both are MySQL v5.5 "equivalent" is accurate I think, so perhaps a better wording in how we represent the data in the table might solve things.

I'm wondering how we are getting the 10.x numbers though 🤔 ...are there OSes that are reporting the "proper" version?

indigoxela commented 2 years ago

perhaps a better wording in how we represent the data in the table might solve things.

Sure, if that makes people happy, why not. I don't think it's accurate data, though. Not knowing, how many of these "MySQL 5.5" are actually MariaDB (and which version!) makes that number less useful for code decisions.

I'm wondering how we are getting the 10.x numbers though

Some specially compiled docker versions? Or maybe it's completely wrong. :stuck_out_tongue_winking_eye: I couldn't tell.

klonos commented 2 years ago

The disadvantage with that is that only @quicksketch can fix it, as that code isn't available open source.

Of course it is! 🙂 ...this commit shows that it's all done via a new project_telemetry submodule of v2 of the https://github.com/backdrop-contrib/project module. Here's what I see in b.org for core:

image

Not knowing, how many of these "MySQL 5.5" are actually MariaDB (and which version!) makes that number less useful for code decisions.

Right now, we seem to be saving only a numeric value for the version number. I guess we'd need to change the data we capture to a string that indicates MariaDB vs MySQL (it's easy to grep the maria portion of the version string to determine that) + version. I got that working as expected on my local with the following:

      $version = Database::getConnection()->databaseType() === 'mysql' ? Database::getConnection()->version() : t('unknown');
      // Backdrop core only supports MySQL and MariaDB. If there is no "maria"
      // in the version string, assume it is MySQL.
      $type = strpos(strtolower($version), 'maria') !== FALSE ? 'MariaDB' : 'MySQL';
      // Grab all strings within the version that are a sequence of numbers and
      // dots.
      preg_match_all('/[\d\.]+/', $version, $versions);
      if (count($versions[0]) == 1) {
        $version = $versions[0][0];
      }
      else {
        // Some variations of the MariaDB version string contain the version
        // number more than once. For example:
        // 5.5.5-10.5.11-MariaDB-1:10.5.11+maria~focal-log
        $versions = array_unique($versions[0]);
        // Sort versions in reverse (descending) order.
        rsort($versions, SORT_NATURAL);
        // Get the greatest version (or the only version).
        $version = $versions[0];
      }
      return $type . ' v' . $version;

image

I think that we should be trimming the version number to only show major/minor, as I don't think that grabbing the dozens of bug fix versions is helpful, and it would make the table huge too.

indigoxela commented 2 years ago

...this commit shows that it's all done via a new project_telemetry submodule...

I see. @klonos do you suggest to make the changes there? Then we should open an issue in the b-org queue and close this one. Or should we make changes before data get sent? Might also need some adaption in project_telemetry, though.

yorkshire-pudding commented 2 years ago

Mine doesn't have the 5.5.5 - just the mariadb bit - this is on shared hosting

10.2.41-MariaDB-log-cll-lve image

I guess I was the one with 10.2, though its now disappeared.

klonos commented 2 years ago

...do you suggest to make the changes there?

I think we could fix the version reporting (10.x vs 5.x vs 8.x) in core, and leave decision of whether to report type (MariaDB vs MySQL) separately for a later time. This PR does that: https://github.com/backdrop/backdrop/pull/3903 ...this one reports "clean" and accurate 3-digit versions (no type or OS or other things). The trimming/aggregation of the versions happens in b.org (according to whatever the setting for the mysql_version value is set in the project_telemetry module for the Backdrop core project node).

Here's another PR that reports the db type as well: https://github.com/backdrop/backdrop/pull/3904 ...this one makes the reported value sent to b.org a string (type + version), so the version number decimals/accuracy cannot be handled by the current trimming/aggregation options provided by the project_telemetry module. That's why the reported value is trimmed down to only 2 digits in core/modules/telemetry/telemetry.telemetry.inc (before it gets sent to b.org).

Thoughts?

indigoxela commented 2 years ago

Thoughts?

I think there's benefit in also having the DB type, but suspect, this needs some adaptions on the project_telemetry side.

Re doing it (or parts) in core telemetry module or doing all server-side I'm on the fence. I see pros/cons in both.

Doing the version extract before sending means that only relevant data get sent and saved. Doing everything server-side means that evaluation is done on one point, which might make things easier to maintain.

yorkshire-pudding commented 2 years ago

Thoughts?

I prefer #3904 - I think making it clear the DB type would help reduce any initial perception (for those who don't understand the nuances of MySQL/MariaDB versioning) that most people are running Backdrop on older unsupported software.

ghost commented 2 years ago

I think raw data should be sent from Backdrop sites, then analysis/formatting done on the B.org side. This way, any further changes we want to make are possible without invalidating all previous data collected.

So I think this issue should be closed in favour of a B.org/Project module one.

indigoxela commented 2 years ago

So I think this issue should be closed in favour of a B.org/Project module one.

Looking at the code (and the screenshot by @klonos), I'm assuming that currently the project_telemetry module only provides a super simple "Aggregation precision" setting for display. Nothing smart at all that could handle more complex string evaluation we'd need for proper database type display. So I'm not sure if it even makes sense to open an issue in the backdrop-contrib/project queue.

jenlampton commented 1 year ago

I just ran into this same problem in https://github.com/backdrop-ops/backdrop-pantheon/issues/68#issuecomment-1433751043, where the status report was reporting the wrong version also.

Here's the relevant issue for Drupal 8 drupal.org/project/upgrade_status/issues/3213533

my understanding is that D9 uses SELECT VERSION() but lower than that relies on PDO which goes through a sqlproxy that always reports back 5.x

Maybe we can update both the status report and telemetry reporting at the same time.

jenlampton commented 1 year ago

I've filed a new PR based on your second PR @klonos, the one that inluded the databse name as well as version. I abstracted out the code you used to get the version, and moved it to a system function that could be used by both the status report and telemetry. It's working nicely on my local :) thank you!

jenlampton commented 1 year ago

Well, I don't think the current approach is going to do the trick. MariaDB 10.6 is still being reported as 5.5. I think we should review the approach taken by Drupal in drupal.org/project/upgrade_status/issues/3213533 to see if that can get the correct database version.

klonos commented 1 year ago

I abstracted out the code you used to get the version, and moved it to a system function that could be used by both the status report and telemetry.

Good call 👍🏼

...inluded the databse name as well as version.

Well, I don't really like how the status page reports this:

MySQL, MariaDB, or equivalent version MariaDB v10.8

If we are confident that it is either MariaDB or MySQL, we should be reporting that. We should only be using the lengthier MySQL, MariaDB, or equivalent version string if we don't know which one of the two it is. So the possible strings for $type in the site status report should be:

MariaDB 10.6 is still being reported as 5.5.

Hmm, odd 🤔 ...I'll set that up on my local and try to figure out what the deal is.

I think we should review the approach taken by Drupal in https://www.drupal.org/project/upgrade_status/issues/3213533 to see if that can get the correct database version.

Definitely 👍🏼 ...we shouldn't be trying to reinvent the wheel.

klonos commented 1 year ago

@jenlampton I've set up your PR branch on my local (using lando if it matters), and it seems to be reporting the 10.6 version correctly for me:

image

image

klonos commented 1 year ago

...FTR, the PR sandbox reports v10.8: image

klonos commented 1 year ago

...if it helps, I've inserted multiple dpm()s in various stages in the newly-introduced _system_get_database_version() function, and here's the output: image

quicksketch commented 2 weeks ago

On my local (using DDEV), I get this odd version string reported in PHP: 10.11.8-MariaDB-ubu2204-log. (ubu2204 translates to "Ubuntu 22.04", the OS version I'm running). This PR then reports that the database version is 2204. This is a tough problem!

quicksketch commented 2 weeks ago

I adapted the fix from https://www.drupal.org/project/upgrade_status/issues/3213533 into a new pull request here: https://github.com/backdrop/backdrop/pull/4853

The thing I didn't understand right away is that the 5.5.5- prefix is not the equivalent MySQL version, it's a hack that allows MySQL clients to connect to MariaDB servers. See the full explanation in the MariaDB project: https://github.com/MariaDB/server/blob/f6633bf058802ad7da8196d01fd19d75c53f7274/include/mysql_com.h#L42

The good news is this makes "fixing" the version number a lot easier than it seemed at first. The prefix is always going to be 5.5.5-, never any other version number like 5.6.0- or 8.1.2-.

Although the location of the fix is different, my new PR uses the identical fix as Drupal, which simply strips off this prefix.

We'll still be reporting longer version strings to Telemetry, like 10.11.8-MariaDB-ubu2204-log or 10.5.11-MariaDB-1:10.5.11+maria~focal-log, but all those strings are fine as long as the first 3 numbers indicate the MariaDB version. The project_telemetry module will just strip off all the unnecessary suffixes.

indigoxela commented 2 weeks ago

@quicksketch cool, that means, this issue gets obsolete, right? (Another "read before post" reminder to myself :wink: )

Testing seems tricky... on my dev there's no such prepended string. But I do have one on a low traffic (never gone live) site. :thinking:

indigoxela commented 2 weeks ago

Works for me! :+1:

I have a "forgotten web" on a server with the MariaDB package directly from their repo, so there's the prefix.

Before applying the patch the version on the status page reads:

MySQL, MariaDB, or equivalent Version 5.5.5-10.6.19-MariaDB

After applying the patch:

MySQL, MariaDB, or equivalent Version 10.6.19-MariaDB

Same for "Debug information" on admin/reports/debug - prefix gets properly stripped.

Additionally I tested on my dev box (guess the OS :wink:): Before and after values are the same, as expected:

10.11.6-MariaDB-0+deb12u1

The code also looks good. Not sure if anyone wants to do some additional code review. So setting RTBC right away.

quicksketch commented 2 weeks ago

Merged into 1.x and 1.28.x. Thank you @indigoxela, @klonos, @olafgrabienski, @yorkshire-pudding, @BWPanda, and @jenlampton!