backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 39 forks source link

`install_find_translation_files()`: The regex for the `$mask` parameter when calling `file_scan_directory()` is unnecessarily complex and outdated #6662

Open klonos opened 1 month ago

klonos commented 1 month ago

I noticed this while working/reviewing #6659 and #6660.

This is the code in question:

function install_find_translation_files($langcode = NULL) {
  $directory = settings_get('locale_translate_file_directory', conf_path() . '/files/translations');
  // @todo: Remove the "drupal" check in 2.x (assuming we have a localization
  //   server by then).
  $files = file_scan_directory($directory, '!(install|drupal|backdrop(cms)?)(-[\d\.]+)?\.' . (!empty($langcode) ? preg_quote($langcode, '!') : '[^\.]+') . '\.po$!', array('recurse' => FALSE));
  return $files;
}

The second parameter in the file_scan_directory() is this: '!(install|drupal|backdrop(cms)?)(-[\d\.]+)?\.' . (!empty($langcode) ? preg_quote($langcode, '!') : '[^\.]+') . '\.po$!'

  1. For starters, the checks for install, drupal, and backdrop seem like things that we put in place before we had our own localization server to allow producing our own .po files. They should be removed I think. Unless we want to allow .po files like drupal-7.101.de.po. In any case, install is not used, and backdrop is not something we are generating - we use backdropcms instead (as in backdropcms-1.28.de.po for example).
  2. Then, there is no check of whether the provided language code is valid, which could be done via standard_language_list() (which we can expand independently in the future as more language codes are being added - see #6661 for example).
  3. Then also, our https://localize.backdropcms.org server provides .po files that are generated in the [PROJECTNAME]-[VERSION].[LANGCODE].po format, in which:
    • [PROJECTNAME] is the machine name of contrib projects, and for core it is the string backdropcms specifically
    • [VERSION] is a regular semantic version string (for which there is an official regex provided here that we can use by the way)
    • [LANGCODE] is the language code. This is typically a 2-letter string, but can also have dashes and more than 2 letters. Some common non-typical language codes are for example:
      • with dashes: en-gb, pt-pt/pt-br, as well as gsw-berne, xx-lolspeak, and zh-hans/zh-hant
      • more than 2 letters: mfe, prs, sco etc.

Noting the following: If any non-standard language codes or languages are added, they will need to be also added to the output of standard_language_list() and other relevant functions in order to be properly supported by core (in the language selection lists/selects) as well as by our localization server. So made-up-langicode can in theory be added as [LANGCODE] in the filename of a .po file, but that file will then not be produced by what we currently have in place in https://localize.backdropcms.org. So although not officially used or generated, we do need to allow them to be specified and used, for testing purposes and for things like the specific feature in core that allows for custom languages to be added:

custom language in Backdrop CMS

With all the above points and notes in mind, I think that the regex used for the $mask parameter in the specific file_scan_directory() call can be simplified and future-proofed to something like the following (rough regex mockup, that incudes placeholders instead of actual regex for now): '!(drupal|backdropcms)(-SEMVERREGEX\.' . LANGCODEREGEX . '\.po$!'

klonos commented 1 month ago

@docwilmot explained the following in https://github.com/backdrop/backdrop/pull/4834#discussion_r1693273539

We accept:

  • install.ca.po (used in tests)
  • backdropcms-1.28.ca.po
  • backdrop-1.x.ca.po
  • drupal-7.x.ca.po
  • backdropcms-1.x.ca.po (this is the reason for the PR change. If this is the http request, the localize site serves the latest langauge file but named as 1.x)

So the install bit does have a purpose after all and should stay (and we should add an inline comment to explain that for anyone wondering about it in the future).

As I mentioned in comment there:

PS: I'm not sure what the '[^\.]+' is meant to catch there, since ^ signifies the beginning of a string in a regex. ...

argiepiano commented 1 month ago

PS: I'm not sure what the '[^.]+' is meant to catch there, since ^ signifies the beginning of a string in a regex. ...

Not quite. It's negation. https://www.squash.io/how-to-use-the-regex-not-operator-in-programming/#:~:text=To%20use%20the%20negation%20operator,%E2%80%9D%2C%20or%20%E2%80%9Cc%E2%80%9D.

klonos commented 1 month ago

Right. It seems my regex is getting rusty. So [^\.]+ is apparently any character but a period.