backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 40 forks source link

Add hook_field_schema_alter() #6208

Closed herbdool closed 8 months ago

herbdool commented 1 year ago

Description of the need

Add hook_field_schema_alter(), as described in https://www.drupal.org/node/691932, which has been merged into Drupal 7 core as of 7.99. See https://www.drupal.org/node/3406312

For the reasons we have hook_schema_alter() - mainly site-specific tweaks, we should allow the same for field storage.

It would make it easier to implement a contrib module (like date_repeat) that would alter core fields (like core date fields).

Proposed solution

Apply this patch: https://www.drupal.org/files/issues/2023-03-16/drupal-field_schema_alter-691932-129-D7.patch or similar.

Alternatives that have been considered

hook_schema_alter() can be used too but it requires checking for each table to see if it's of the right field type and also requires also handling field creation and deletion so it's more work than the proposed new hook.

Example implementation

/**
 * Allow modules to alter the schema for a field.
 *
 * @param array $schema
 *   The schema definition as returned by hook_field_schema().
 * @param array $field
 *   The field definition.
 *
 * @see field_retrieve_schema()
 */
function hook_field_schema_alter(&$schema, $field) {
  if ($field['type'] == 'image') {
    // Alter the length of a field.
    $schema['columns']['alt']['length'] = 2048;
    // Add an additional column of data.
    $schema['columns']['additional_column'] = array(
      'description' => "Additional column added to image field table.",
      'type' => 'varchar',
      'length' => 128,
      'not null' => FALSE,
    );
    // Add an additional index.
    $schema['indexes']['fid_additional_column'] = array('fid', 'additional_column');
  }
}
argiepiano commented 9 months ago

I've done some light testing and seems to work fine. I was wondering about the performance impact of looping through all modules and loading their install files (which is basically what module_load_all_includes() does) every time field_retrieve_schema() is called. This is being called very often, for example, every time a node is created.

Can we get any measurements of this? I tried installing the webprofiler module, but I need to install XHProf to run it, and currently don't have the time to figure out how to do that with MAMP. Wondering if anyone using a different tool like ddev or lando are able to test this (for example, after enabling tons of modules).

herbdool commented 9 months ago

Performance was brought up here: https://www.drupal.org/project/drupal/issues/691932#comment-15073273 as well:

Before the patch, these functions were loading only one .install file - for the field. After the patch, all .install files are loaded (for obvious reasons), so that any alter hook is found and run.

This can have a minor performance effect, but taking into account a fact, that this code is mostly run on field CRUD operations (insert, update, delete) and also in field_read_fields(), where the field definition output is cached in cache_field table most of the time, this should be very minor and not a problem at all.

laryn commented 9 months ago

I tried a quick and dirty comparison: on a sandbox site enable a whole string of modules (about 50) in one swoop on the command line, then load the xhprof results for that process: CleanShot 2023-12-19 at 05 45 12@2x

Next delete the local snapshot and reinstall. Drop in replace the entire core directory with the one from this PR. Clear caches and enable that same string of modules on the command line. Load the xhprof results: CleanShot 2023-12-19 at 05 49 28@2x

If I'm reading that right the total number of function calls went up but the time went down. (This was one run so can't rely too heavily on that but it does seem like it's not a huge performance drain at a minimum). I ran the second one again just to get another set of numbers and it fluctuated a bit from the first: CleanShot 2023-12-19 at 06 10 36@2x

Pulling out a few specific lines that may be of interest from that last run: CleanShot 2023-12-19 at 06 04 23@2x CleanShot 2023-12-19 at 06 05 24@2x

I'd say that verifies the performance impact to be minimal.

argiepiano commented 9 months ago

@laryn, thanks. I was wondering if you could test the impact when creating a node, with and without the PR? Create a page node, as field_retrieve_schema() is only called when the node title is automatically inserted in the menu bar. I'm seeing 6 calls to field_retrieve_schema() for each page created (with the subsequent looping and loading of all enabled modules' installs every time).

herbdool commented 9 months ago

@argiepiano it seems to happen with any menu item added. It looks like it's admin_bar_links_alert(), which is calling system_get_requirements(), which gets triggered. On a menu cache rebuild? Even without this PR, that would have an impact if someone was adding a lot of menu items at once, and it is rebuilding the menu cache for current user who has permission to administer site configuration (see admin_bar_output()). That's probably rare though. So this seems like a Backdrop-specific issue. Is this acceptable? I think it probably is.

laryn commented 9 months ago

@argiepiano Here's a quick test of creating a piece of page content (with just a title and body field):

WITHOUT the PR: CleanShot 2023-12-21 at 11 19 31@2x

WITH the PR: CleanShot 2023-12-21 at 11 20 15@2x

argiepiano commented 9 months ago

Thanks. @laryn. That seems like a significant performance impact

herbdool commented 9 months ago

@argiepiano Well, we're talking 0.353 seconds versus 0.086 seconds. I'm assuming this is for someone with the administer site configuration permission? @laryn How many modules were installed on the site? How many pages with menu items could conceivably be created in one process? To me it seems like an edge case. I'd like to see the comparison with a user who doesn't have administer site configuration permission to see if it's all because of admin_bar_links_alert() or something else too.

argiepiano commented 9 months ago

@herbdool in absolute term, the impact of this particular test is minimal. In relative terms, it's large. This process took 4x longer.

Can we brainstorm ways to minimize this impact? I'm sure we can optimize these processes.

herbdool commented 9 months ago

@argiepiano we'd have to know more about @laryn's setup in that test. If I'm right that the impact is because of admin_bar_links_alert() being called during a menu cache rebuild then perhaps that's the focus. What if admin_bar_links_alert() result is also cached and then not rebuilt each time along with the menu cache rebuild?

herbdool commented 9 months ago

I'd like to know more about @laryn's setup for testing the performance. What user created the page?

@argiepiano I'm not convinced we'd be able to "optimize the processes" much. The drupal issue has been around for over a decade before merging. That's why I was trying to see if the calling of admin_bar_links_alert() could be minimized. I'm not sure if that's something that should be done in this PR. I'm not sure why the admin bar needs to be rebuilt when a page menu items is created. I haven't figured that out yet. It's not even the same menu.

argiepiano commented 9 months ago

I haven't had time to look into this. The fact that it was in the D7 queue for a decade is telling in many ways... but let's look more in detail into this. Hopefully next week I'll be able to help here.

argiepiano commented 9 months ago

Just spitballing here: why not restrict the use of hook_field_schema_alter() to the module files, like most alter hooks? This would skip the big issue of loading all the install files every time field_retrieve_schema() runs?

herbdool commented 9 months ago

@argiepiano I've concluded that this hook is not the problem. We introduced in https://github.com/backdrop/backdrop-issues/issues/5464 a situation where field_read_fields() gets called frequently when it should only be called when field_info_fields() shouldn't be used. The latter is cached and thus much better when viewing, adding, updating content. The former should only be used when viewing or editing field config, or when viewing the status report.

Whenever admin_bar_links_alert() is called it will call field_retrieve_schema() anyway. admin_bar_links_alert() calls system_get_requirements() which triggers system_rebuild_module_data() (for checking runtime info). That calls field_system_info_alter() which calls field_read_fields() and finally field_retrieve_schema().

So even if the hook is moved to the .module file it probably gets called.

The preference in D7 was to allow it to be in .install files. See https://www.drupal.org/project/drupal/issues/691932#comment-7971477. I suspect to keep it consistent with hook_schema_alter() (see comment_schema_alter() for an example).

I confirmed the bigger issue by toggling $settings['disable_multiple_modules_warnings'] = TRUE; setting.

I believe we just weren't aware of how often admin_bar_links_alert() gets called when adding all this functionality. It's not just this alter hook getting called when adding menu items that's a problem, but rebuilding the module data as well.

I think we need a new issue. Not sure if that also means it's a blocker for this one.

herbdool commented 9 months ago

@argiepiano I created https://github.com/backdrop/backdrop-issues/issues/6353.

in absolute term, the impact of this particular test is minimal. In relative terms, it's large.

I still think the impact in practice is not that large (with vs without this PR). Someone is unlikely to create hundreds of menu items in one PHP process, while using the UI. I believe that's what's required in order for the "relative terms" to add to something big.

herbdool commented 9 months ago

Now that https://github.com/backdrop/backdrop-issues/issues/6353 has been merged, @laryn can you test the creation of a page with menu item? Thank you.

argiepiano commented 9 months ago

@herbdool thanks. Glad to see the other fix got merged. I'll do some testing tomorrow.

argiepiano commented 9 months ago

@herbdool, I've tested, and now, as expected, field_retrieve_schema() and the new alter hook are only called in specific pages dealing with module enabling/disabling, field adding/removing, changing node display settings and running updates. That's a lot better than before the admin_bar fix that was just merged, since this used to run every time you added content.

I'd be interested to know how much of an impact this has when making changes in "Manage display" for entities when there are lots of different fields. I've worked with sites with tons of fields and custom entities, and I always dreaded making changes to displays (e.g. reordering fields). Both in Backdrop and D7 they take an inordinate amount of time, as there are multiple processes and cache clearing going on each time you save the changed display. This patch adds more stuff to those processes, as the new alter hook is called for every field in the system when you save - which means looping through every enabled module for each field in the system, to load the install files.

And playing devil's advocate here: what do we need this alter hook for? Is the benefit outweighing the cost? AFAIK this need came out from a port of a custom module that adds a column to image fields, is that right? Is this a common need? Can't this need be fulfilled in a different way (e.g. with a separate pivot table for the field table)?

laryn commented 9 months ago

I will try to make time to test with xhprof again today or tomorrow if I can squeeze it in.

My personal take on it is that since it is included in Drupal 7, we should include the change as well for backwards compatibility, although taking into account (and tweaking where we can) performance implications.

EDIT (off topic aside): I've also seen sites where those field/display pages take forever to load, and found that a large portion of the issue in my case was tokens, which Fast Token Browser alieviated. +1 for FTB in core in my opinion...

herbdool commented 9 months ago

@argiepiano like @laryn mentioned, it's now included in Drupal 7, so let's make sure it still works in Backdrop. But it is also required for contrib modules like https://github.com/backdrop-contrib/date_repeat. This was also an issue for Date module in Drupal 7, see https://www.drupal.org/project/drupal/issues/691932#comment-10004877:

Incidentally, the Date project refactored the code for repeating dates into a separate module, Date Repeat, but because there's no hook_field_schema_alter(), they're forced to handle the schema for Date Repeat in date.install.

Also incidentally, the site where I need this also has a ton of fields. Almost 400 fields! And it also has Fast Token Browser installed. So I'll see if I notice any lag in the field config UI (compared to before).

argiepiano commented 9 months ago

Sounds good to me!

herbdool commented 9 months ago

I used Devel with query log, page timer and memory usage enabled. Before and after applying the patch.

For a content type with 35 fields. Edited regular text field.

Before:

After:

--

Content type with 71 fields. Edited a paragraphs field.

Before:

After:

There are a lot of things impacting performance, but I'm not sure if we'd be able to say that this PR had an impact that's outside a standard deviation. That's my guess. There are bigger things affecting query time and page execution time that bury this PR.

I also don't know if the PR makes a difference depending on the type of field. With any field it will load all the installation files while saving.

In my test I also had date_repeat enabled which calls the alter hook. That didn't seem to make much difference either.

laryn commented 9 months ago

Before (latest dev): CleanShot 2024-01-04 at 13 45 16@2x

After (updated branch on PR, cleared cache): CleanShot 2024-01-04 at 13 43 02@2x

argiepiano commented 9 months ago

Thank you both. The numbers look good. This is LGTM

argiepiano commented 9 months ago

Since we are beyond the 1.27.0 freeze, should this be added to 1.28.0?

herbdool commented 9 months ago

I added it to 1.26.4 milestone. So long as others feel it's okay for a bug/task fix. It won't do anything unless someone calls the alter hook.

quicksketch commented 8 months ago

I committed https://github.com/backdrop/backdrop/pull/4513 and added two @since docblock lines in https://github.com/backdrop/backdrop/commit/8e90bee281caadb5ded23db5e0c412d1209d2456.

As a missing Drupal 7 hook, API breaks between Drupal 7 and Backdrop can be considered a bug, so there's no need to wait for a feature release for this API.

Thanks @herbdool, @laryn, @argiepiano, and @bugfolder for your work on this issue!