Islandora / islandora

Drupal modules for browsing and managing digital repositories.
http://islandora.ca/
GNU General Public License v2.0
152 stars 118 forks source link

[BUG] IslandoraUtils.php#getReferencingMedia very slow in certain circumstances #1055

Open hading opened 1 month ago

hading commented 1 month ago

Strictly speaking this isn't so much a bug as a performance issue, and one that may only occur in certain circumstances. That being the case I'm not sure how interested you'll be in it, but we thought it'd be useful to note.

We were regenerating service file derivatives and it was going quite slow. Eventually we tracked this down to the query being generated by the IslandoraUtils.php#getReferencingMedia function being very slow.

For context, our repository has approximately 500K nodes and 2M media. We also use the group module for certain permissions. We're basically running ISLE (~3.2 or so) on a plenty resourceful machine - 16 CPUs and 48G memory.

The function generates queries like the one I've included at the end of the ticket.

This query takes about 60-90s to run in our set up. And of course when doing this mass regeneration of derivatives one like it runs for every single file in a post save hook.

While we're surprised that the mariadb query optimizer doesn't handle this a bit better (and everything that seems to need a database index here is indexed), it doesn't for us.

We have worked around this by rewriting the method so that instead of ORing together all of the $conditions coming from the referencing fields in a single query we make a separate query for each condition and combine those results. This all happens in a second or two, compared to the original 60s+.

I suspect that the number of media we have makes things get out of hand with all the joins and is more manageable (and maybe the query optimizer is better) with smaller queries in our case.

Query ``` SELECT base_table.vid AS vid, base_table.mid AS mid FROM media base_table LEFT JOIN media__field_captions media__field_captions ON media__field_captions.entity_id = base_table.mid LEFT JOIN media__field_fits_file media__field_fits_file ON media__field_fits_file.entity_id = base_table.mid LEFT JOIN media__field_media_audio_file media__field_media_audio_file ON media__field_media_audio_file.entity_id = base_table.mid LEFT JOIN media__field_media_document media__field_media_document ON media__field_media_document.entity_id = base_table.mid LEFT JOIN media__field_media_file media__field_media_file ON media__field_media_file.entity_id = base_table.mid LEFT JOIN media__field_media_image media__field_media_image ON media__field_media_image.entity_id = base_table.mid LEFT JOIN media__field_media_video_file media__field_media_video_file ON media__field_media_video_file.entity_id = base_table.mid LEFT JOIN media__field_service_file media__field_service_file ON media__field_service_file.entity_id = base_table.mid LEFT JOIN media__field_track media__field_track ON media__field_track.entity_id = base_table.mid LEFT OUTER JOIN group_relationship_field_data gcfd ON base_table.mid=gcfd.entity_id AND gcfd.plugin_id IN ('group_media:audio', 'group_media:document', 'group_media:extracted_text', 'group_media:file', 'group_media:fits_technical_metadata', 'group_media:image', 'group_media:video') INNER JOIN media_field_data media_field_data ON base_table.mid=media_field_data.mid LEFT OUTER JOIN group_relationship_field_data gcfd_2 ON gcfd.gid=gcfd_2.gid AND gcfd_2.plugin_id='group_membership' AND gcfd_2.entity_id='5' WHERE ((media__field_captions.field_captions_target_id = '981051') or (media__field_fits_file.field_fits_file_target_id = '981051') or (media__field_media_audio_file.field_media_audio_file_target_id = '981051') or (media__field_media_document.field_media_document_target_id = '981051') or (media__field_media_file.field_media_file_target_id = '981051') or (media__field_media_image.field_media_image_target_id = '981051') or (media__field_media_video_file.field_media_video_file_target_id = '981051') or (media__field_service_file.field_service_file_target_id = '981051') or (media__field_track.field_track_target_id = '981051')) AND ((gcfd.entity_id IS NULL) OR ((media_field_data.status = '0') AND (((gcfd.type IN ('bmc_hc-group_media-audio', 'bmc_sc-group_media-audio', 'bryn_mawr-group_media-audio', 'group_content_type_9b20217818d0f', 'group_content_type_d248b8926df6a', 'group_content_type_42eafe3955699', 'haverford-group_media-audio', 'hc_sc-group_media-audio', 'limited_access-group_media-audio', 'metadata_only-group_media-audio', 'swarthmore-group_media-audio', 'bmc_hc-group_media-document', 'bmc_sc-group_media-document', 'bryn_mawr-group_media-document', 'group_content_type_5f103496bc6d6', 'group_content_type_7fb8a81e35518', 'group_content_type_c4afd599d306e', 'haverford-group_media-document', 'hc_sc-group_media-document', 'group_content_type_5adc5be5f2133', 'group_content_type_01e554065688e', 'swarthmore-group_media-document', 'group_content_type_be0de227a0680', 'group_content_type_e069ce90a0d47', 'group_content_type_af0c4becf176e', 'group_content_type_bbda3c0506a8e', 'group_content_type_6c0d11e446517', 'group_content_type_7314b6e7ec3da', 'group_content_type_5b506b28d96d7', 'hc_sc-group_media-extracted_text', 'group_content_type_6f5d265994d11', 'group_content_type_8c72d65046714', 'group_content_type_84f2807dc558e', 'bmc_hc-group_media-file', 'bmc_sc-group_media-file', 'bryn_mawr-group_media-file', 'group_content_type_7f6ed65c8beec', 'group_content_type_679b086e86340', 'group_content_type_554d93c79ee77', 'haverford-group_media-file', 'hc_sc-group_media-file', 'limited_access-group_media-file', 'metadata_only-group_media-file', 'swarthmore-group_media-file', 'group_content_type_b135a0f21123d', 'group_content_type_495ca639e40e0', 'group_content_type_0a08aed9bbcc6', 'group_content_type_79f4033ae3129', 'group_content_type_23f452284b3fe', 'group_content_type_c80890dabb943', 'group_content_type_c2ad3f3c4be85', 'group_content_type_2b44a048bd750', 'group_content_type_5939495135cc6', 'group_content_type_c65dc71545e5b', 'group_content_type_c3c79100d67ca', 'bmc_hc-group_media-image', 'bmc_sc-group_media-image', 'bryn_mawr-group_media-image', 'group_content_type_456c3eefedecd', 'group_content_type_0da723448b271', 'group_content_type_eb4dd6b4ee70b', 'haverford-group_media-image', 'hc_sc-group_media-image', 'limited_access-group_media-image', 'metadata_only-group_media-image', 'swarthmore-group_media-image', 'bmc_hc-group_media-video', 'bmc_sc-group_media-video', 'bryn_mawr-group_media-video', 'group_content_type_071e750e136ed', 'group_content_type_a66256e5dcfe7', 'group_content_type_3b4cf0d1e4f46', 'haverford-group_media-video', 'hc_sc-group_media-video', 'limited_access-group_media-video', 'metadata_only-group_media-video', 'swarthmore-group_media-video')) AND (gcfd_2.entity_id IS NULL)) OR ((gcfd.type IN ('group_content_type_9b20217818d0f', 'bmc_hc-group_media-audio', 'bmc_sc-group_media-audio', 'bryn_mawr-group_media-audio', 'group_content_type_d248b8926df6a', 'group_content_type_42eafe3955699', 'haverford-group_media-audio', 'hc_sc-group_media-audio', 'limited_access-group_media-audio', 'metadata_only-group_media-audio', 'swarthmore-group_media-audio', 'group_content_type_5f103496bc6d6', 'bmc_hc-group_media-document', 'bmc_sc-group_media-document', 'bryn_mawr-group_media-document', 'group_content_type_7fb8a81e35518', 'group_content_type_c4afd599d306e', 'haverford-group_media-document', 'hc_sc-group_media-document', 'group_content_type_5adc5be5f2133', 'group_content_type_01e554065688e', 'swarthmore-group_media-document', 'group_content_type_bbda3c0506a8e', 'group_content_type_be0de227a0680', 'group_content_type_e069ce90a0d47', 'group_content_type_af0c4becf176e', 'group_content_type_6c0d11e446517', 'group_content_type_7314b6e7ec3da', 'group_content_type_5b506b28d96d7', 'hc_sc-group_media-extracted_text', 'group_content_type_6f5d265994d11', 'group_content_type_8c72d65046714', 'group_content_type_84f2807dc558e', 'group_content_type_7f6ed65c8beec', 'bmc_hc-group_media-file', 'bmc_sc-group_media-file', 'bryn_mawr-group_media-file', 'group_content_type_679b086e86340', 'group_content_type_554d93c79ee77', 'haverford-group_media-file', 'hc_sc-group_media-file', 'limited_access-group_media-file', 'metadata_only-group_media-file', 'swarthmore-group_media-file', 'group_content_type_79f4033ae3129', 'group_content_type_b135a0f21123d', 'group_content_type_495ca639e40e0', 'group_content_type_0a08aed9bbcc6', 'group_content_type_23f452284b3fe', 'group_content_type_c80890dabb943', 'group_content_type_c2ad3f3c4be85', 'group_content_type_2b44a048bd750', 'group_content_type_5939495135cc6', 'group_content_type_c65dc71545e5b', 'group_content_type_c3c79100d67ca', 'group_content_type_456c3eefedecd', 'bmc_hc-group_media-image', 'bmc_sc-group_media-image', 'bryn_mawr-group_media-image', 'group_content_type_0da723448b271', 'group_content_type_eb4dd6b4ee70b', 'haverford-group_media-image', 'hc_sc-group_media-image', 'limited_access-group_media-image', 'metadata_only-group_media-image', 'swarthmore-group_media-image', 'group_content_type_071e750e136ed', 'bmc_hc-group_media-video', 'bmc_sc-group_media-video', 'bryn_mawr-group_media-video', 'group_content_type_a66256e5dcfe7', 'group_content_type_3b4cf0d1e4f46', 'haverford-group_media-video', 'hc_sc-group_media-video', 'limited_access-group_media-video', 'metadata_only-group_media-video', 'swarthmore-group_media-video')) AND (gcfd_2.entity_id IS NOT NULL)))) OR ((media_field_data.status = '1') AND (((gcfd.type IN ('bmc_hc-group_media-audio', 'bmc_sc-group_media-audio', 'bryn_mawr-group_media-audio', 'group_content_type_9b20217818d0f', 'group_content_type_d248b8926df6a', 'group_content_type_42eafe3955699', 'haverford-group_media-audio', 'hc_sc-group_media-audio', 'limited_access-group_media-audio', 'metadata_only-group_media-audio', 'swarthmore-group_media-audio', 'bmc_hc-group_media-document', 'bmc_sc-group_media-document', 'bryn_mawr-group_media-document', 'group_content_type_5f103496bc6d6', 'group_content_type_7fb8a81e35518', 'group_content_type_c4afd599d306e', 'haverford-group_media-document', 'hc_sc-group_media-document', 'group_content_type_5adc5be5f2133', 'group_content_type_01e554065688e', 'swarthmore-group_media-document', 'group_content_type_be0de227a0680', 'group_content_type_e069ce90a0d47', 'group_content_type_af0c4becf176e', 'group_content_type_bbda3c0506a8e', 'group_content_type_6c0d11e446517', 'group_content_type_7314b6e7ec3da', 'group_content_type_5b506b28d96d7', 'hc_sc-group_media-extracted_text', 'group_content_type_6f5d265994d11', 'group_content_type_8c72d65046714', 'group_content_type_84f2807dc558e', 'bmc_hc-group_media-file', 'bmc_sc-group_media-file', 'bryn_mawr-group_media-file', 'group_content_type_7f6ed65c8beec', 'group_content_type_679b086e86340', 'group_content_type_554d93c79ee77', 'haverford-group_media-file', 'hc_sc-group_media-file', 'limited_access-group_media-file', 'metadata_only-group_media-file', 'swarthmore-group_media-file', 'group_content_type_b135a0f21123d', 'group_content_type_495ca639e40e0', 'group_content_type_0a08aed9bbcc6', 'group_content_type_79f4033ae3129', 'group_content_type_23f452284b3fe', 'group_content_type_c80890dabb943', 'group_content_type_c2ad3f3c4be85', 'group_content_type_2b44a048bd750', 'group_content_type_5939495135cc6', 'group_content_type_c65dc71545e5b', 'group_content_type_c3c79100d67ca', 'bmc_hc-group_media-image', 'bmc_sc-group_media-image', 'bryn_mawr-group_media-image', 'group_content_type_456c3eefedecd', 'group_content_type_0da723448b271', 'group_content_type_eb4dd6b4ee70b', 'haverford-group_media-image', 'hc_sc-group_media-image', 'limited_access-group_media-image', 'metadata_only-group_media-image', 'swarthmore-group_media-image', 'bmc_hc-group_media-video', 'bmc_sc-group_media-video', 'bryn_mawr-group_media-video', 'group_content_type_071e750e136ed', 'group_content_type_a66256e5dcfe7', 'group_content_type_3b4cf0d1e4f46', 'haverford-group_media-video', 'hc_sc-group_media-video', 'limited_access-group_media-video', 'metadata_only-group_media-video', 'swarthmore-group_media-video')) AND (gcfd_2.entity_id IS NULL)) OR ((gcfd.type IN ('group_content_type_9b20217818d0f', 'bmc_hc-group_media-audio', 'bmc_sc-group_media-audio', 'bryn_mawr-group_media-audio', 'group_content_type_d248b8926df6a', 'group_content_type_42eafe3955699', 'haverford-group_media-audio', 'hc_sc-group_media-audio', 'limited_access-group_media-audio', 'metadata_only-group_media-audio', 'swarthmore-group_media-audio', 'group_content_type_5f103496bc6d6', 'bmc_hc-group_media-document', 'bmc_sc-group_media-document', 'bryn_mawr-group_media-document', 'group_content_type_7fb8a81e35518', 'group_content_type_c4afd599d306e', 'haverford-group_media-document', 'hc_sc-group_media-document', 'group_content_type_5adc5be5f2133', 'group_content_type_01e554065688e', 'swarthmore-group_media-document', 'group_content_type_bbda3c0506a8e', 'group_content_type_be0de227a0680', 'group_content_type_e069ce90a0d47', 'group_content_type_af0c4becf176e', 'group_content_type_6c0d11e446517', 'group_content_type_7314b6e7ec3da', 'group_content_type_5b506b28d96d7', 'hc_sc-group_media-extracted_text', 'group_content_type_6f5d265994d11', 'group_content_type_8c72d65046714', 'group_content_type_84f2807dc558e', 'group_content_type_7f6ed65c8beec', 'bmc_hc-group_media-file', 'bmc_sc-group_media-file', 'bryn_mawr-group_media-file', 'group_content_type_679b086e86340', 'group_content_type_554d93c79ee77', 'haverford-group_media-file', 'hc_sc-group_media-file', 'limited_access-group_media-file', 'metadata_only-group_media-file', 'swarthmore-group_media-file', 'group_content_type_79f4033ae3129', 'group_content_type_b135a0f21123d', 'group_content_type_495ca639e40e0', 'group_content_type_0a08aed9bbcc6', 'group_content_type_23f452284b3fe', 'group_content_type_c80890dabb943', 'group_content_type_c2ad3f3c4be85', 'group_content_type_2b44a048bd750', 'group_content_type_5939495135cc6', 'group_content_type_c65dc71545e5b', 'group_content_type_c3c79100d67ca', 'group_content_type_456c3eefedecd', 'bmc_hc-group_media-image', 'bmc_sc-group_media-image', 'bryn_mawr-group_media-image', 'group_content_type_0da723448b271', 'group_content_type_eb4dd6b4ee70b', 'haverford-group_media-image', 'hc_sc-group_media-image', 'limited_access-group_media-image', 'metadata_only-group_media-image', 'swarthmore-group_media-image', 'group_content_type_071e750e136ed', 'bmc_hc-group_media-video', 'bmc_sc-group_media-video', 'bryn_mawr-group_media-video', 'group_content_type_a66256e5dcfe7', 'group_content_type_3b4cf0d1e4f46', 'haverford-group_media-video', 'hc_sc-group_media-video', 'limited_access-group_media-video', 'metadata_only-group_media-video', 'swarthmore-group_media-video')) AND (gcfd_2.entity_id IS NOT NULL))))); ```
adam-vessey commented 1 month ago

Based on the query there, it looks like you're using the group module. There is at least one big performance issue there that I know of that can pop up in larger sites: https://www.drupal.org/project/group/issues/3421756

That said, in our instance, instead of relating every media type to the group, we instead implemented/use https://github.com/discoverygarden/islandora_hierarchical_access to allow the nodes' relation to the group to affect their related media and files.

hading commented 1 month ago

I don't think that the group module is the culprit here. When I was looking at this I tested the query above with all the group stuff removed from it and the performance was still quite bad.

jordandukart commented 1 month ago

Was discussed on the October 9th, 2024 tech call.

Looks like that utils function is being inefficient and generic with how it's attempting to take a fid and resolve any media references.

You mentioned regenerating service files, how is this being triggered? At scale, as you've noted, re-writing and optimization becomes a bigger issue. May be easier to be more explicit about how that media is resolved for the triggering functions as opposed to relying on IslandoraUtils.