WordPress / openverse-api

The Openverse API allows programmatic access to search for CC-licensed and public domain digital media.
https://api.openverse.engineering/v1
MIT License
76 stars 50 forks source link

Reduce DB queries needed in search results #1040

Closed dhruvkb closed 1 year ago

dhruvkb commented 1 year ago

Fixes

Fixes #1036 by @sarayourfriend

Description

Currently a search result needs several DB queries to populate the page.

  1. Get Hit instances from ES
  2. Get DB rows corresponding to the hits.
  3. For each audio row, search for AudioSet and MatureAudio instances. For each image search for the MatureImage instances.

This PR

This is an example query for audio, it includes the related matureaudio and audioset data too!

SELECT 
  "audio"."id", 
  "audio"."created_on", 
  "audio"."updated_on", 
  "audio"."identifier", 
  "audio"."foreign_identifier", 
  "audio"."title", 
  "audio"."foreign_landing_url", 
  "audio"."creator", 
  "audio"."creator_url", 
  "audio"."thumbnail", 
  "audio"."provider", 
  "audio"."url", 
  "audio"."filesize", 
  "audio"."filetype", 
  "audio"."watermarked", 
  "audio"."license", 
  "audio"."license_version", 
  "audio"."source", 
  "audio"."last_synced_with_source", 
  "audio"."removed_from_source", 
  "audio"."view_count", 
  "audio"."tags", 
  "audio"."tags_list", 
  "audio"."category", 
  "audio"."meta_data", 
  "audio"."bit_rate", 
  "audio"."sample_rate", 
  "audio"."audio_set_foreign_identifier", 
  "audio"."audio_set_position", 
  "audio"."genres", 
  "audio"."duration", 
  "audio"."alt_files", 
  "audioset"."id", 
  "audioset"."created_on", 
  "audioset"."updated_on", 
  "audioset"."foreign_identifier", 
  "audioset"."title", 
  "audioset"."foreign_landing_url", 
  "audioset"."creator", 
  "audioset"."creator_url", 
  "audioset"."thumbnail", 
  "audioset"."provider", 
  "audioset"."url", 
  "audioset"."filesize", 
  "audioset"."filetype", 
  "api_matureaudio"."updated_on", 
  "api_matureaudio"."created_on", 
  "api_matureaudio"."media_obj_id" 
FROM 
  "audio" 
  LEFT OUTER JOIN "audioset" ON (
    "audio"."audio_set_foreign_identifier" = "audioset"."foreign_identifier" 
    AND "audio"."provider" = "audioset"."provider"
  ) 
  LEFT OUTER JOIN "api_matureaudio" ON (
    "audio"."identifier" = "api_matureaudio"."media_obj_id"
  ) 
WHERE 
  "audio"."identifier" = 'f97384f6-a5ba-4668-881f-f616904651f0' :: uuid 
LIMIT 
  21;

Testing Instructions

Screenshot 2022-12-12 at 12 47 41 PM
  1. Enable logging for Django DB queries as explained here.
  2. You can also define a formatter with some text like "[db]" for filtering the output using grep.
  3. Start logging the output: just dc logs -f web | grep -E '\[db]'.
  4. Visit the search results and single result pages of audio and image media types.
  5. You should see a single query for each page view in the logs.

Checklist

Developer Certificate of Origin

Developer Certificate of Origin ``` Developer Certificate of Origin Version 1.1 Copyright (C) 2004, 2006 The Linux Foundation and its contributors. 1 Letterman Drive Suite D4700 San Francisco, CA, 94129 Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. ```
github-actions[bot] commented 1 year ago

API Developer Docs Preview: Ready

https://wordpress.github.io/openverse-api/_preview/1040

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

dhruvkb commented 1 year ago

Wait, I might have figured it out.

krysal commented 1 year ago

As much as I'd like to, I don't think I'll get to review this in the week, so tagging other folks in case they can/want to.

dhruvkb commented 1 year ago

The migration 0052 is largely a no-op with only 2 actual SQL statements! This can be verified with the sqlmigrate command.

$ just dc exec web python manage.py sqlmigrate api 0052
BEGIN;
--
-- Add field audioset to audio
-- (no-op)
--
-- Alter field identifier on audioreport
CREATE INDEX "nsfw_reports_audio_identifier_ebe3a079" ON "nsfw_reports_audio" ("identifier");
--
-- Alter field identifier on deletedaudio
-- (no-op)
--
-- Alter field identifier on deletedimage
-- (no-op)
--
-- Alter field identifier on imagereport
CREATE INDEX "nsfw_reports_identifier_f0374e03" ON "nsfw_reports" ("identifier");
--
-- Alter field identifier on matureaudio
-- (no-op)
--
-- Alter field identifier on matureimage
-- (no-op)
--
-- Rename field identifier on audioreport to media_obj
-- (no-op)
--
-- Rename field identifier on deletedaudio to media_obj
-- (no-op)
--
-- Rename field identifier on deletedimage to media_obj
-- (no-op)
--
-- Rename field identifier on imagereport to media_obj
-- (no-op)
--
-- Rename field identifier on matureaudio to media_obj
-- (no-op)
--
-- Rename field identifier on matureimage to media_obj
-- (no-op)
COMMIT;
github-actions[bot] commented 1 year ago

This PR has migrations. Please rebase it before merging to ensure that conflicting migrations are not introduced.