LibrariesHacked / openlibrary-search

Searching Open Library by keywords to return ISBNs
147 stars 26 forks source link

I am experiencing slow query performance in my PostgreSQL database #14

Open ERJavier opened 11 months ago

ERJavier commented 11 months ago

Even after optimizing my Python code and database configuration. Queries are taking over 2 minutes and 30 seconds to execute.

I am running the following SQL queries:

Environment Details

I have already tried the following:

I expected both queries to execute much faster, ideally within a few seconds.

Thank you 🙏

DaveBathnes commented 7 months ago

Hi

Thanks for raising this issue! Apologies for the extremely late reply, I've been slowly working on a refinement to the loading process, and it's been exceptionally long in getting time, plus waiting for the database to build and then try again, etc!

You're right that the database isn't optimised. Originally it was designed to take an existing identifier (mainly ISBN), and good text searching hasn't been done.

Trying your own steps, I get:

SELECT data FROM authors WHERE data ->> 'name' ILIKE 'Graham Greene'

~ 15 seconds

On the second example you don't need to query the JSON data within the works table because the author key has been linked as an indexed identifier through the author_works table. For example, using a key that I obtained above:

select *
from works w
join author_works aw
on w.key = aw.work_key
where author_key = '/authors/OL6856462A'

~0.3 seconds

If I can get full text searching on some key fields (like author name), the abive steps should all be able to be done in one single query. I'll create a branch for that under this issue