LibrariesHacked / openlibrary-search

Searching Open Library by keywords to return ISBNs
156 stars 28 forks source link

Useful code for checking progress during indexing #8

Open Chloe-Meinshausen opened 1 year ago

Chloe-Meinshausen commented 1 year ago

Hi all, just wanted to share this very helpful query that will check the status on the create index process. -I have one thats been running for almost 24 hours and seeing the progress creep up has been a stress relief.

This is the results you get if you run it in pgadmin.

image
      SELECT
        now()::TIME(0),
        a.query,
        p.phase,
        round(p.blocks_done / p.blocks_total::numeric * 100, 2) AS "% done",
        p.blocks_total,
        p.blocks_done,
        p.tuples_total,
        p.tuples_done,
        ai.schemaname,
        ai.relname,
        ai.indexrelname
      FROM pg_stat_progress_create_index p
      JOIN pg_stat_activity a ON p.pid = a.pid
      LEFT JOIN pg_stat_all_indexes ai on ai.relid = p.relid AND ai.indexrelid = p.index_relid;

source: https://dba.stackexchange.com/questions/11329/monitoring-progress-of-index-construction-in-postgresql/249784#249784?newreg=7a3d794dcb154b6782ca390ead375050

DaveBathnes commented 1 year ago

Thank you! That's really useful. Those indexes on the json column I know are very slow to build. And probably not very efficient!

The whole thing took about 2 days for me last time I ran it. The indexes were a long time, and then the final full vacuum command, as that rebuilds the data. That also may be unnecessary. Because there's a lot of processing of the data, I thought it would be a useful way of reducing the final database size. I'll need to double check again how effective it is though.

One next stage to the project I will start having a look at is more tailored indexing. Some of the data would be useful in a full-text search, and I think that would mean pulling out the relevant data fields (e.g. book description, title) into a tsvector column. And then indexing only specific fields in the json.

Also, thanks for the PR! Apologies for the delay, it's been a busy week so I've been a bit delayed getting a chance to look at it. But I'm looking forward to seeing the tests run!