dataegret / pgcompacttable

BSD 3-Clause "New" or "Revised" License

Multiple runs for big tables #43

Open wenq1 opened 2 years ago

wenq1 commented 2 years ago

For large tables (~30 GB), is it possible to run pgcompacttable until, say, 10% is done, do a manual VACUUM and REINDEX, and then run pgcompacttable again so that it only does the remaining 90% of the work?
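
For reference, the manual steps mentioned here would just be plain PostgreSQL commands; the table name below is a placeholder, and REINDEX ... CONCURRENTLY needs PostgreSQL 12 or newer:

-- 'bigtable' is a placeholder table name
VACUUM (VERBOSE) bigtable;          -- may also truncate empty tail pages
REINDEX TABLE CONCURRENTLY bigtable; -- rebuilds all indexes without a long exclusive lock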

dk98126 commented 1 year ago

@Melkij Hello! Could you please answer this question?

What happens if I stop the script? Does it start all over again and process every single page, or does it somehow know the last processed page and resume from there?

In other words, how can I run the script multiple times? In my production environment I can't just leave it running, because it would run for 5 days.

alexius2 commented 1 year ago

Hello, if you stop pgcompacttable and run it again, it will start from the beginning: run vacuum, check bloat, and then try to move data from the tail pages if there is enough bloat left. During the vacuum phase it is possible that it will remove tail pages that were already cleaned up in the previous run, so it won't have to process them again. pgcompacttable also runs vacuum periodically after a certain number of processed pages.
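
A rough way to see whether a previous run (or its vacuum) already truncated tail pages is to compare the table's physical size between runs; this is just an illustrative catalog query, not part of pgcompacttable ('bigtable' is a placeholder):

-- current on-disk heap size and the planner's page estimate
-- (relpages is refreshed by VACUUM/ANALYZE)
SELECT pg_size_pretty(pg_relation_size('bigtable')) AS heap_size,
       relpages
FROM pg_class
WHERE relname = 'bigtable';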

If other transactions prevent vacuum from acquiring the lock and removing pages, pgcompacttable will process these pages again, but it will be faster because they should already be empty.
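
A quick way to spot such long-open transactions (an illustrative query using standard pg_stat_activity columns):

-- transactions that have been open the longest and may block vacuum truncation
SELECT pid, now() - xact_start AS xact_age, state, left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;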

So in some cases, yes, it is OK to run and stop it multiple times to process a big table; eventually it should finish. If the table is not that big but pgcompacttable runs slowly, there may be many bloated indexes, which are better compacted before data movement using the --initial-reindex option. There is also the --delay-ratio option, which can speed up the process (if there are enough resources).
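
For illustration, a run with those options might look like the line below. Only --initial-reindex and --delay-ratio come from this comment; the -d/-t flag names, the delay-ratio value, and dbname/bigtable are assumptions that should be checked against the pgcompacttable README:

# hypothetical invocation; verify flag names and the delay-ratio semantics in the README
pgcompacttable -d dbname -t bigtable --initial-reindex --delay-ratio 1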

pgcompacttable won't clean up after itself if stopped and will leave its helper function in the database (this shouldn't affect anything); it can be dropped using this query:

select 'drop function ' || proname || '(text, text, integer, integer, integer);' from pg_proc where proname ~'pgcompact_clean_pages';

(it returns the query or queries to run to remove the pgcompacttable functions).
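
One way to both generate and execute those statements is to pipe the output back into psql (dbname is a placeholder; note that if the function lives in a non-default schema, the generated DROP may need schema qualification):

# -A (unaligned) and -t (tuples only) make the output plain DROP statements
psql -d dbname -At -c "select 'drop function ' || proname || '(text, text, integer, integer, integer);' from pg_proc where proname ~ 'pgcompact_clean_pages'" | psql -d dbname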

dk98126 commented 1 year ago

@alexius2 Thanks for the detailed answer!

If table is not so big but pgcompacttable runs slowly - perhaps there are many bloated indexes which better be compacted before data movement with --initial-reindex option.

Based on your experience, is adding the --initial-reindex option also good not just for small but also for big tables? Our production table is 190 GB and its biggest index is about 70 GB (total index size is 483 GB).
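
For what it's worth, per-index sizes like these can be listed with standard catalog functions (illustrative query; 'bigtable' is a placeholder):

-- size of each index on a table, biggest first
SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_index
WHERE indrelid = 'bigtable'::regclass
ORDER BY pg_relation_size(indexrelid) DESC;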

I'm interested in how this option will affect performance. Right now pgcompacttable doesn't really affect our database performance, and I'm just a little scared to turn on the --initial-reindex option...

Thank you in advance!