Doctorbal / zabbix-postgres-partitioning

Zabbix PostgreSQL version 11 Native Partitioning
MIT License
43 stars 15 forks source link

BRIN index #14

Closed adrianlzt closed 3 years ago

adrianlzt commented 4 years ago

If I have understand correctly who they works, they could be useful for indexing the clock column (correlation between that column and the block where the data is inserted), but for itemid that correlation does not exists.

We can expect having in each range all the itemids. When Zabbix wants to get some values for a particular itemid, the BRIN index will just return something like "read all the blocks".

I have created a BRIN index to one of my trends table and for the itemid column, each block has almost the same range of values:

zabbix-server=# SELECT blknum,attnum,value FROM brin_page_items(get_raw_page('partitions.trends_uint_2020_05_brin_itemid_clock', 14), 'partitions.trends_uint_2020_05_brin_itemid_clock') where attnum = 1 limit 10;
 blknum | attnum |         value
--------+--------+-----------------------
 318208 |      1 | {25782 .. 75190048}
 318336 |      1 | {25375 .. 75190981}
 318464 |      1 | {25460 .. 75229977}
 318592 |      1 | {25498 .. 75229962}
 318720 |      1 | {25511 .. 75229963}
 318848 |      1 | {25462 .. 75177392}
 318976 |      1 | {1966526 .. 75192034}
 319104 |      1 | {26534 .. 75181617}
 319232 |      1 | {25466 .. 75153121}
 319360 |      1 | {1137467 .. 75153122}
Doctorbal commented 4 years ago

Hello @adrianlzt ,

Thanks for bringing this up. I think BRIN indexes are great when space is a concern. But other than that they are slower than B-tree indexes and don't handle ranges as well as B-tree indexes do.

I noticed you didn't have any question posted with you opening up the issue. Do you have a question in mind?

Best Regards, Andreas

adrianlzt commented 4 years ago

And does BRIN index provide an improvent over not having an index?

To be honest I just opened this issue to discusss a little if having a BRIN index is a good idea to reduce storage on old partitions.

Doctorbal commented 4 years ago

Any index will provide an improvement over not having an index. I would still go with default B-tree indexes when it comes to ranges. Otherwise BRIN would be cool to experiment with. I personally haven't played with it in production instances.

Right on for the conversation on this topic! I am also not too familiar with BRIN indexes but your mindset of reducing storage on old partitions sounds like a good idea. How big is your DB that you are looking to use this type of index?

adrianlzt commented 3 years ago

Good :)

We have ~2TB database only for Zabbix with SSD disks.

History tables is partitioned daily, keeping 7 days and each one takes ~50GB (33 table + 17 index). history and history_uint are the big ones, the others are < 2GB.

Trends are partitioned each month and we keep a little more than one year. Each trend table is ~35GB (25 table + 10 index).

The bloat ratio (queries) for tables is around 1%, with fillfactor = 100.

For indexes, bloat in each of the history partitions rises up to 40%, but we have a job to, after two days, change fillfactor to 100 (btree default fillfactor is 90) and us pg_repack to reduce bloat (to less than 1%).

Another idea, not implemented, is to use tablespaces to have a warm disk and move old partitions to that disk (pg_repack could also handle that).

I was thinking to change the indexes of old partitions from btree to brin, but as itemids are not sequential, I think postgres will end up making a seq scan of the whole partition.

Mmm, maybe if the query is just for a time range smaller than the trend partition, brin could reduce the cost of the query, but we are also testing make history partitions each hour and trends each day, so in that case I think brin will be useless

Doctorbal commented 3 years ago

Hello @adrianlzt ,

Apologies for the late reply. I have not checked on this for quite some time as I have shifted to other projects.

I believe your suggestion of using BRIN indexes for old partitions, using tablespaces and minimizing bloat with pg_repack are excellent ideas. I just haven't tried it myself.

For your DB size this makes perfect sense. I have a 200GB Zabbix DB with 7 day history table partitions and 1 year trend partitions. Quite small compared to yours :-)!

BRIN indexes work really well for time series data, is fast and cheap. I would say go for it.

zvalcav commented 3 years ago

Hi,

I'm trying to resolve similar problems. I've started to test BRIN, which is awesome for clock columns in trends and history, but not so good for itemid. Some specific selects with itemid = condition tend to be much slower with brin than btree even when overall performance is at least comparable and sometimes much better for brin. I've found this awesome article which describes indexing problem for Zabbix in deep details. https://geeks-world.imtqy.com/articles/468463/index.html Combination of brin(clock) and btree_gin(itemid) sounds great for me and I'll definitely go that way. I'll definitely adapt @Doctorbal 's scripts for this kind of indexing.

Doctorbal commented 3 years ago

@zvalcav, thank you for providing your input in this thread. I truly haven't tested BRIN. I would be interested in seeing how you adopted the scripts and made it work.