Open thejosephstevens opened 2 years ago
There is a related Prometheus issue to support index files larger than 64 GiB.
However, I think we shouldn't wait for that, and should have Cortex skip compaction for blocks with a large index.
After PR #4707 I still need to implement the part that automatically skips compaction for blocks with a humongous index.
Hmm, looks like I can't just use Thanos's largeTotalIndexSizeFilter for Cortex's ShuffleShardingPlanner, because ShuffleShardingPlanner is coupled with the non-exported tsdbBasedPlanner struct.
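For anyone following along, here is a rough, standalone sketch of the behavior being discussed: when the summed index size of a planned group exceeds the limit, drop the block with the largest index from the plan and try again. This is not the Thanos or Cortex code. The block type, the dropUntilUnderLimit function, and the sample block IDs are all hypothetical, and the 38 GiB figures only approximate the sizes reported in this issue.

```go
package main

import "fmt"

// maxTotalIndexBytes is the 64 GiB index size limit discussed in this issue.
const maxTotalIndexBytes int64 = 64 << 30

// block is a hypothetical, simplified stand-in for block metadata.
type block struct {
	ID         string
	IndexBytes int64
}

// dropUntilUnderLimit repeatedly removes the block with the largest index
// from the plan until the summed index size is at or below limit.
// It returns the blocks that remain in the plan and the blocks that were skipped.
func dropUntilUnderLimit(plan []block, limit int64) (kept, skipped []block) {
	// Copy the input so the caller's slice is never modified.
	kept = append(kept, plan...)
	for {
		var total int64
		largest := -1
		for i, b := range kept {
			total += b.IndexBytes
			if largest == -1 || b.IndexBytes > kept[largest].IndexBytes {
				largest = i
			}
		}
		if total <= limit || largest == -1 {
			return kept, skipped
		}
		skipped = append(skipped, kept[largest])
		kept = append(kept[:largest], kept[largest+1:]...)
	}
}

func main() {
	// Two blocks of roughly the size reported in this issue (assumed values).
	plan := []block{
		{ID: "block-A", IndexBytes: 38 << 30},
		{ID: "block-B", IndexBytes: 38 << 30},
	}
	kept, skipped := dropUntilUnderLimit(plan, maxTotalIndexBytes)
	fmt.Println("kept:", len(kept), "skipped:", len(skipped))
}
```

A real planner would presumably also mark the skipped blocks so they are not picked up again, and would not compact a group that is left with a single block.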
Another relevant Thanos issue, https://github.com/thanos-io/thanos/issues/3068, tracks the sharding work.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
For what it's worth, we migrated to Mimir and no longer have issues with this. I think this still makes sense to implement in Cortex since it's a pretty brutal issue to hit (and effectively puts a hard cap on max tenant size), but it doesn't need to be kept open for us. If you'd like to close it for clean-up, feel free.
@thejosephstevens we are looking into how to address this in Cortex.
@alvinlin123 I don't think we can close this issue, since only the proposal has been merged. Please reopen it.
@yeya24 you are right; closing this issue was a mistake. Thanks!
I still have to learn to pay attention, when merging a PR, to which issues may get incorrectly closed :)
Describe the bug
Over the weekend we significantly expanded one of our clusters, pushing ~153M timeseries to our Cortex 1.11.1 cluster in a day.
To Reproduce
Steps to reproduce the behavior:
The two blocks referenced by this error are 12-hour blocks at level-3 compaction, each of which has an index of ~38 GB (summing together to ~76 GB > 64 GB).
Expected behavior
There should be a way of skipping this, forcing compaction, sharding, or something.
There's an upstream Thanos patch here which allows skipping compaction but it appears to not be used by Cortex today (found by @alvinlin123 in Cortex Slack). There's also a Thanos change here which would automatically skip past compaction if the block is too large.
Mimir appears to have the ability to get past this by sharding during compaction so multiple blocks are produced per day, each of which can have a smaller index.
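To illustrate the general idea of sharding during compaction (this is not Mimir's actual implementation): each series is assigned to one of N output blocks by hashing its labels, so every output block holds roughly 1/N of the series and has a correspondingly smaller index. The FNV hash, the shardFor and splitSeries names, and the string representation of a series are assumptions made for this sketch only.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a series, identified here by a hypothetical canonical
// label string, to one of n output shards.
func shardFor(seriesLabels string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(seriesLabels)) // Write on an FNV hash never returns an error.
	return h.Sum32() % n
}

// splitSeries groups series by shard, so that each group could be written
// to its own output block with its own, smaller index.
func splitSeries(series []string, n uint32) [][]string {
	out := make([][]string, n)
	for _, s := range series {
		i := shardFor(s, n)
		out[i] = append(out[i], s)
	}
	return out
}

func main() {
	series := []string{
		`{__name__="up",job="a"}`,
		`{__name__="up",job="b"}`,
		`{__name__="http_requests_total",job="a"}`,
		`{__name__="http_requests_total",job="b"}`,
	}
	for i, group := range splitSeries(series, 2) {
		fmt.Printf("shard %d: %d series\n", i, len(group))
	}
}
```

Because the shard depends only on the series labels, the same series always lands in the same shard across compaction runs.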
Environment:
Storage Engine
Additional Context