Add sizing guideline for ballast files

florence-crl commented 4 years ago

Florence Morris (florence-crl) commented:

From @knz: there's a formula that's easier to understand than to explain. The idea is to combine two things:

1) How fast their data grows over time. To know this, they should use metrics/monitoring and plot their storage growth over days/weeks/months. They also need to understand their storage spikes (e.g., bulk I/O events and the disk space those events need).
2) How fast they are able to react to a "low storage" condition, e.g., by adding nodes or more disk space. Some businesses can react within 1 day; others need 2 weeks to work on it.

Once they know these two things, they need to choose a ballast size that covers the disk space growth from (1) over the reaction period from (2).

Examples:

If they generate 1GB per week and need 2 weeks of turnaround to grow their disk space, they need a 2GB ballast.
If they generate only 100MB per week but perform a daily bulk I/O event that needs 2GB, and they can only react to a disk shortage within 2 days, they probably need a 2-3GB ballast.
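
A minimal sketch of the sizing rule described above, written in Python. The function name and parameters are hypothetical, and it assumes that the space used by a bulk I/O event is released afterward, so only the largest single spike is added on top of steady growth. If spike space is not reclaimed between events, the spike term would need to be multiplied by the number of events in the reaction window.

```python
def ballast_size_gb(growth_gb_per_week, reaction_days, spike_gb=0.0):
    """Rough ballast size: steady growth over the reaction window
    plus room for the largest expected temporary spike.

    All names here are illustrative, not part of any CockroachDB API.
    """
    if growth_gb_per_week < 0 or reaction_days < 0 or spike_gb < 0:
        raise ValueError("inputs must be non-negative")
    steady_growth = growth_gb_per_week * reaction_days / 7.0
    return steady_growth + spike_gb


# First example from the thread: 1GB per week, 2 weeks to react.
print(ballast_size_gb(1.0, 14))

# Second example: 100MB per week, a 2GB bulk I/O spike, 2 days to react.
print(round(ballast_size_gb(0.1, 2, spike_gb=2.0), 2))
```

The second call lands just above 2GB, which is why rounding up to the 2-3GB range gives some margin.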

One layer of complexity is that the intermediate state of the growth can appear larger than the long-term state, because of RocksDB compactions. For example if they create a lot of data quickly, there will be more disk usage than what they have put in their SQL, until RocksDB compacts it.

Another layer is MVCC: if they delete data, the data is still around until it is GC'ed (zone config, default 25 hours). So if their workload is delete-heavy they need to consider that.

Both things can be safely ignored if their disk usage evolves slowly (which is common) and they can monitor it at a high level (e.g., our capacity metric in the UI, or their own export using Prometheus).
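
For a delete-heavy workload where the MVCC effect cannot be ignored, a rough estimate of the extra space held by deleted data could look like the sketch below. It assumes deletes arrive at a steady rate and that deleted data stays on disk for the full GC TTL (25 hours by default, per the comment above). The function name is hypothetical, and real usage also depends on compactions.

```python
def retained_deleted_gb(deleted_gb_per_day, gc_ttl_hours=25.0):
    """Approximate disk space still occupied by deleted data that has
    not yet been garbage collected, assuming a steady delete rate."""
    if deleted_gb_per_day < 0 or gc_ttl_hours < 0:
        raise ValueError("inputs must be non-negative")
    return deleted_gb_per_day * gc_ttl_hours / 24.0


# Deleting 24GB per day with the default 25-hour TTL keeps roughly
# 25GB of deleted data on disk at any time.
print(retained_deleted_gb(24.0))
```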

From @jseldess: an addition is that we need to strongly recommend that they put alerts in place to notify them of “low storage” conditions so they can set their process in motion, for example an alert based on Prometheus metrics that fires when a node is running low on disk space. Ideally, a customer shouldn’t get to the point where they need to use a ballast file.
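
One way to tie such an alert to the reaction period is sketched below in Python. This is an illustration only: the function names and the `safety_factor` parameter are assumptions, and the inputs would come from whatever monitoring is in place (for example, capacity values exported to Prometheus).

```python
def days_until_full(available_gb, growth_gb_per_day):
    """Days of headroom left at the current growth rate."""
    if growth_gb_per_day <= 0:
        return float("inf")  # not growing, so no projected fill date
    return available_gb / growth_gb_per_day


def should_alert(available_gb, growth_gb_per_day, reaction_days,
                 safety_factor=2.0):
    """Alert while there is still more time left than the team needs
    to react; safety_factor is a hypothetical margin on that time."""
    headroom = days_until_full(available_gb, growth_gb_per_day)
    return headroom <= reaction_days * safety_factor


# 10GB free, growing 1GB per day, 14 days needed to add capacity.
print(should_alert(10.0, 1.0, 14))
```

Alerting at a multiple of the reaction period is meant to leave the ballast file as a last resort rather than the first line of defense.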

cc: @Annebirzin @piyush-singh since this ties into observability and alerting

Jira Issue: DOC-453

RoachietheSupportRoach commented 4 years ago

Zendesk ticket #4842 has been linked to this issue.

jseldess commented 4 years ago

These needs also came up at the recent Education Offsite.

jseldess commented 3 years ago

Now that we have automatic ballast files on node startup, do we need detailed guidance here still? @mwang1026, thoughts? Users can still set the ballast-size, so maybe we do?

mwang1026 commented 2 years ago

I don't think so? We have a default size that we can document (I believe it's something like 1GB or 1% of disk) (But we should check before documenting that exactly :D )

cockroachdb / docs

Add sizing guideline for ballast files #6754