cockroachdb / docs

CockroachDB user documentation
https://cockroachlabs.com/docs
Creative Commons Attribution 4.0 International

Document product limits #1830

Closed jseldess closed 2 years ago

jseldess commented 7 years ago

Jesse Seldess (jseldess) commented:

We need a recommendation against letting a single row get close to 64MB (the default size at which we split a range). In addition to the current row data, all historical versions of the row that have not been garbage collected count toward the overall size.

Normally, a range contains many rows, in which case when the range gets to the limit, we split into 2 ranges. If a single row takes up all of a range, however, we won't split the range but rather let it get larger than the max limit.
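
The splitting behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions, not CockroachDB's implementation: the `Range` class, the `maybe_split` helper, and splitting at the middle key are all hypothetical, and "64MB" is treated as 64 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MB = 1024 * 1024
RANGE_MAX_BYTES = 64 * MB  # default split size mentioned in this comment


@dataclass
class Range:
    """Toy range: maps a row key to the sizes (in bytes) of its stored versions."""
    rows: Dict[str, List[int]] = field(default_factory=dict)

    def size(self) -> int:
        # Every version that has not been garbage collected counts toward the size.
        return sum(sum(versions) for versions in self.rows.values())


def maybe_split(rng: Range, max_bytes: int = RANGE_MAX_BYTES) -> List[Range]:
    """Split an oversized range between rows; a lone row cannot be split."""
    if rng.size() <= max_bytes or len(rng.rows) < 2:
        return [rng]
    keys = sorted(rng.rows)
    mid = len(keys) // 2  # hypothetical split point: the middle key
    left = Range({k: rng.rows[k] for k in keys[:mid]})
    right = Range({k: rng.rows[k] for k in keys[mid:]})
    return [left, right]
```

In this model, four rows of 20MB each split into two 40MB ranges, while a single row with three 30MB versions stays in one 90MB range that exceeds the limit.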

Jira Issue: DOC-125

jseldess commented 5 years ago

@robert-s-lee, how important do you think this is at our phase?

robert-s-lee commented 5 years ago

This comes up often in conversation with app developers as things not to do.

jseldess commented 5 years ago

See https://github.com/cockroachdb/docs/issues/380 for answers from 2016.

bdarnell commented 5 years ago

> If a single row takes up all of a range, however, we won't split the range but rather let it get larger than the max limit.

Note that this is no longer true. We now block writes to the offending range when it gets too large. It's still important to stay out of this situation, but it will no longer destabilize the rest of the cluster.
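
A minimal sketch of what "block writes when the range gets too large" could look like, purely as an illustration. The `apply_write` function, the `RangeTooLargeError` exception, and the `block_multiplier` threshold are hypothetical names and parameters chosen for this example; the actual threshold and mechanism are not stated in this thread.

```python
MB = 1024 * 1024


class RangeTooLargeError(Exception):
    """Raised when a write targets a range that has grown too large."""


def apply_write(range_bytes: int, write_bytes: int, max_bytes: int,
                block_multiplier: float = 2.0) -> int:
    """Return the new range size, or refuse the write if the range is too large.

    block_multiplier is an assumed knob: writes are refused once the range
    exceeds max_bytes * block_multiplier.
    """
    if range_bytes > max_bytes * block_multiplier:
        raise RangeTooLargeError(
            "range is %d bytes, over the blocking threshold" % range_bytes
        )
    return range_bytes + write_bytes
```

The point of the sketch is that the failure is confined to the oversized range: other ranges keep accepting writes.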

jseldess commented 4 years ago

Another issue asking for physical maximum limits for column names, table names, and row width: https://github.com/cockroachdb/docs/issues/7280

And a comment from @ajwerner there:

With regard to names, in light of cockroachdb/cockroach#48443, we should document that while there is currently no limit, 63 characters is a good guideline and may be enforced in future versions. Longer names will not fundamentally cause problems, especially if the names remain shorter than kilobytes.
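
As a sketch of how a team might apply the 63-character guideline to its own schema, the helper below flags names that exceed it. The function name and constant are hypothetical, and the check counts characters, as the comment above does, rather than bytes.

```python
from typing import Iterable, List

NAME_LENGTH_GUIDELINE = 63  # guideline from the comment above, not an enforced limit


def names_over_guideline(names: Iterable[str],
                         limit: int = NAME_LENGTH_GUIDELINE) -> List[str]:
    """Return the names that are longer than the guideline, in input order."""
    return [name for name in names if len(name) > limit]
```

For example, `names_over_guideline(["users", "x" * 64])` returns only the 64-character name.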

jseldess commented 4 years ago

Another suggestion that we need this, from @a-entin:

Question re "System Limits" doc page. I don't think we have one (pls point if missed) and wonder what are the thoughts on that, maybe there is already a plan for it?

The idea is to have one place to check all kinds of limits a user can bump into. Most databases have that, and it is super helpful. Three different database examples:
https://docs.oracle.com/cd/B28359_01/server.111/b28320/limits.htm
https://docs.oracle.com/cd/E18283_01/timesten.112/e17114/limit.htm
https://docs.memsql.com/v7.1/reference/configuration-reference/system-limits/

Also, there's this very old PR started by @knz that could help.

@johnrk, @taroface, I think it's time for us to prioritize this. Although there are still no actual, hard limits, we should be able to provide guideposts based on our own internal testing and what we know from customer usage. Perhaps telemetry would help here. I think part of the work is defining what dimension to even document limits for. I imagine @a-entin can help.

jseldess commented 4 years ago

More from @a-entin:

I would change "production limits" to more directly "product" or "system" limits... production alludes to capacity planning aspect and we need product characteristics documented as first order, imho. I'm pretty sure we have some hard limits, but even if "no limit", it's really helpful to say this explicitly on essential parameters where limits are expected by users. It will eliminate a lot of support chatter. We can build the list of parameters to list. The 3 examples are pretty good / common sense and you'll see a lot of correlation

taroface commented 4 years ago

@chudro let me know what topics we should add here! Thanks.

chudro commented 4 years ago

@taroface The SEs are in the midst of capturing the topics that should be listed in a product limits page. Here is a link to a Google spreadsheet providing that list as it is being captured. There is a separate tab providing a list of example database websites and how they provide similar lists: https://docs.google.com/spreadsheets/d/1JWqggZ1tZ_wDI_lrvWdEBtge4hQFvXEZYczQtMhkRjI/edit?usp=sharing

florence-crl commented 3 years ago

@taroface this Cockroach Cloud user requires this type of information: https://cockroachdb.zendesk.com/agent/tickets/6546

I am looking for answers to the questions below:
1) What is the maximum number of columns supported by a CRDB table?
2) What is the maximum size of an individual value that can be stored in a CRDB cell?
3) What is the maximum size of a table?
4) What is the maximum number of tables in one database?
5) What is the maximum number of databases per CRDB cluster?

Your answer will help me to architect my applications.

https://forum.cockroachlabs.com/t/cockroachdb-limits/592/7

SantoshSah commented 3 years ago

Hi @florence-crl, any update on the above questions?

taroface commented 3 years ago

Also to cover:

  - Number of tables, columns, indexes, user-defined schemas, and databases we support
  - Row size limits (especially JSON sizes), range size limits
  - Cluster storage limits
  - Number of nodes in cluster
  - Number of connections supported per host

taroface commented 3 years ago

Verbiage from @jhatcher9999: https://cockroachlabs.slack.com/archives/CHVV403F0/p1610553560090500?thread_ts=1610550065.087100&cid=CHVV403F0

taroface commented 3 years ago

I believe the 64MB range limit is changing for 21.1 (or has changed already) - needs to be verified.

bdarnell commented 3 years ago

The default range size limit changed from 64MB to 512MB in 20.1.
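
To make the two defaults concrete, here is a small worked calculation of how much of a range a single row can consume once its historical versions that have not been garbage collected are counted. It is a rough sketch: the helper names are hypothetical, "MB" is treated as 1024 * 1024 bytes, and per-version overhead is ignored.

```python
MIB = 1024 * 1024
OLD_DEFAULT_RANGE_BYTES = 64 * MIB   # default before 20.1, per this thread
NEW_DEFAULT_RANGE_BYTES = 512 * MIB  # default from 20.1, per this thread


def row_footprint_bytes(row_bytes: int, live_versions: int) -> int:
    """Approximate size of a row counting every version not yet garbage collected."""
    return row_bytes * live_versions


def fraction_of_range(footprint_bytes: int, range_max_bytes: int) -> float:
    """Share of the range size limit consumed by the given footprint."""
    return footprint_bytes / range_max_bytes


# A 1 MiB row rewritten 100 times before garbage collection runs.
footprint = row_footprint_bytes(1 * MIB, 100)
old_share = fraction_of_range(footprint, OLD_DEFAULT_RANGE_BYTES)  # 1.5625
new_share = fraction_of_range(footprint, NEW_DEFAULT_RANGE_BYTES)  # 0.1953125
```

Under these assumptions, the same row overflows a 64MB range by more than half but uses under a fifth of a 512MB range, so the larger default gives more headroom without removing the need for a row-size recommendation.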

jseldess commented 3 years ago

@mikeCRL, thanks for taking this on! This is an old issue, so it's helpful to articulate next steps:

  1. Identify the most important dimensions to document. This SE spreadsheet is a good starting point. In my opinion, we don't have to cover everything. I'd want to focus on all things data-related (as opposed to number of cores per node, etc.), as the data limits are going to be useful for all customers, both self-hosted and CC. @piyush-singh, can you support Mike in this step?
  2. For each dimension, identify the practical "limits". Channeling @bdarnell, we may not want to talk about these as limits, per se, but rather numbers that are known to work. For this, I would poll the SE and CS teams about the biggest environments we know of in the field. Eventually, Engineering may be able to do lots of testing to establish actual limits, but that is a big effort that's not in scope right now.

mikeCRL commented 3 years ago

Update: I've worked in that limits sheet, adding a priority column, some analyses of competitor docs, proposals for docs, and questions. SEs have jumped in with some responses. I have also mentioned this work in the context of a CC thread on gathering large cluster data.

@a-entin can I ask you, too, to please give the sheet and specifically the Questions tab a review? If you have any leads on any of the highlighted high-priority data rows, or anything else to contribute, I'd appreciate it.

mikeCRL commented 3 years ago

Notes from my visit to bdarnell's Chief Architect Office Hours.

a-entin commented 3 years ago

I made a few comments in the spreadsheet and also added a tab, "JDBC Metadata Dump". JDBC allows the client to query database defaults and limits. I think we should at least be aware of these values, and perhaps stay true to what we report about ourselves. For example, we report:

getMaxConnections | 8192
getMaxColumnsInTable | 1600
getMaxRowSize | 1073741824

We are also supposed to answer these (currently there is no answer, i.e., JDBC support is incomplete):

getMaxSchemaNameLength
getMaxTableNameLength
getMaxCursorNameLength
getMaxCharLiteralLength
getMaxColumnsInGroupBy
getMaxUserNameLength
getMaxProcedureNameLength
getMaxBinaryLiteralLength
getMaxColumnsInIndex
getMaxCatalogNameLength
getMaxColumnNameLength

I also noticed we may not be entirely honest about getDefaultTransactionIsolation | TRANSACTION_READ_COMMITTED, but that is a different topic.

mikeCRL commented 3 years ago

It's become apparent that there is interest in documenting two separate types of things: hard limits (a well-defined list of system-specified limits, including clarity on what we do not limit) and best practices/considerations related to size and scale.

I clarified the split in our spreadsheet - which should remain SSOT for the foreseeable future - and am splitting this work into multiple issues:

|                | CockroachDB | CockroachCloud |
|----------------|-------------|----------------|
| Hard limits    | #1830       | #10032         |
| Size/scale BPs | #10031      | #10037         |

I have raised an initial draft PR for CRDB: https://github.com/cockroachdb/docs/pull/10035. In this case, it touches on both of the above (limits and BPs) because a lot of information relevant to both was gathered in this issue and the sheet. I found that the schema design page was a reasonable place for much of this existing information, given its focus on various DB object types.

Future efforts to gather information can occur via the more specific issues.

jseldess commented 2 years ago

We have closed this issue because it is more than 3 years old. If this issue is still relevant, please add a comment and reopen the issue. Thank you for your contribution to CockroachDB docs!