Open nelsonic opened 6 years ago
null = 1 bit variable/feature/column = 8 bits in ASCII_land or 8 to 48 bits in UTF-8land per character_
NoSQL with a pure implementation takes up way more space.
Let's say for example you have this sparse dataset:
[ key1 int, key2 text, key3 int, key4 text, key5 int ]
Each column name would be repeated within the document storage if the value is present.
In best case scenario there is only one value per document:
{ key1: 23 }
because the column name is 4 characters long it is (4 * 8) = 32 bits to specify the column name. That means you could have 32 empty columns in Postgres before the row would equal the same bits to specify which key the value belongs to.
In the worst case scenario you would specify all columns:
{ key1: 23, key2: "Bobby drop_tables)eliteHaxx", key3: 42, key4: "1040z", key5: 199 }
In this scenario (4 5) = 20 characters are used to spell out the column names. That's 20 8 bits! You could have 160 null columns in a row before the size would equal the unoptimized NoSQL implementation.
But once you reach 800 or so columns (up to 1,600 simple columns) in Postgres then you will run into the max block size of 8KB. So it's better to store sparse data as JSONB data type.
Disclaimer: I'm not sure how variables are stored and processed in an actual implementation but I'm assuming MongoDB must not actually store raw JSON. If so then my previous statement has less weight. Postgres and other relational OLTP systems can often store JSON too (I think JSONB is a lot better than storing many sparse columns). I'm pretty sure the actual implementation is optimized fairly well
Many people chose to use a non-relational ("NoSQL") database (such as MongoDB or Cassandra) from the start of their project/app because they want the "flexibility" of not having a "rigid" schema. From experience we have observed that the vast majority of Apps can actually hugely benefit from developers thinking about schema up-front rather than "dumping data into mongo" and dealing with out to query and relate it later.
One of the common questions when using PostgreSQL is around having "empty" fields in a table. Imagine that you have a schema for a
person
(a human being that uses your SaaS product). That person might have a wide range of data associated with them. see: https://github.com/dwyl/fields/issues/4 Let's consider a typical record:null
null
alex
As you can see from this basic "person" record, Alex does not have an instagram or facebook account. Both columns in the table will be
null
:inserted_at
name
email (PK)
facebook
mobile
instagram
twitter
website
Question: Does having
null
values take up a lot of Disk Space?Answer: No, Each
null
value only uses onebit
on disk.For argument's sake, let's represent the data this way "on disk":
So in the above example, the Alex's row of data in the
people
table is 62 bytes or 496 bits:In the scheme of things, having a few
null
columns will not have a major impact on the disk usage. If your database table has 30 suchdefault null
fields, it would still only be 30 bits per row. In general, we will only add a fields to a table/schema that has a high likelihood of containing data. We aren't going to have a field formyspace
...Obviously it would be even better if
null
data took up zero bits, but sadly that is not realistic. Because the underlying data storage needs a way of "representing" thenull
value.Conclusion: having structured relational data queries is "worth it" for a few bits of
null
data.I feel that the "trade off" of having structured relational data (when appropriate) is worthwhile for the sane querying and having a few bits of
null
data in the rows, is a "fair price".Relevant Stack Overflow / Databases Q/A