Now, with v0.2+, things have changed. Benchmarking the same 15 datasets, but now as "tab-aoa" JTD:
id | urn | pkey | jtd | n_cols | n_rows |
---|---|---|---|---|---|
1 | (2)lexml:autoridade | id | tab-aoa | 9 | 601 |
4 | (2)lexml:evento | id | tab-aoa | 9 | 14 |
... | ... (see more at Wiki) ... | ... | ... | ... | ... |
Disk-usage:
nspname | n_tables | total_bytes | table_bytes | table_size |
---|---|---|---|---|
dataset | 4 | 1474560 | 73728 | 72 kB |
Total disk-usage reduction from 6080 kB to 72 kB: about 84 times smaller (6080 ≈ 72 × 84.4), that is, down to ~1.2% of the original size. No disk cost for datasets!
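For reference, a minimal sketch of the kind of query that produces per-schema numbers like the ones above (the exact query used in the Wiki may differ; this one only uses standard pg_catalog functions):

```sql
-- Approximate per-schema disk usage for the "dataset" schema.
-- total_bytes includes indexes and TOAST; table_bytes excludes indexes.
SELECT n.nspname,
       count(*)                                   AS n_tables,
       sum(pg_total_relation_size(c.oid))         AS total_bytes,
       sum(pg_table_size(c.oid))                  AS table_bytes,
       pg_size_pretty(sum(pg_table_size(c.oid)))  AS table_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'dataset'
GROUP BY n.nspname;
```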
(This issue was opened for v0.1.0 and things changed from v0.2+, so use the v0.1 benchmarking below only as a reference.)
The benchmark (see Wiki) shows that expanding a dataset with the jsonb_array_elements() function is about as fast as per-row JSONs.
Extra speed is only needed when there are a lot of rows (more than ~9000)... When a really fast SELECT is needed, the best option (~2 to 5 times faster, not that much) is a regular table or a MATERIALIZED VIEW; and faster still is an indexed one.
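When that extra speed is needed, a materialized (and indexed) expansion can be built on top of the same JSONb array. A minimal sketch, assuming `dataset.big` keeps the array in a JSONb column here called `j` with a dataset key `id` (names are illustrative, not the real schema):

```sql
-- Expand each JSONb array ("tab-aoa") into one row per element
-- and cache the result for fast SELECTs.
CREATE MATERIALIZED VIEW dataset.big_expanded AS
SELECT d.id    AS dataset_id,
       e.value AS row_array        -- one JSONb array per logical row
FROM dataset.big AS d,
     LATERAL jsonb_array_elements(d.j) AS e;

-- An index makes point lookups on the expanded data faster still.
CREATE INDEX ON dataset.big_expanded (dataset_id);
```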
On the other hand, there is some demand to add "any other free JSON" dataset into `dataset.big`.

Conclusion: the best is "1 dataset per row" at `dataset.big`, a free-JSONb dataset with "some JSON-schema" controlled by `dataset.meta`. We named it "JSON TYPE DEFINITION" (JTD): a set of labels indicating standard structures. Only some JTDs are tabular; others need a rule to join/split into rows... The best is to keep everything as "1 dataset per row".
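To make the "tab-aoa" JTD concrete, here is a toy illustration (the values and column names are made up; in the real schema the column meaning comes from `dataset.meta`): each dataset is one JSONb array of arrays, and each inner array is one logical row.

```sql
-- A "tab-aoa" value is an array of arrays; expanding it yields the rows,
-- and the element positions map to columns described by dataset.meta.
SELECT e.value->>0        AS name,   -- 1st position as text
       (e.value->>1)::int AS code    -- 2nd position cast to integer
FROM (SELECT '[["alpha", 1], ["beta", 2]]'::jsonb AS j) AS d,
     LATERAL jsonb_array_elements(d.j) AS e;
```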
Reference Benchmark of v0.1.0

15 datasets, loaded as "many rows per dataset". In terms of disk usage, the whole dataset schema sums table_size to 6080 kB. If all the datasets of `dataset.big` were translated to usual SQL tables, the cost would be less (!), ~75% or lower.

Comparing JSON and JSONb
A test with fewer rows (before "datasets:world_cities"), but with good results, for `dataset.big` total usage:

So there is no significant advantage in using JSON, with only ~2% of gain. Better to stay with JSONb, a "first-class citizen".
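As a side note, the per-value difference can be inspected directly; a minimal sketch (toy value, not one of the benchmarked datasets):

```sql
-- Compare the stored size of the same value as json (text) and jsonb (binary).
SELECT pg_column_size('{"name": "alpha", "codes": [1, 2, 3]}'::json)  AS json_bytes,
       pg_column_size('{"name": "alpha", "codes": [1, 2, 3]}'::jsonb) AS jsonb_bytes;
```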
Disk-usage reduction when all rows in one JSONb-array
Checking table_usage with "datasets-br:br_city_codes":
Conclusion: "all rows in one" is ~10 times better!
Checking performance
No loss of performance, according to PostgreSQL's EXPLAIN ANALYZE: as shown at the Wiki, the function jsonb_array_elements() is very fast.
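A sketch of that kind of check (again assuming the illustrative column names `id` and `j` on `dataset.big`):

```sql
-- Time the expansion of a single JSONb-array dataset into rows.
EXPLAIN ANALYZE
SELECT e.value
FROM dataset.big AS d,
     LATERAL jsonb_array_elements(d.j) AS e
WHERE d.id = 1;
```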