Open ftomassetti opened 3 months ago
Also, in my cases there are no annotations, so the field annotations
is always {}
I also wonder if we should use a database id for nodes, to avoid using the node_id field, which may be a long string (at least in my case. My nodes ID range from 11 to 538 characters, with a mean of 206)
I also wonder if we should use a database id for nodes, to avoid using the node_id field, which may be a long string (at least in my case. My nodes ID range from 11 to 538 characters, with a mean of 206)
I implemented this in some of my C# optimizations: During serialization, I want to detect duplicates. I store SHA1 hashes of the id as a fixed array of 20 bytes. It saves quite some memory (and also speeds up comparison).
Edit: See here: https://github.com/LionWeb-io/lionweb-csharp/blob/niko/serializer-handler/src/LionWeb-CSharp/core/M1/CompressedId.cs
I think there are two ways of optimizing this, both can be applied independently:
a) We can handle default values differently in our logic, so they never reach the DB. b) I could imagine the DB has a way to "compress" column values internally, maybe this needs "column-mode" tables. I don't know any specifics on PQ, just that such features exist in general.
After loading a big project (2.5M nodes) I looked at the table and I noticed some values being common.
In the reference table the value
{}
(no values references) is present in 4475306 out of 5207880 records (86%). In the containment table the value{}
(no values references) is present in 359166 out of 4514084 records (22%). In the property table the valuefalse
is present in 538840 out of 2490897 records (8%).Complessively, among these 3 tables, 5373312 records out of 12212861 (44%) have the default value. Is there a way to use this to our advantage? Can we avoid adding records when the default value is present? Can we otherwise speed up insertion?