IntelLabs / pmgd

Persistent Memory Graph Database
MIT License

Any node and edge core size estimations? #28

Closed Outstep closed 5 years ago

Outstep commented 5 years ago

Hello,

Things are progressing well with my project using PMGD, and I am extremely happy with what I have seen in this project, although a little more feedback from the developers would be helpful at times.

One thing I am wondering about is size estimation.

I would like to find out approximately how much space is used for each node and each edge that is added. I know that there is additional overhead when adding properties to a node or edge as well. I am asking because I now have some ideas on how to speed up my searches when the whole database grows very large.

Right now, it seems that PMGD uses AVL trees in the core structure for node management, which is good since they balance well, but I think that this idea could be greatly extended to properties on both edges and nodes. Just a thought though.

Have a good weekend :)

philiplantz commented 5 years ago

The size of each node is 64 bytes and the size of each edge is 32 bytes. This space can contain a few properties, and when it is full, additional space is allocated to contain properties in 64-byte increments. Each node also has an attached data structure containing its edges, and I'm not sure what the size of that is.

Property values are stored in a very compact way. Each property takes 3 bytes plus the size of its value. Integer properties are stored in the minimum number of bytes required to hold the value. Booleans take 1 byte. Time values take 9 bytes. Floats take 8 bytes. Strings up to 13 bytes occupy the exact length of the string. Strings longer than 13 bytes occupy 12 bytes in the node, plus space equal to the length of the string allocated separately from the node.
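To make the figures above easier to apply, here is a rough Python sketch of a per-property size estimator. It is not PMGD code, and all the names in it (`int_bytes`, `property_bytes`, the constants) are made up for illustration. It also rests on a few assumptions the thread does not confirm: that the 3-byte cost applies to every property type, that integers are stored as signed two's-complement values with a minimum of 1 byte, and that string lengths are counted in UTF-8 bytes.

```python
from datetime import datetime

# Fixed sizes quoted in the reply above.
NODE_BYTES = 64
EDGE_BYTES = 32
PROPERTY_OVERHEAD = 3      # assumed to apply to every property type
INLINE_STRING_MAX = 13     # strings up to this length are stored in place
LONG_STRING_INLINE = 12    # in-node footprint of a longer string


def int_bytes(value):
    """Minimum bytes for a signed two's-complement integer (assumption)."""
    magnitude = value if value >= 0 else ~value
    bits = magnitude.bit_length() + 1  # one extra bit for the sign
    return max(1, (bits + 7) // 8)


def property_bytes(value):
    """Return (bytes_in_node_or_edge, bytes_allocated_separately)."""
    separate = 0
    if isinstance(value, bool):        # check bool before int: bool is an int
        size = 1
    elif isinstance(value, int):
        size = int_bytes(value)
    elif isinstance(value, float):
        size = 8
    elif isinstance(value, datetime):  # stands in for a PMGD time value
        size = 9
    elif isinstance(value, str):
        length = len(value.encode("utf-8"))
        if length <= INLINE_STRING_MAX:
            size = length
        else:
            size = LONG_STRING_INLINE
            separate = length
    else:
        raise TypeError(f"unsupported property type: {type(value).__name__}")
    return PROPERTY_OVERHEAD + size, separate
```

Summing the first element of `property_bytes` over a node's properties gives a rough idea of when the 64-byte node would overflow into extra 64-byte chunks. The exact number of bytes available for properties inside a node or chunk is not stated here, so the sketch does not try to compute a chunk count.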

Indices probably take the most space. Storage space used by indices hasn't been as carefully optimized as the space used by nodes, edges, and properties. I don't know how much is used. You can use the get_index_stats methods in Graph to find out how much space is being used for indices.