dat-ecosystem-archive / book

Documentation on how to implement the Dat Protocol [ DEPRECATED - see https://github.com/hypercore-protocol/new-website/tree/master/guides for similar functionality. More info on active projects and modules at https://dat-ecosystem.org/ ]
https://datprotocol.github.io/book/
Apache License 2.0
68 stars 9 forks source link

Clarification request: bitfield deserialisation (to tree form) logic #22

Open aral opened 5 years ago

aral commented 5 years ago

Update 2: I just made a quick animation that we could add to the book or use in presentations to illustrate the flattening of the tree:

flat-tree

flat-tree.key.zip

(Original Keynote file attached under Creative Commons Attribution-ShareAlike 4.0 International. Please feel free to use in your own presentations if you want to.)


Update 1: OK, having now seen the diagram at the top of the Storage section, I now get how the tree is serialised. It would make sense to have that diagram in the Bitfield chapter to make it clear how the serialised version is arrived at.

So my deeper tree:

            01
      01          00
  01     11    00    00
11  00 11 11 00 00  00 00 

Would flatten (now that I can see it visually it makes perfect sense, as we are literally squashing the tree down from the top node) to:

110100011111110100000000000000

I’ll prepare a PR to add a note about how the flattening happens to the Bitfield chapter.

Read on only if you want to see my late-night ramblings from last night and an alternative way to serialise a tree :)


A 🔦 question regarding bitfield serialisation logic:

In the Bitfield chapter, we have the following tree structure (top-down)

       01
  01       00
01  11   00  00

That is serialised as:

01 01 11 01 00 00 00

The logic I could work out for that was:

Start at the top (01) then take left (01), left (01), right (11). Having exhausted the left arm of the tree from the root, then proceed (from root) to (I’m assuming, following the previous pattern), right (00), left (00), and right (00) until you’ve exhausted the right arm of the tree.

While I can see how, given the tree structure, you can serialise it as such. I don’t understand how, given the serialised form, you can recreate the tree without having a delimeter for where the left arm of the tree ends.

Is it because the bitfield is always guaranteed to be balanced that you know to delimit after the (n-1)/2 th position?

What if there was another level to the tree?

Would:

            01
      01          00
  01     11    00    00
11  00 11 11 00 00  00 00 

Serialise as:

01 01 11 01 00 11 11 11 00 00 00 00 00 00 00

In which case, dividing at (n-1)/2 would give:

01 | 01 11 01 00 11 11 11 | 00 00 00 00 00 00 00 |

And again, recursively:

01 | 01 ( 11 01 00 ) ( 11 11 11) | 00 ( 00 00 00 ) ( 00 00 00 ) |

And, finally:

01 | 01 ( 11 [01 00] ) ( 11 [11 11] ) | 00 ( 00 [00 00] ) ( 00 [00 00] ) |

Which, dropped into tree form, becomes:

            01
      01          00
  11     11    00    00
01  00 11 11 00 00  00 00

Which differs from the original… telling me that my original assumption of the serialisation logic was incorrect. The deserialisation logic makes sense, though, so, to backtrack, the original tree:

            01
      01          00
  01     11    00    00
11  00 11 11 00 00  00 00

Should have been serialised as:

01 | 01 ( 01 [11 00] 11 [11 11] ) | 00 (00 [ 00 00 ] 00 [ 00 00] ) |

Right, that makes sense. And it also makes sense in that the moment we read the 6th bit-pair, we know that we don’t have to look any further down that branch.

I think I just rubber-ducked my own question.

If the conclusion is sound, I think it would benefit the chapter for me to write it up in a concise few paragraphs. Thoughts?

Related: #1

aral commented 5 years ago

(See updates in original issue.)

Hmm… although, if my final technique is correct, then the example in the chapter:

       01
  01       00
01  11   00  00

Should serialise as:

01 01 01 11 00 00 00

Instead of what is in the chapter, which is:

01 01 11 01 00 00 00

(The 3rd and 4th positions are flipped.)

If the example in the chapter is correct, then I don’t understand the logic behind why we go left-to-right while serialising the second level but flip to right-to-left at the third level.

yoshuawuyts commented 5 years ago

@aral that diagram is fantastic! -- would you like to make a PR to add it to the book?

aral commented 5 years ago

Sure :) (And thanks!) Will aim to do that this week.

yoshuawuyts commented 5 years ago

Yay, awesome!

aral commented 5 years ago

Hey @yoshuawuyts, just a quick update to let you know I haven’t forgotten about this. Away without my Mac for the holidays so will get to it after I’m back. (Created the animation in Keynote and need to export it again as I noticed a hairline border on two edges from the Screenflow capture.)

yoshuawuyts commented 5 years ago

@aral thanks heaps for the update! -- happy holidays! :sparkles: